Job description
At Exness, we are not just a leading trading broker—we’ve reimagined what it takes to be a leader. With 40M+ trades a day and 2,000+ people across 13 countries, we combine scale, care, and real tech to make trading better for 1M+ clients worldwide. Recognised globally as a Best Place to Work, we’re a people-first company where long-term wins always matter more. As part of our team, you will shape the future of fintech with real technology, care, and purpose.
Responsibilities
### What you'll actually do
- Collaborate closely with infrastructure teams on selecting and configuring GPU servers, high-performance networking, and RDMA-enabled clusters.
- Configure and manage GPU MIG (Multi-Instance GPU) partitions based on workload requirements and model characteristics.
- Ensure reliable and scalable GPU operations in Kubernetes, including runtime integration, device plugins, and GPU scheduling capabilities.
- Design, deploy, and maintain model serving runtimes, including vLLM, ONNX, SGLang, NVIDIA Triton, and KServe, ensuring high performance, scalability, and efficient GPU utilization.
- Build and maintain CI/CD pipelines and tooling for model packaging, versioning, and deployment, enabling reliable model delivery for internal teams.
- Build and maintain platform tooling for model lifecycle management, including experiment tracking, model versioning, and registry systems (e.g. MLflow).
- Enable infrastructure and workflows for model fine-tuning and adaptation (e.g. LoRA), focusing on scalability, reproducibility, and automation within the platform.
- Develop and support internal tooling for managing model inputs and configurations (e.g. prompt templates), enabling consistent and reusable model usage patterns.
- Conduct performance testing and evaluation of multi-node GPU clusters to identify and resolve bottlenecks.
- Build and maintain observability for GPU clusters and model workloads, including metrics such as GPU utilization, memory usage, throughput, and latency.
- Integrate tracing for model inference workflows to provide end-to-end visibility into requests and model behavior.
- Ensure compliance with security requirements for platform development.
- Evaluate and benchmark model inference performance across different runtimes, hardware setups, and configurations to guide platform optimization.
Requirements
### Who we’re looking for
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field.
- 5+ years of experience in infrastructure, platform engineering, or distributed systems.
- Hands-on experience working with GPU infrastructure, including NVIDIA or AMD stack and multi-GPU environments.
- Strong experience with Kubernetes, including deploying and operating production workloads.
- Experience with Linux-based environments.
- Strong programming skills in Python and/or Go.
- Understanding of distributed systems and multi-node workloads.
- Experience with model serving and inference systems (e.g. vLLM, ONNX, SGLang, NVIDIA Triton, KServe).
- Experience with CI/CD pipelines and automation for deploying services or models.
- Experience with monitoring and observability tools (metrics, tracing, logging).
- Nice to have: familiarity with networking concepts relevant to distributed systems (e.g. RDMA, high-performance networking).
- Good communication and problem-solving skills.
- Advanced English proficiency for a range of work and business purposes.
- Critical thinking and attention to detail.
- Decision-making skills and the ability to adapt to change.
Conditions
### What we offer
- Full relocation support for you and your family to make your move smooth and worry-free.
About Exness
Exness is a global multi-asset retail broker founded in 2008 that provides online trading services to over 1 million clients worldwide. The company processes 40+ million trades daily and operates as an ethical trading platform across 13 countries.