The vacancy is strong in task clarity and technical requirements but lacks compensation details.
no salary info
Job description
Kraken is a mission-focused company rooted in crypto values, aiming to accelerate the global adoption of crypto for financial freedom and inclusion. As a fully remote company with Krakenites in over 70 countries, Kraken pioneers premium crypto products like Kraken Pro, Desktop, Wallet, and Kraken Futures with industry-leading security and world-class client support. The AI Infrastructure team builds and operates production systems that power intelligent agents at scale. This team is foundational to the agent platform, ensuring reliable, observable, and performant model inference, orchestration, and execution layers under real-world load. They work closely with the Agent Systems team and broader infrastructure partners, owning core primitives for safe agent operation across internal systems. The environment is high-scale and high-stakes, serving millions of users and meeting strict reliability, latency, and correctness standards. This is a deeply production-oriented team, combining strong systems thinking with applied ML infrastructure experience, building in Rust and operating performance-critical services. The opportunity involves designing and building the infrastructure layer for AI agent systems in production, developing high-performance Rust services for model inference, orchestration, and execution, and architecting scalable systems for millions of users and high throughput.
Responsibilities
- Build reliable ML infrastructure and MLOps patterns for deployment, evaluation, and monitoring.
- Define guardrails, observability, and failure handling for agent workflows.
- Optimize latency, throughput, and cost.
- Partner with the Agent Systems team to harden prototypes into production systems.
- Contribute to foundational infrastructure decisions in a high-impact environment.
Requirements
- 5+ years of experience building and operating high-scale production systems.
- Strong proficiency in Rust and systems-level programming.
- Deep understanding of distributed systems, reliability engineering, and performance optimization.
- Experience operating services serving millions of users or high-throughput workloads.
- Familiarity with ML infrastructure, model serving, or MLOps in production environments.
- Experience designing observability, monitoring, and failure recovery systems.
- Strong collaboration skills working across infrastructure and applied engineering teams.
- High ownership mindset in high-stakes production environment.
- Experience building infrastructure for agent-based or LLM-powered systems.
- Background in high-performance networking, async systems, or low-latency architectures.
- Experience with container orchestration and cloud-native infrastructure.
- Familiarity with evaluation frameworks and model performance monitoring at scale.
- Experience working in fast-moving 0→1 or platform-building teams.
About Kraken
Kraken (legally Payward, Inc.) is a US-based cryptocurrency exchange that facilitates trading of cryptocurrencies, stocks, futures, and ETFs in most US states. It serves over 10 million clients worldwide with $207 billion in quarterly trading volume and has expanded to tokenized equities for non-US customers.
Crypto· 1000+· San Francisco, United States· Founded 2011· https://kraken.com