Kronos Research

Senior SRE Engineer

Kronos Research · office · senior · full-time
Tech: Linux, Bash, Ansible, Python, HPC, AWS, Alibaba Cloud, GCP, Terraform, Docker, Kubernetes, GitLab
AI Score: 6.5
The vacancy is well-defined but lacks compensation details, affecting overall attractiveness to applicants.
no salary info
Job description
Kronos Research is seeking a Senior SRE Engineer to manage large-scale Linux environments, operate HPC clusters, and manage multi-cloud infrastructures. The role requires strong autonomy and experience in Linux systems administration.
Responsibilities
### Linux Systems & Automation (Core)

- Manage large-scale Linux environments: troubleshooting and root-cause analysis
- Write maintainable, hand-off-ready Bash / Ansible / Python automation
- On-call for infrastructure, CI/CD, and production service incidents

### HPC Cluster & Storage

- Operate HPC clusters (Slurm) along with usage analytics, auditing, and monitoring tools
- Maintain and plan storage for compute environments (Lustre, NAS)

### Cloud & Hybrid Infrastructure

- Manage multi-cloud environments (AWS, Alibaba Cloud, GCP) with Terraform / AWS CDK
- Build and operate Docker (ECS) / Kubernetes (EKS) environments and their deployment workflows

### CI/CD & Developer Experience

- Operate self-hosted GitLab server and Runner fleet
- Operate CI/CD systems and design deployment pipelines for research and other projects

### GenAI / Internal Platform

- Build internal AI platforms (LangChain / LangGraph / Bedrock, Elasticsearch RAG)
- Develop MCP servers, chatbots, AI agents, and similar services
Requirements
- **5+ years** of hands-on Linux systems administration and infrastructure operations experience
- Solid Linux internals knowledge (process / memory / filesystem / networking / systemd / cgroup); able to localize issues even without complete logs
- Strong Bash / Shell scripting skills: able to write maintainable scripts that others can pick up
- Programming ability for data processing, CLI tools, and API services; Python proficiency preferred
- Solid storage fundamentals with hands-on experience: RAID levels and rebuild trade-offs, filesystem selection, snapshot and backup planning; NAS / shared storage (NFS / SMB) operations experience
- Experience with at least one major public cloud (AWS / GCP / Alibaba Cloud) and IaC tooling (Terraform / CDK / Ansible)
- Familiar with containerization and orchestration (Docker, Kubernetes)
- CI/CD pipeline design and operations experience (GitLab CI / Jenkins / Airflow)
- Able to own a cross-service subsystem end-to-end: design, implementation, documentation, handoff
- **Strong autonomy**: can drive a problem from discovery, root-cause investigation, decision-making, to delivery with minimal supervision; able to make judgment calls under incomplete information and proactively communicate progress, risks, and rationale
- **Self-directed**: doesn't wait for tickets; identifies problems worth solving and prioritizes them independently
About Kronos Research
Kronos Research is a quantitative trading firm focused on digital asset markets. It provides high-frequency trading, market making, liquidity solutions, and related investment/asset management services using proprietary research and machine intelligence.
Crypto · 200-1000 · Taipei City, Taiwan · Founded 2018 · https://kronosresearch.com/?ref=sailonchain.com