Job description
CloudLinux is a global, remote-first company. We are driven by our principles: do the right thing, employees first, and remote first. We deliver high-volume, low-cost Linux infrastructure and security products that help companies increase the efficiency of their operations. Everyone on our team supports one another and does what they can to ensure we are all successful.
Responsibilities
### What You Will Do
- **Kubernetes Platform Engineering (Primary Focus — 40%)**
  - Design, build, and operate a multi-tenant Kubernetes platform using Cluster API (CAPI) with bare-metal providers (Metal3/Sidero).
  - Implement hard multi-tenancy using vCluster (Loft Labs) or similar technology, providing isolated Kubernetes API servers per tenant.
  - Deploy and manage KubeVirt for VM orchestration within Kubernetes, including CPU pinning, NUMA awareness, and HugePages configuration.
  - Implement GitOps-driven infrastructure using ArgoCD or Flux as the single source of truth for all cluster configurations.
  - Deploy and manage Policy-as-Code using Kyverno or OPA Gatekeeper for admission control, resource quotas, and security policies.
  - Build self-service capabilities using Crossplane or similar Kubernetes-native infrastructure provisioning tools.
- **Storage Engineering (20%)**
  - Operate and optimize Ceph distributed storage clusters (currently 1 PiB raw, 149 OSDs, Quincy 17.2.5).
  - Manage Rook-Ceph operator deployments at scale on modern Kubernetes (v1.28+).
  - Implement storage tiering: Ceph for bulk storage, local NVMe for high-IOPS workloads, LINSTOR/DRBD or TopoLVM for ultra-fast replicated storage.
  - Design and implement per-VM / per-tenant I/O isolation on shared Ceph clusters.
  - Manage CDI (Containerized Data Importer) for VM image lifecycle in KubeVirt environments.
- **Networking (15%)**
  - Deploy and manage overlay networks for pod networking, micro-segmentation, and WireGuard/IPsec encryption.
  - Implement Cluster Mesh for multi-datacenter pod-to-pod connectivity.
  - Configure Multus CNI and SR-IOV for multi-NIC VM support in KubeVirt.
  - Work with physical network infrastructure: Juniper switches (JunOS), BGP (eBGP/iBGP), EVPN/VXLAN, VLANs.
  - Maintain IPsec site-to-site connectivity between datacenters.
- **Reliability and Operations (15%)**
  - Practice SRE discipline: define and maintain SLOs with error budgets, and implement proactive capacity management with 6-12 month forecasting.
  - Design and execute chaos engineering experiments to validate system resilience.
  - Participate in on-call rotation for IaaS infrastructure (OpenNebula, Ceph, networking).
  - Write and maintain runbooks, DRP documentation, and postmortem analyses.
  - Drive proactive improvement: identify reliability risks, performance bottlenecks, and toil, then propose and implement solutions without waiting for incidents.
- **Infrastructure as Code and Automation (10%)**
  - Develop and maintain Terraform/OpenTofu modules for multi-cloud infrastructure provisioning.
  - Write Ansible playbooks for bare-metal server configuration and fleet management.
  - Automate infrastructure lifecycle: PXE
Requirements
- Proven experience in Kubernetes platform engineering and IaaS.
- Strong understanding of cloud infrastructure, networking, and storage solutions.
- Experience with GitOps practices and tools (ArgoCD, Flux).
- Familiarity with Ceph and distributed storage management.
- Proficiency in Terraform and Ansible for automation and infrastructure as code.
- Ability to work independently and collaboratively in a remote team environment.
Conditions
### What We Offer
- Competitive salary ranging from $115,000 to $195,500 USD.
- Fully remote work environment.
- Supportive team culture focused on collaboration and success.
- Opportunities for professional growth and development.
About CloudLinux
CloudLinux develops a hardened Linux distribution and security solutions optimized for web hosting environments. The company provides CloudLinux OS, kernel live security patching, and web server security software used by enterprises, service providers, governments, and universities to enhance the performance, security, and stability of Linux servers.
SaaS · 200-1000 · Palo Alto, California, United States · Founded 2009 · https://cloudlinux.com