About the Job
Join Impala AI, an innovative startup building a fully managed LLM inference platform that enables data-heavy enterprises to perform any AI task at any scale.
We’re looking for a Founding DevOps Engineer to build our infrastructure DNA and the backbone of the AI revolution. You’ll be responsible for building and optimizing scalable cloud infrastructure solutions tailored for AI workloads. This role offers a unique opportunity to directly shape our infrastructure strategy, improve system reliability and performance, and contribute to establishing Impala AI as a leader in adaptive AI compute management.
Join us to tackle the magic that makes AI tick under the hood and build the backbone powering the AI revolution.
What You’ll Do
- Architecture: Architect a modular, multi-cloud "Control Plane vs. Data Plane" model using Infrastructure as Code.
- Infrastructure R&D: Research and implement cutting-edge solutions for GPU orchestration, low-latency networking, and automated cross-account (BYOC) provisioning.
- The Kubernetes Frontier: Deep-dive into K8s internals to optimize LLM inference workloads. You’ll design how we use Helm, GitOps processes, and specialized operators to manage state-of-the-art AI clusters.
- Performance Engineering: Don't just monitor; optimize. You will research bottlenecks across the stack - from network connectivity limits to GPU memory throughput - and build the infrastructure to solve them.
- Build the "Golden Path": Develop the internal tooling and automation that allows our AI researchers to move at lightning speed without worrying about the underlying hardware.
What You’ll Bring
We don't expect you to know everything, but we expect you to be a master of the fundamentals and a fast learner of the "new."
- Multi-Cloud Mastery: Deep experience with AWS and GCP at an architectural level.
- Modern IaC: Expertise in Terraform/Terragrunt/Crossplane (you believe in DRY, reusable, and versioned infrastructure).
- GitOps Evangelist: Proponent of ArgoCD/FluxCD and Helm for declarative, self-healing deployments.
- The R&D Mindset: You enjoy the "Discovery" phase - researching different solutions to a problem before writing the first line of code.
- Programming Literacy: Proficiency in Python or Golang for building internal tools and automation.
Bonus Points
- 2-4 years of hands-on experience in DevOps, Platform, or Infrastructure Engineering. (We value trajectory and depth over decades of tenure.)
- Kubernetes Expertise: You are comfortable with K8s orchestration, including Helm charts and ArgoCD for declarative deployments.