Senior Computer Scientist

Adobe
Noida

Posted 25 April 2026

Job Description

Senior Computer Scientist
Team: AI Platform Engineering

Role Overview

We are looking for a Senior Infrastructure Developer with 10 years of experience to own, evolve, and scale the platform that powers our most demanding ML training workloads. This is not a keep-the-lights-on role: you will be architecting systems, writing production-grade code, leading multi-quarter projects across geo-distributed teams, and setting the reliability bar for an infrastructure that thousands of GPU hours depend on every day.

You bring deep Kubernetes expertise, strong networking fundamentals, a developer's mindset, and the leadership instincts to navigate ambiguity and drive alignment across cross-functional stakeholders. You have operated systems at massive scale and felt the weight of that responsibility.

About the Platform

You will be working on a cutting-edge platform designed to train and serve large-scale machine learning models. The platform supports everything from small-scale experimentation to massive, distributed training jobs running on GPU clusters spanning thousands of accelerators. It provides ML engineers and researchers with the tools to onboard, monitor, and scale their workloads, whether a lightweight prototype or a production-grade deep learning model powering real-world applications.

Key platform capabilities:

Dynamic GPU orchestration: using Kubernetes with custom schedulers and resource topology awareness.
Training & inference workflows: end-to-end pipeline support from data ingestion through model serving.
Observability & cost tracking: full-stack visibility across compute, network, and storage layers.
Self-service developer tooling: enabling high-velocity experimentation without platform bottlenecks.
Multi-cloud infrastructure: primarily AWS with Azure/GCP expansion underway.

Your contributions will directly determine the reliability, scalability, and efficiency of this platform, and the speed at which AI teams can innovate.
What You’ll Do

Architect for scale: Design and evolve Kubernetes-native infrastructure capable of running distributed GPU training jobs at massive scale, with an obsession for reliability and efficiency.
Lead cross-geo initiatives: Own complex, multi-team projects end-to-end: write design docs, align stakeholders across time zones, and drive delivery in ambiguous, fast-moving environments.
Codify infrastructure: Define and ship cloud infrastructure through IaC (Terraform/Pulumi). Treat infra changes with the same rigor, testing, and review as application code.
Build observability: Design and maintain deep observability stacks (metrics, distributed tracing, log aggregation, SLO/SLI frameworks) that surface problems before they become incidents.
Write production code: Build automation, internal tooling, operators, and platform services in Go, Python, or Rust. This is not a YAML-only role.
Own reliability: Lead incident response, post-mortems, and reliability reviews. Drive systemic fixes, not just workarounds. Set the on-call culture.
Solve hard networking problems: Debug and resolve complex cluster networking issues: CNI, BGP, service mesh, DNS at scale, east-west traffic, high-throughput tuning.
Mentor and grow the team: Raise the technical bar through code reviews, architectural guidance, and knowledge sharing with engineers across experience levels.

What You Bring

Core Requirements:

Kubernetes & GPU Infrastructure

10 years in SRE, platform engineering, or infrastructure roles
Expert-level Kubernetes internals: scheduler, kubelet, CRDs, operators, admission controllers
Proven experience running GPU/accelerator training workloads at scale
Multi-cluster management, federation, and workload placement strategies
Helm, Kustomize, GitOps (Flux/ArgoCD), and knowing when not to use them
Cloud & Infrastructure as Code

Deep AWS hands-on experience required (VPC, EKS, EC2, S3, IAM, TGW)
Terraform or Pulumi: production-grade, modular, tested
CI/CD for infrastructure: drift detection, plan gati ... (truncated, view full listing at source)