AI Inference Engineer
Perplexity · San Francisco; Palo Alto; New York City · Posted 24 February 2026
Job Description
We are looking for an AI Inference Engineer to join our growing team. Our current stack is Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. You will have the opportunity to work on large-scale deployment of machine learning models for real-time inference.

Responsibilities
- Develop APIs for AI inference that will be used by both internal and external customers
- Benchmark and address bottlenecks throughout our inference stack
- Improve the reliability and observability of our systems and respond to system outages
- Explore novel research and implement LLM inference optimizations

Qualifications
- Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
- Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization)
- Understanding of GPU architectures, or experience with GPU kernel programming using CUDA