AI Inference Engineer
Perplexity · San Francisco; Palo Alto; New York City · Posted 24 February 2026
Job Description
We are looking for an AI Inference Engineer to join our growing team. Our current stack is Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. You will have the opportunity to work on large-scale deployment of machine learning models for real-time inference.

Responsibilities
- Develop APIs for AI inference that will be used by both internal and external customers
- Benchmark and address bottlenecks throughout our inference stack
- Improve the reliability and observability of our systems and respond to system outages
- Explore novel research and implement LLM inference optimizations

Qualifications
- Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
- Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization)
- Understanding of GPU architectures or experience with GPU kernel programming using CUDA