AI Inference Engineer
Perplexity · San Francisco; Palo Alto; New York City · Posted 24 February 2026
Job Description
We are looking for an AI Inference Engineer to join our growing team. Our current stack is Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. You will have the opportunity to work on large-scale deployment of machine learning models for real-time inference.

Responsibilities
- Develop APIs for AI inference that will be used by both internal and external customers
- Benchmark and address bottlenecks throughout our inference stack
- Improve the reliability and observability of our systems and respond to system outages
- Explore novel research and implement LLM inference optimizations

Qualifications
- Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
- Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization)
- Understanding of GPU architectures, or experience with GPU kernel programming using CUDA