AI Inference Engineer

Perplexity
Full time
Posted 24 February 2026

Job Description

We are looking for an AI Inference Engineer to join our growing team. Our current stack is Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. You will have the opportunity to work on large-scale deployment of machine learning models for real-time inference.

Responsibilities

- Develop APIs for AI inference that will be used by both internal and external customers
- Benchmark and address bottlenecks throughout our inference stack
- Improve the reliability and observability of our systems and respond to system outages
- Explore novel research and implement LLM inference optimizations

Qualifications

- Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
- Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization)
- Understanding of GPU architectures or experience with GPU kernel programming using CUDA

Final offer amounts are determined by multiple factors, including experience and expertise.

Equity: In addition to the base salary, equity may be part of the total compensation package.