Member of Technical Staff - Edge Inference Engineer

Liquid AI
Research & Engineering
Posted 24 February 2026

Job Description

About Liquid AI

Spun out of MIT CSAIL, we build general-purpose AI systems that run efficiently across deployment targets, from data center accelerators to on-device hardware, ensuring low latency, minimal memory usage, privacy, and reliability. We partner with enterprises across consumer electronics, automotive, life sciences, and financial services. We are scaling rapidly and need exceptional people to help us get there.

The Opportunity

Our Edge Inference team compiles Liquid Foundation Models into optimized machine code that runs on resource-constrained devices: phones, laptops, Raspberry Pis, and watches. We are core contributors to llama.cpp and build the infrastructure that makes efficient on-device AI possible. You will work directly with the technical lead on problems that require deep understanding of both ML architectures and hardware constraints. This is high-ownership work where your code ships to production and directly impacts model performance on real devices.

While San Francisco and Boston are preferred, we are open to other locations.

What We're Looking For

We need someone who:

- Works autonomously: Given a target device and a performance goal, you figure out how to get there without hand-holding. You diagnose bottlenecks, prototype solutions, and iterate until you hit the target.
- Thinks at the hardware level: You understand cache hierarchies, memory access patterns, and instruction-level optimization. You can reason about why code is slow before reaching for a profiler.
- Bridges ML and systems: You understand how neural networks work mathematically (matrix operations, attention mechanisms, quantization effects) and can translate that understanding into optimized implementations.
- Ships production code: Our work goes upstream to open-source projects and deploys to customer devices.
  You write code that others can maintain and extend.

The Work

- Implement and optimize inference kernels for CPU, NPU, and GPU architectures across diverse edge hardware
- Develop quantization strategies (INT4, INT8, FP8) that maximize compression while preserving model quality under strict memory budgets
- Contribute to llama.cpp and other open-source inference frameworks, including new model architectures (audio, vision)
- Profile and optimize end-to-end inference pipelines to achieve sub-100ms time-to-first-token on target devices
- Collaborate with ML researchers to understand model architectures and identify optimization opportunities specific to Liquid Foundation Models

Desired Experience

Must-have:

- 5+ years of experience in systems programming with strong C++ proficiency
- Embedded software engineering experience or work on resource-constrained systems
- Understanding of ML fundamentals at the linear algebra level (how matrix operations, attention, and quantization work)
- Experience with hardware architecture concepts: cache hierarchies, memory bandwidth, SIMD/vectorization

Nice-to-have:

- Contributions to llama.cpp, ExecuTorch, or similar inference frameworks
- Experience with Rust for systems programming
- Background in custom accelerator development (TPU, NPU) or work at companies like SambaNova, Cerebras, Groq, or Google/Amazon accelerator teams
- Quantitative degree (mathematics, physics, or similar) combined with engineering experience

What Success Looks Like (Year One)

- Ship optimizations that achieve measurable latency or memory improvements on at least one target edge device class
- Successfully upstream at least one significant contribution to llama.cpp (new architecture support, kernel optimization, or quantization improvement)
- Own a major workstream end-to-end, such as new model architecture support, a quantization pipeline for a device constraint, or target platform enablement

What We Offer

- Rare technical challenges: Work on novel model architectures that require custom optimization strategies.
  Your code ships to production and runs on real devices.
- Compensation: Competitive base salary with equity in a unicorn-stage company
- Health: We pay 100% of medical, dental, and vision premiu ... (truncated, view full listing at source)
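For candidates wondering what the quantization work mentioned above involves at its simplest, here is a minimal sketch of symmetric per-tensor INT8 quantization in C++ (the listing's primary language). All names here are illustrative assumptions for this sketch, not Liquid AI's pipeline or llama.cpp's actual API; production schemes are per-block, mixed-precision, and far more involved.

```cpp
// Illustrative sketch: symmetric per-tensor INT8 quantization.
// Not Liquid AI's or llama.cpp's actual code.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Pick a single scale so the largest-magnitude weight maps to +/-127.
float choose_scale(const std::vector<float>& w) {
    float amax = 0.0f;
    for (float x : w) amax = std::max(amax, std::fabs(x));
    return amax > 0.0f ? amax / 127.0f : 1.0f;
}

// Quantize: round to nearest, clamp to the signed 8-bit range.
std::vector<int8_t> quantize(const std::vector<float>& w, float scale) {
    std::vector<int8_t> q(w.size());
    for (size_t i = 0; i < w.size(); ++i) {
        float r = std::nearbyint(w[i] / scale);
        q[i] = static_cast<int8_t>(std::clamp(r, -127.0f, 127.0f));
    }
    return q;
}

// Dequantize back to float; the gap versus the original weights is the
// quantization error that quality evaluation must keep bounded.
std::vector<float> dequantize(const std::vector<int8_t>& q, float scale) {
    std::vector<float> w(q.size());
    for (size_t i = 0; i < q.size(); ++i) w[i] = q[i] * scale;
    return w;
}
```

The interesting engineering is everything this sketch omits: choosing block sizes and scale granularity under a memory budget, handling outlier weights, and writing the SIMD/NPU kernels that consume the packed integers.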