Member of Technical Staff - Multi-Modal, Vision
Liquid AI · Research & Engineering · Posted 24 February 2026
Job Description
About Liquid AI

Spun out of MIT CSAIL, we build general-purpose AI systems that run efficiently across deployment targets, from data center accelerators to on-device hardware, ensuring low latency, minimal memory usage, privacy, and reliability. We partner with enterprises across consumer electronics, automotive, life sciences, and financial services. We are scaling rapidly and need exceptional people to help us get there.

The Opportunity

The VLM team builds vision-language models that run on-device, under tight latency and memory constraints, without sacrificing quality. We have released four best-in-class models, and we're just getting started.

This team owns the full VLM pipeline end to end: from researching new architectures and training algorithms through data curation, evaluation, and deployment. You'll join a focused, hands-on group that works directly on models and collaborates closely with our pretraining, post-training, and infrastructure teams. Success here is measured by the capability of the models we ship.

Minimum qualifications:
- Hands-on experience training or evaluating VLMs, with demonstrated experimental rigor.
- Ability to turn research ideas into scalable implementations and to refine and iterate on hypotheses.
- Proficiency in Python and at least one deep learning framework.
- M.S. or Ph.D. in Computer Science, Mathematics, or a related field; or equivalent industry experience.

This role is for you if you have experience in some of the following:
- Building or optimizing multimodal training or data pipelines.
- Distributed training (DeepSpeed, FSDP, Megatron-LM, etc.).
- Multimodal post-training (SFT, preference optimization, RL-style methods).
- Dataset design and data quality (quality and diversity assessment, long-tail mining).
- Prior open-source contributions (code, data, models) on GitHub or Hugging Face.
- Published research at top AI conferences (NeurIPS, ICML, CVPR, ECCV, ICLR, ACL, etc.).
- Computer vision or visual representation learning.

What working here might look like:
- Lead a new model capability end-to-end, from task spec through data curation, training recipe, ablations, and evaluation, into the final shipped model.
- Improve visual reasoning through reinforcement learning and preference optimization methods.
- Push the quality-efficiency frontier on token efficiency via encoder/connector design. Exemplary outcome: a connector that cuts vision tokens without quality loss.

What Success Looks Like (Year One):
- The VLM models we ship are state-of-the-art.
- You own a major workstream (for instance, video understanding, preference data quality, or encoder architecture) end-to-end.
- At least one model has shipped to production with your direct contribution.

What We Offer:
- Full ownership: you own your work from architecture to deployment.
- Compensation: competitive base salary with equity in a unicorn-stage company.
- Health: we pay 100% of medical, dental, and vision premiums for employees and dependents.
- Financial: 401(k) matching up to 4% of base pay.
- Time Off: unlimited PTO plus company-wide Refill Days throughout the year.
Apply Now