Applied Research Intern

Labelbox
San Francisco Bay Area
Posted 3 March 2026

Job Description

<div class="content-intro"><h2><strong>Shape the Future of AI</strong></h2> <p>At Labelbox, we're building the critical infrastructure that powers breakthrough AI models at leading research labs and enterprises. Since 2018, we've been pioneering data-centric approaches that are fundamental to AI development, and our work becomes even more essential as AI capabilities expand exponentially.</p> <h2><strong>About Labelbox</strong></h2> <p>We're the only company offering three integrated solutions for frontier AI development:</p> <ol> <li><strong>Enterprise Platform Tools</strong>: Advanced annotation tools, workflow automation, and quality control systems that enable teams to produce high-quality training data at scale</li> <li><strong>Frontier Data Labeling Service</strong>: Specialized data labeling through Alignerr, leveraging subject matter experts for next-generation AI models</li> <li><strong>Expert Marketplace</strong>: Connecting AI teams with highly skilled annotators and domain experts for flexible scaling</li> </ol> <h2><strong>Why Join Us</strong></h2> <ul> <li><strong>High-Impact Environment</strong>: We operate like an early-stage startup, focusing on impact over process. You'll take on expanded responsibilities quickly, with career growth directly tied to your contributions.</li> <li><strong>Technical Excellence</strong>: Work at the cutting edge of AI development, collaborating with industry leaders and shaping the future of artificial intelligence.</li> <li><strong>Innovation at Speed</strong>: We celebrate those who take ownership, move fast, and deliver impact. Our environment rewards high agency and rapid execution.</li> <li><strong>Continuous Growth</strong>: Every role requires continuous learning and evolution. You'll be surrounded by curious minds solving complex problems at the frontier of AI.</li> <li><strong>Clear Ownership</strong>: You'll know exactly what you're responsible for and have the autonomy to execute. 
We empower people to drive results through clear ownership and metrics.</li> </ul></div><h2><strong>Role Overview</strong></h2> <p>As an Applied Research Intern at Labelbox, you will design, build, and productionize evaluation and post‑training systems for frontier LLMs and multimodal models. You’ll own continuous, high-quality evals and benchmarks (reasoning, code, agent/tool‑use, long‑context, vision‑language, etc.), create and curate post‑training datasets (human + synthetic), and prototype RLHF/RLAIF/RLVR/RM/DPO‑style training loops to measure and improve real‑world task and agent performance.</p> <h2><strong>Your Impact</strong></h2> <ul> <li>Build and own evaluation and benchmark suites for reasoning, code, agents, long‑context, and VLMs/LLMs.</li> <li>Create post‑training datasets at scale: design preference/critique pipelines (human + synthetic), and target hard failures surfaced by evals.</li> <li>Experiment with and prototype RLHF/RLAIF/RLVR/RM/DPO‑style training loops to improve real‑world task and agent performance.</li> <li>Land research in product: ship improvements into Labelbox workflows, services, and customer‑facing evaluation/quality features; quantify impact with customer and internal metrics.</li> <li>Engage with customer research teams: run pilots, co‑design benchmarks, and share practical findings through internal research reports, blog posts, talks, and published papers.</li> </ul> <h2><strong>What You Bring</strong></h2> <ul> <li>A strong foundation in AI and machine learning, backed by a Ph.D.
or Master’s degree in Computer Science, Machine Learning, AI, or a related field (in-progress degrees are acceptable for intern positions).</li> <li>A deep understanding of frontier autoregressive and diffusion multimodal models, along with the human and synthetic data strategies needed to optimize them.</li> <li>Passion for and experience with LLM evaluation and benchmarking.</li> <li>Expertise in constructing, measuring, and refining high-quality training data.</li> <li>The ability to brid ... (truncated, view full listing at source)