Research Engineer (Scaling Multimodal Data)

WorldLabs
San Francisco
Posted 5 March 2026

Job Description

<h2><strong>About World Labs:</strong></h2>
<p>We build foundational world models that can perceive, generate, reason, and interact with the 3D world — unlocking AI's full potential through spatial intelligence by transforming seeing into doing, perceiving into reasoning, and imagining into creating. We believe spatial intelligence will unlock new forms of storytelling, creativity, design, simulation, and immersive experiences across both virtual and physical worlds. We bring together a world-class team, united by shared curiosity, passion, and deep backgrounds in technology — from AI research to systems engineering to product design — creating a tight feedback loop between our cutting-edge research and the products that empower our users.</p>
<h2><strong>About the Role:</strong></h2>
<p>We’re looking for a research engineer to help improve our in-house world models through better multimodal data. This role is about figuring out what data actually moves model quality — then building the datasets, pipelines, and experiments to prove it. The best generative models aren’t just a product of model architecture and compute; they are a product of the training data. The model output reflects someone’s obsession over what goes into the data, how it’s processed, and what gets thrown away. We’re looking for the person who does the obsessing and builds the tools to act on it at scale. This isn’t a role where someone hands you a dataset and asks you to clean it. You will decide what data we need, figure out where to get it, build the processing and curation systems, and close the loop with model training to make sure it actually works. You will need strong engineering skills to do this well, but engineering serves your judgement about data, not the other way around.</p>
<h2><strong>What You’ll Do:</strong></h2>
<ul>
<li><strong>Discover, evaluate, and acquire training data</strong>. You will identify, vet, and integrate data from diverse sources.
You will write scrapers, work with APIs, and make judgement calls about whether a source is worth pursuing before investing days of effort.</li>
<li><strong>Build data processing and curation systems</strong>. Design and implement data processing pipelines for filtering, deduplication, quality scoring, and curation. You will create well-abstracted systems that your teammates can pick up and extend.</li>
<li><strong>Look at the actual data constantly</strong>. You will sample outputs, spot distributional issues (e.g., too many screenshots, low-resolution crops, near-duplicates), and catch problems before they propagate to model training.</li>
<li><strong>Close the data → model → evaluation loop.</strong> You will diagnose model failures, trace them back to data issues, and design principled fixes that address each problem at its source.</li>
<li><strong>Deploy ML models for data enrichment</strong>: captioning, quality scoring, text embedding, segmentation, classification, etc. You will evaluate whether these models actually help.</li>
<li><strong>Make systematic, documented decisions</strong>. Score thresholds, filtering criteria, mixture ratios — every processing choice should be reproducible, versioned, and auditable. You will set the standard for rigor on the team.</li>
</ul>
<h2><strong>Questions We Think About:</strong></h2>
<ul>
<li>How do you sample data for large-scale world models, where the best practices for dense-frame video models don’t apply?</li>
<li>How do you caption large-scale video datasets for world generation?</li>
<li>How do you measure the diversity of video datasets, where counting the raw number of hours or frames doesn’t account for variation in content?</li>
<li>How do we build data pipelines that are reproducible and robust?</li>
<li>How can we improve the observability of billion-scale datasets so we can catch issues early?</li>
<li>What does it mean for a dataset to have good “taste”?
How do you operationalize aesthetic judgement at billion scale?</li> <li> ... (truncated, view full listing at source)
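The deduplication question raised in the responsibilities above can be made concrete with a small sketch. Below is a minimal, illustrative MinHash near-duplicate detector over text (e.g., video captions): one common technique for this kind of curation work, not a description of World Labs' internal tooling. All function names and parameters here are hypothetical.

```python
# Minimal MinHash near-duplicate sketch — illustrative only, not any
# team's actual pipeline. Uses only the Python standard library.
import hashlib
from typing import List, Set


def shingles(text: str, k: int = 3) -> Set[str]:
    """k-word shingles of a whitespace-tokenized document."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + k]) for i in range(max(1, len(tokens) - k + 1))}


def minhash_signature(items: Set[str], num_hashes: int = 64) -> List[int]:
    """For each of num_hashes seeded hash functions, keep the minimum hash
    value over all items; similar sets share many of these minima."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int.from_bytes(hashlib.sha1(f"{seed}:{item}".encode()).digest()[:8], "big")
            for item in items
        ))
    return sig


def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of matching signature slots estimates Jaccard similarity,
    so a threshold on this value flags near-duplicate pairs for review."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

In a real billion-scale pipeline this idea is typically paired with locality-sensitive hashing to avoid all-pairs comparison, and the chosen similarity threshold is exactly the kind of decision the listing says should be versioned and auditable.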