Research Engineer (Focused on RL)

Firecrawl
San Francisco, CA (Hybrid) or Remote (Americas, UTC-3 to UTC-10)
$180k – $270k
Posted 18 March 2026

Job Description

You'll bring reinforcement learning to Firecrawl's core product, building the training infrastructure, reward pipelines, and fine-tuning systems that make our models meaningfully better at extracting, understanding, and structuring web data. This isn't theoretical RL research. You'll build your own training infra, run fast experiments, ship models to production, and bridge the gap between classical RL approaches and modern LLM agent systems. If you care as much about training throughput as you do about reward design, this is the role.

Salary Range: $180,000–$270,000/year (Range shown is for U.S.-based employees. Compensation outside the U.S. is adjusted fairly based on your country's cost of living. You can explore how we calculate this here: https://www.firecrawl.dev/careers/compensation.)
Equity Range: Up to 0.15%
Location: San Francisco, CA or Remote (Americas, UTC-3 to UTC-10)
Job Type: Full-Time
Experience: 3+ years in applied RL, ML engineering, or model training, with production systems
Visa: US Citizenship/Visa required for SF; N/A for Remote

ABOUT FIRECRAWL

Firecrawl is the easiest way to extract data from the web. Developers use us to reliably convert URLs into LLM-ready markdown or structured data with a single API call. In just a year, we've hit 8 figures in ARR and 90k+ GitHub stars by building the fastest way for developers to get LLM-ready data.

We're a small, fast-moving, technical team building essential infrastructure super-intelligence will use to gather data on the web. We ship fast and deep.

WHAT YOU'LL DO

- Build training infrastructure and reward pipelines from scratch: Design and operate the systems that train and evaluate Firecrawl's models. You'll own the full loop: data collection, reward modeling, training runs, evaluation, and deployment. You build the infra yourself because you're the one who needs it to work.
- Fine-tune models to achieve state-of-the-art results: Take foundation models and make them dramatically better at web data extraction, content understanding, and structured output generation. You know how to get from "decent fine-tune" to "best-in-class" and you have the patience and rigor to close that gap.
- Bridge LLM agents and classical RL: The most interesting problems at Firecrawl sit at the intersection of modern LLM-based agents and classical RL techniques. You'll design reward signals for agent behaviors, apply RL methods to improve multi-step agent workflows, and figure out where traditional RL approaches outperform prompting, and vice versa.
- Run fast experiments and iterate: You design experiments that test meaningful hypotheses, run them quickly, and make decisions based on results. You don't spend weeks on experiment infrastructure before getting a single result. Speed of iteration is a core part of how you work.
- Communicate clearly to non-RL people: RL can be opaque. You translate your work into language that engineers, product people, and leadership can understand and act on. You know how to explain why a reward function matters without requiring everyone to read the paper.
- Collaborate across the research team: Work closely with the Head of Research and the Search/IR-focused Research Engineer to connect RL improvements with search, ranking, and the broader product strategy.

WHAT WE'RE LOOKING FOR

Someone who builds their own training infra and reward pipelines. You don't wait for an ML platform team to set things up. You build the training loops, reward models, data pipelines, and evaluation frameworks yourself, because you understand that the infra choices directly affect the quality of the results. You've operated GPU clusters, managed training runs, and debugged convergence issues in production.

Can fine-tune models to achieve SOTA results. You've taken models from baseline to best-in-class on tasks that matter. You understand the full fine-tuning lifecycle: data curation ... (truncated, view full listing at source)