Senior Research Engineer, Post-training & Evaluation

Remote - United States
Posted 20 February 2026

Tech Stack

Python, Scala, CI/CD, PyTorch, Machine Learning, AI, LLM, Hugging Face, MLflow, Fine-tuning

Job Description

Reddit is a community of communities. It’s built on shared interests, passion, and trust, and is home to the most open and authentic conversations on the internet. Every day, Reddit users submit, vote, and comment on the topics they care most about. With 100,000+ active communities and approximately 121 million daily active unique visitors, Reddit is one of the internet’s largest sources of information. For more information, visit www.redditinc.com.

Reddit is continuing to grow our teams with the best talent. This role is fully remote-friendly within the United States. If you happen to live close to one of our physical office locations (San Francisco, Los Angeles, New York City, or Chicago), our doors are open for you to come into the office as often as you'd like.

The AI Engineering team at Reddit is embarking on a strategic initiative to build our own Reddit-native foundational Large Language Models (LLMs). This team sits at the intersection of applied research and massive-scale infrastructure, and is tasked with training models that truly understand the unique culture, language, and structure of Reddit communities. You will join a team of distinguished engineers and safety experts to build the "engine room" of Reddit's AI future: the foundational models that will power Safety Moderation, Search, Ads, and the next generation of user products.

As a Senior Research Engineer for Post-Training & Evaluation, you will own the critical "feedback loop" of our model development. While the pre-training team builds the base models, you will architect the evaluation suites and fine-tuning pipelines that determine whether those models are actually safe, smart, and "Reddit-native." You will build the "Reddit Benchmark" (our internal standard for model quality) and execute the Supervised Fine-Tuning (SFT) workflows that adapt our models for Safety and Moderation tasks.
Responsibilities:
- Architect and maintain the "Reddit Benchmark" evaluation suite: a comprehensive harness that rigorously tests model capabilities across Safety, Reasoning, and Reddit-specific knowledge (slang, norms).
- Build scalable SFT (Supervised Fine-Tuning) pipelines: implement efficient, distributed training loops for instruction tuning that convert raw base models into helpful assistants.
- Develop Model-as-a-Judge systems: engineer automated evaluation pipelines that use strong models (e.g., GPT-5, Nova, Claude) to grade the outputs of our internal models, enabling rapid iteration cycles.
- Execute synthetic data generation strategies: create and curate high-quality instruction sets to improve model generalization where human data is scarce.
- Collaborate with Safety Engineering: translate high-level safety policies into concrete evaluation metrics and unit tests that run in our CI/CD pipelines.
- Debug post-training instability: dive deep into loss curves and evaluation logs to identify when fine-tuning causes an alignment tax or capability degradation.

Required Qualifications:
- 4+ years of professional experience in machine learning engineering, with a focus on LLM fine-tuning or evaluation.
- Fluency in Python and PyTorch, with experience using libraries such as Hugging Face Transformers, vLLM, or lm-eval-harness.
- Deep understanding of instruction tuning (SFT) and how data quality impacts model behavior.
- Experience building evaluation pipelines: you know what standard benchmarks such as MMLU and GSM8K measure, and how to build a custom domain-specific benchmark.
- Familiarity with distributed training (FSDP/DeepSpeed) for fine-tuning jobs.
- Strong data engineering skills for curating and cleaning instruction datasets.

Nice to Have:
- Experience with MLflow, Weights & Biases, or other experiment tracking tools.
- Experience with synthetic data generation (e.g., Self-Instruct).

Benefits:
- Comprehensive Healthcare Benefits and Income Replacement Programs
- 401k with Employer Match
- Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving ...
