Senior Scientific Data Engineer, Data Platform

London, England; Oxford, England | £76k – £102k | Posted 7 April 2026

Tech Stack

React, Python, Snowflake, BigQuery, Machine Learning, AI, LLM Agents

Job Description

Your work will change lives. Including your own.

Recursion is decoding biology to industrialize drug discovery. We are looking for a Senior Scientific Data Engineer who, as part of a team, will own a suite of business-critical data products, including our Structure-Activity Relationship (SAR) data mart. This is a high-impact role that requires combining robust software engineering skills with deep drug discovery domain expertise. You will own the data architecture responsible for ingesting, standardizing, and serving both public and proprietary datasets. These systems directly power our competitor intelligence, chemical tractability assessments, and compound design models.

Please note: this is a specialized data engineering position focused strictly on data infrastructure and product ownership. Your work will directly enable our machine learning and predictive modeling efforts, but the role does not include building or training models. It is ideally suited to engineers dedicated to architecting complex scientific data systems, rather than data scientists seeking modeling-focused roles.

The Systems You Will Own

You will join the Data Platform team and maintain an ecosystem of ~100 ingested datasets, while taking specific ownership of high-value products, including:
Flagship SAR Data Mart: A unified bioactivity warehouse merging commercial and public (e.g., ChEMBL) databases with internal assay data.
Commercial Vendor Data Mart: A massive catalog of purchasable compounds used to guide our internal compound design tools and tractability assessments.
Biomedical Knowledge Graph: The critical data feeds and infrastructure that power our semantic graph and associated AI agents, linking targets, diseases, and compounds.
Chemical Synthesis Data: The foundational dataset of chemical reactions used for training retrosynthesis models and tractability prediction.
Patent Intelligence System: A pipeline transforming patent feeds and competitor data into actionable intelligence.
Compound Standardization Registry: A large-scale chemical structure warehouse ensuring consistency across billions of compounds (similar to UniChem).

What You’ll Do

Pipeline Ownership at Scale: Act as a key owner of our core bioactivity pipeline, which processes 75M+ records across ~100 distinct data feeds. You will navigate complex logic and orchestration, including maintaining 4,000+ lines of SQL spanning 20+ transformation steps.
Scientific Data Standardization: Resolve ambiguity by reconciling heterogeneous data formats from diverse commercial and public sources. You will design and implement logic to standardize chemical structures (SMILES, InChI, tautomers), biological targets (UniProt mapping, gene families, species homology), and assay data (IC50/Ki normalization, unit conversion).
Engineer for Distributed Compute: Use Python and Snowpark to optimize heavy-lifting operations, such as large-scale text mining (extracting dose/concentration from unstructured text) and molecular property calculation.
Drive Data Quality: Implement rigorous data quality frameworks (DQF) that handle the nuances of biological data, ensuring our downstream models are trained on clean, semantically aware data.
Cross-Functional Consulting: Interface directly with discovery scientists to understand their diverse data needs and translate complex scientific requirements into robust engineering solutions.

The Experience You’ll Need

Core Engineering:
Advanced SQL Warehousing: Deep expertise in modern cloud data warehousing (e.g., Snowflake, BigQuery). You should be comfortable with complex window functions, CTEs, and schema design for multi-layer environments.
Python Distributed Compute: Strong proficiency in Python for data processing. Experience with data warehouses is a huge plus, but general distributed processing experience is also valuable.
Orchestration: Experience managing complex DAGs and asy ...
