Job Description
Your work will change lives. Including your own.
Recursion is a leading, clinical-stage TechBio company decoding biology to industrialize drug discovery. Central to its mission is the Recursion Operating System (OS), a platform built across diverse technologies that continuously expands one of the world’s largest proprietary biological, chemical and patient-centric datasets. Recursion leverages sophisticated machine-learning algorithms to distill from its dataset a collection of trillions of searchable relationships across biology and chemistry unconstrained by human bias. By commanding massive experimental scale—up to millions of wet lab experiments weekly—and massive computational scale—owning and operating one of the most powerful supercomputers in the world—Recursion is uniting technology, biology, chemistry and patient-centric data to advance the future of medicine.
In this role, you will:
Build, scale, and operate a data platform. You will be a member of the platform team responsible for building, operating, and tuning a data platform that allows users to discover and query across the breadth of our data at Recursion, which includes a chemistry library of billions of compounds, petabytes of cellular microscopy images taken in millions of different experimental contexts, and millions of assay results, all supporting Recursion’s drug discovery.
Build relatability into a heterogeneous dataset. At Recursion, we generate datasets based on a wide swath of diverse biological models and treatment approaches. You'll work with biologists, chemists, and data scientists to build relatability and query-ability into these datasets so they can be used in the future to answer the sorts of questions we haven't even thought of asking yet.
Act as a mentor, coach, and sponsor. You will share your technical knowledge and experiences, delivering impact, learning, and growth across teams at Recursion.
The Team You’ll Join
You will join the Data Lake team, which built and maintains our Data Lake/Lakehouse. The team is responsible for relational and object storage and has the motto: all data flows to the Data Lake. The team solves the problem of making our diverse data discoverable, queryable, and relatable across datasets while we continue to add new data modalities as we grow. This will require collaboration with many different groups, including teams building out reports, dashboards, and applications; teams finding and generating the required data for machine learning problems; and teams building and iterating on new data processing pipelines.
The Experience You’ll Need
5+ years of deep experience in modern, cloud-based data engineering. You have a proven track record of building and maintaining robust platforms that enable the discovery, query, and processing of large-scale datasets.
Expertise in Core Technologies (Mandatory): You possess advanced proficiency in Python and SQL, as well as experience with Containerization (Docker, Kubernetes), Infrastructure as Code (Terraform, etc.), and Agentic Development (Cursor, Claude, etc.).
Data Fundamentals & Architecture: You have a deep understanding of Relational Databases (Postgres, MySQL, etc.) and are proficient in working with data file formats (e.g., Parquet, Avro). You are familiar with the Medallion Architecture (Bronze/Silver/Gold) and know how to apply its principles to build scalable, reliable data products.
Cloud Platform Operations: You are seasoned in at least one major cloud provider (GCP, AWS, or Azure). You are comfortable in a "DevOps" capacity: specifically, managing the infrastructure and tools that empower other engineers to run their pipelines, rather than focusing solely on building ETL flows.
Hands-On: You understand the trade-offs between different architectures (Data Lake vs. Warehouse) and can make high-level decisions regarding CI/CD and system maintenance. Crucially, you are a "doer" who enjoys getting into the weeds of implementation, not only des ... (truncated, view full listing at source)