Senior Backend Engineer, Data Modeling and Ingestion Platform
Udio · New York · $180k – $220k · Posted 5 March 2026
Job Description
<h2><strong>About the Role </strong></h2>
<p>We are looking for a Senior Backend Engineer to lead the unification of <strong>large, richly detailed, and heterogeneous datasets</strong> sourced from a wide range of external providers. These datasets power our generative audio models.</p>
<p>You will build the foundational dataset behind our research by developing robust, scalable systems for <strong>linking, deduplicating, reconciling, and enriching</strong> data at massive scale. The role centers on <strong>high-impact bulk ingestion and advanced data linkage</strong>. You will design the logic, algorithms, and strategies that transform many independent datasets into a unified, high-quality canonical asset used throughout the company.</p>
<p>You will collaborate closely with ML researchers and product teams, working with tools such as <strong>BigQuery, Dataflow/Beam, TFRecords</strong>, and, where beneficial, distributed systems frameworks such as <strong>Ray</strong>. Familiarity with ML workflows using <strong>JAX</strong> or <strong>multihost training</strong> is a plus, as the datasets you produce will directly support that ecosystem.</p>
<h2>What You'll Do</h2>
<ul>
<li>Build high-throughput <strong>bulk ingestion workflows</strong> to integrate datasets from multiple external providers. </li>
<li>Design and implement scalable <strong>entity-resolution</strong> solutions, including record linking, deduplication, clustering, and conflict arbitration. </li>
<li>Create and refine <strong>matching logic, decision rules, and similarity functions</strong> to align datasets with high accuracy and strong coverage. </li>
<li>Define and track <strong>data quality indicators</strong>, such as overlap metrics, match precision/recall, duplicate rates, and completeness. </li>
<li>Prepare training-ready datasets in formats such as <strong>TFRecords</strong>, and structure data to meet ML research requirements. </li>
<li>Develop processing components using <strong>Dataflow (Beam)</strong> and manage large analytical workloads in <strong>BigQuery</strong>. </li>
<li>Leverage frameworks like <strong>Ray</strong> to accelerate large-scale experiments, feature extraction, and research-oriented data preparation. </li>
<li>Collaborate with ML researchers to anticipate downstream requirements and evolve linkage strategies as new sources and use cases emerge. </li>
</ul>
<h2>What We're Looking For </h2>
<ul>
<li>Experience working with <strong>large, heterogeneous datasets</strong> from multiple providers or domains.</li>
<li>Strong background in <strong>entity resolution</strong>, deduplication, data unification, or related large-scale data integration techniques. </li>
<li>Proficiency in <strong>Python</strong>, with an emphasis on efficient, scalable data processing. </li>
<li>Experience with <strong>BigQuery, Google Dataflow/Apache Beam</strong>, or similar batch-processing frameworks. </li>
<li>Familiarity with <strong>data validation, normalization, reconciliation</strong>, and building consistent views across diverse data sources. </li>
<li>Ability to craft well-structured <strong>matching and decision strategies</strong> that balance accuracy, completeness, and computational efficiency. </li>
<li>Comfortable iterating quickly on pragmatic solutions, balancing correctness with time-to-delivery. </li>
<li>Clear communication skills and the ability to collaborate closely with ML and research teams. </li>
</ul>
<h2> Nice to Have</h2>
<ul>
<li>Experience architecting <strong>Google Cloud Platform</strong> systems at scale.</li>
<li>Experience with distributed compute frameworks such as <strong>Ray</strong>, <strong>Spark</strong>, or <strong>Flink</strong>. </li>
<li>Understanding of <strong>JAX-based ML pipelines</strong>, <strong>multihost training setups</strong>, or large-scale data preparation for accelerator-backed workflows.</li>
<li>Familiarity with <strong>TFRecords</strong> or other high-volume training data format ...</li>
</ul>