Staff Software Engineer — Search Platform, Ingestion & Indexing

Remote · Posted 14 April 2026

Tech Stack

Python Express AWS Terraform CI/CD Elasticsearch DynamoDB Machine Learning AI LLM Fine-tuning Agents

Job Description

This posting is for proactive recruitment purposes and may be used to fill current openings or future vacancies within our organization.

Overview of the Role

Advanced Content Engineering (ACE) is seeking a Staff Software Engineer to serve as the technical anchor for the search platform’s ingestion and indexing systems. The platform processes millions of documents across TR’s legal, tax, and professional content corpora: parsing, chunking, enriching, embedding, and indexing them into a hybrid search engine that powers both human-facing search interfaces and autonomous AI agents. Getting this pipeline right, at scale, with zero-downtime operations and increasingly agentic retrieval patterns, is one of the platform’s most consequential engineering challenges.

This role owns the design, implementation, and operational health of the document ingestion pipeline and search index management systems, from the Kafka-based streaming infrastructure that moves documents through processing stages to the Vespa application architecture that stores and serves them. Staff Engineers on this team define, build, test, deploy, scale, and operate what they ship; full-stack ownership is not a principle we aspire to, it is the daily reality. AI-assisted development is the team norm, not the exception, and constant delivery to production is the expectation. This is a role for someone who sets architectural boundaries, not just executes within them.

About the Role

In this position, you will focus on:

Ingestion Pipeline Architecture & Engineering

• Plan, design, develop, and own the end-to-end document ingestion pipeline, a Kafka-based stream processing architecture that moves documents through parsing, chunking, enrichment (entity extraction, embedding generation, metadata enrichment), and indexing stages, including all fault tolerance, version ordering, and at-least-once delivery guarantees
• Architect and implement pluggable, configurable pipeline components (parsers, chunkers, enrichers, indexers) that client teams can assemble into custom topologies via the platform’s self-service APIs, while maintaining reliable, observable, and performant execution
• Own the platform’s Protobuf-based document schema and schema registry integration: establishing schema governance standards, enforcing backward-compatible evolution, and ensuring reliable serialization across all pipeline stages
• Design and implement dual-flow ingestion: a high-throughput batch path for full reindexing and a low-latency incremental path for real-time document updates, with strong guarantees around document version ordering and idempotent processing
• Lead the migration of ingestion infrastructure from OpenSearch to Vespa, including design of Vespa document processors, custom Kafka feeders, and application package architecture, resolving complex technical challenges that have little or no precedent within the team

Custom Model Operationalization

• Own the end-to-end lifecycle for custom models integrated into the ingestion pipeline (re-ranking models, embedding models, and enrichment components), including inference serving behind a stable API surface, latency SLO management, hardware and runtime configuration (batching, quantization), and scaling
• Build and operate the model promotion pipeline: the CI/CD workflow that moves a model artifact from the fine-tuning team through staging to production, including versioning, canary rollouts, and rollback mechanisms, ensuring the platform team can operate model updates independently without depending on the research team for production changes
• Define and maintain integration contracts between custom models and downstream pipeline components, governing input/output schemas, compatibility requirements, and the governance process for model updates that ensures search pipeline consumers are not broken by changes upstream
• Instrument model serving for production observability: latency distributions, throughput, error ... (truncated, view full listing at source)


More jobs at Thomson Reuters
