Job Description
About Judi Health
Judi Health is an enterprise health technology company providing a comprehensive suite of solutions for employers and health plans, including:
Capital Rx, a public benefit corporation delivering full-service pharmacy benefit management (PBM) solutions to self-insured employers,
Judi Health™, which offers full-service health benefit management solutions to employers, TPAs, and health plans, and
Judi®, the industry’s leading proprietary Enterprise Health Platform (EHP), which consolidates all claim administration-related workflows in one scalable, secure platform.
Together with our clients, we’re rebuilding trust in healthcare in the U.S. and deploying the infrastructure we need for the care we deserve. To learn more, visit www.judi.health.
Location: Remote
Position Summary:
Join our Scalability team as a Senior Scalability Engineer focused on observability platform development and engineering productivity.
In this role, you will define, own, and build Judi Health's organization-wide observability strategy, tooling, and platform products. Beyond maintaining infrastructure, you'll architect and develop a custom observability platform that gives engineering teams powerful, fast, and cost-effective visibility into every layer of our infrastructure, from application logs and metrics to distributed traces.
You'll build production-grade internal products using React/TypeScript frontends with Python and Rust backends, creating tools that fundamentally improve how engineers at Judi Health debug, monitor, and optimize their systems. Working closely with leadership and cross-functional teams, your work will be foundational to platform stability, performance optimization, and developer productivity across our rapidly growing healthcare platform.
Position Responsibilities:
In this role, you'll own the observability infrastructure that powers our engineering organization. You will:
Architect observability platform: Design, implement, and maintain the LGTM stack (Loki, Grafana, Tempo, Mimir/Prometheus) as the primary observability platform across all engineering teams, making architectural decisions that balance cost, performance, and developer experience.
Build internal observability products: Design and develop production-grade internal platform products with React/TypeScript frontends and Python/Rust backends that provide engineers with powerful log search, metrics visualization, and trace analysis capabilities.
Develop custom log indexing systems: Architect and build high-performance log indexing solutions using Rust that process logs and provide sub-second search across billions of log lines at a fraction of the cost.
Integrate SQL analytics for logs: Design and implement solutions leveraging AWS Athena or similar SQL query engines (DuckDB, ClickHouse) for ad-hoc log analysis and historical queries, enabling engineers to run complex SQL queries over S3-based log data for deep investigations and trend analysis.
Create advanced query interfaces: Build sophisticated web interfaces that allow engineers to query logs, metrics, and traces with features like saved queries, query templates, correlation analysis, and pattern detection, supporting both full-text search and SQL-based analytics.
Balance cloud-native and open-source: Architect solutions that thoughtfully leverage both AWS-managed services (CloudWatch, Athena, Kinesis) and open-source tooling (LGTM stack, Quickwit) to optimize for cost, performance, and operational flexibility based on use case requirements.
Integrate AWS observability: Design seamless integration between AWS CloudWatch Logs/Metrics and our custom observability platform, providing unified visibility across managed and self-hosted infrastructure.
Build intelligent alerting: Develop smart dashboards, monitors, and alerting systems that reduce noise, detect anomalies, and help teams respond to incidents quickly.
Partner with engineering teams: Work directly with product tea ... (truncated, view full listing at source)