Staff Reliability Engineer

Coupang
Bengaluru
Posted 26 March 2026

Job Description

Company Introduction

We exist to wow our customers. We know we’re doing the right thing when we hear our customers say, “How did we ever live without Coupang?” Born out of an obsession to make shopping, eating, and living easier than ever, we are collectively disrupting the multi-billion-dollar commerce industry from the ground up and establishing an unparalleled reputation for being a leading and reliable force in South Korean commerce.

We are proud to have the best of both worlds: a startup culture with the resources of a large global public company. This fuels us to continue our growth and launch new services at the speed we have maintained since our inception. We are all entrepreneurs surrounded by opportunities to drive new initiatives and innovations. At our core, we are bold and ambitious people who like to get our hands dirty and make a hands-on impact. At Coupang, you will see yourself, your colleagues, your team, and the company grow every day.

Our mission to build the future of commerce is real. We push the boundaries of what’s possible to solve problems and break traditional tradeoffs. Join Coupang now to create an epic experience in this always-on, high-tech, and hyper-connected world.

Role Overview:

To keep Coupang's IT services stable, the IT Reliability Engineering team operates monitoring systems and processes for IT infrastructure and applications. The team is responsible for ensuring and improving monitoring visibility. In the case of an event or incident, the team collaborates with the engineering team to resolve it and manage relevant metrics. To ensure the continuity of service, the team regularly conducts DR tests.

Key Responsibilities:

Strategic Vision & Leadership
- Define and drive the observability strategy and roadmap, aligning with business and technology goals.
- Establish a mature observability framework covering infrastructure, network, applications, and end-user experience.
- Advocate for observability best practices across engineering, operations, and product teams.

Monitoring Tool Implementation
- Lead the design, implementation, and optimization of observability platforms (e.g., Prometheus, Grafana, Datadog, New Relic, Splunk).
- Evaluate and onboard new tools and technologies to enhance visibility and telemetry across systems.
- Ensure scalable and resilient monitoring architectures are in place for hybrid and cloud-native environments.

Gap Analysis & Continuous Improvement
- Conduct gap assessments in existing monitoring setups and identify areas for improvement.
- Implement automated solutions to address low-hanging fruit and reduce manual overhead.
- Continuously refine monitoring configurations to improve signal-to-noise ratio and reduce alert fatigue.

End-to-End Observability
- Build and maintain end-to-end visibility across infrastructure, network, applications, and user journeys.
- Integrate observability tools with incident management, ticketing, and reporting systems.
- Develop and enforce tagging strategies, metrics standards, and log enrichment practices.

Collaboration & Enablement
- Partner with DevOps, SRE, and application teams to embed observability into CI/CD pipelines and development workflows.
- Provide technical guidance and training to teams on observability tools and practices.
- Support incident response and post-mortem analysis with automated diagnostics and telemetry insights.

Data-Driven Insights
- Leverage observability data to generate actionable insights for performance tuning, capacity planning, and reliability engineering.
- Create dashboards and reports that provide meaningful visibility to stakeholders at all levels.

Qualifications:

Observability & Monitoring Tools
- Prometheus, Grafana, Zabbix, SolarWinds
- Datadog, New Relic, Dynatrace, Splunk, Helix
- OpenTelemetry (for standardized telemetry collection)

Infrastructure & Automation
- Terraform, Ansible, Puppet, Chef (IaC tools)
- Scripting languages: Python, Bash, PowerShell
- REST APIs: Experience integrating and automating obser ... (truncated, view full listing at source)