Senior Systems Reliability Engineer I

India - Bangalore
Posted 30 March 2026

Job Description

About Us:

ThoughtSpot is an AI-powered analytics platform that enables users to explore and analyze data through natural language queries, making insights accessible to all. Our mission is to deliver reliable, high-performing applications that empower our customers.

The Role

As part of the ThoughtSpot SRE team, you will be on the cutting edge of operational intelligence. You will not only ensure service reliability but also act as a trusted partner for our customers, proactively leveraging AI/ML to deliver timely updates, meaningful solutions, and predictive improvements. You are the bridge between our customers and engineering, combining deep systems expertise with a genuine passion for customer success. If you thrive in dynamic environments and are committed to building resilient, self-optimizing systems, this role is for you.

What You'll Do:

Technical & Customer Support

Act as the primary point of contact for customer-facing technical issues related to our SaaS platform, including data connectivity, report errors, performance concerns, access problems, data inconsistencies, software bugs, and integration challenges.
Understand and empathize with the challenges ThoughtSpot users face, offering tailored solutions to improve their experience.
Provide timely, accurate, and clear updates to customers, consistently meeting SLAs and driving issues through to full resolution via tickets and calls.
Translate complex technical issues into clear, concise updates for both technical and non-technical stakeholders.
Create and maintain knowledge-base articles to empower customer self-service and improve support efficiency.

System Reliability & Monitoring

Maintain, monitor, and troubleshoot ThoughtSpot cloud infrastructure using tools like Grafana, Prometheus, Datadog, and Splunk.
Monitor system health and performance through metrics, logs, and dashboards to detect and prevent issues proactively.
Implement and leverage AI/ML-driven solutions for proactive observability, predictive anomaly detection, and intelligent alerting to enhance service reliability and reduce Mean Time to Resolution (MTTR).
Understand and apply NetOps and SecOps principles for cloud and on-premise deployments.
Develop and implement automation and best practices to streamline operations and strengthen system reliability.
Optimize SRE workflows with AI tools to boost operational effectiveness.

Incident Management & Continuous Improvement

Participate in on-call rotations, lead incident reviews, and conduct thorough root cause analyses to drive continuous improvement.
Work cross-functionally with Engineering to define and implement tools that enhance debuggability, supportability, availability, scalability, and performance.
Be an expert in both cloud and on-premise infrastructure by developing automation and best practices.

What You'll Bring:

B.S. in Computer Science or equivalent relevant experience.
Proven experience troubleshooting complex Linux systems and managing virtualization and cloud platforms (VMware, AWS, Azure, GCP).
Hands-on experience with monitoring tools such as Grafana, Prometheus, Datadog, or Splunk.
Demonstrated experience and a keen interest in leveraging AI/ML principles to address SRE challenges, including AIOps, predictive maintenance, and intelligent automation.
Prior experience in enterprise customer support, including on-call rotations and incident management, with the ability to lead root cause analyses.
Strong problem-solving and algorithmic thinking with a solid understanding of system internals.
Excellent verbal and written communication skills with the ability to work independently and cross-functionally in fast-paced environments.
Familiarity with scripting and programming languages such as Python, Go, Bash, or Java.
Exposure to infrastructure and service monitoring frameworks with the ability to analyze data to ensure high availability.

Good to Have: Experience partnering with Engineering to design and ... (truncated, view full listing at source)