Production Support Engineer II

Global Healthcare Exchange Inc
Hyderabad, Telangana, India
Posted 9 March 2026

Job Description

Production Support Engineer

Position Summary

The Production Support Engineer is responsible for first-line incident response, initial triage, basic troubleshooting, and operational support for production and integration environments. The role calls for 4–5 years of experience and provides frontline support: identifying issues, documenting symptoms, and escalating to Tier 2/3 teams as needed. This role reports to the Manager, ITSM. The engineer will monitor systems, respond to alerts, capture diagnostics, and ensure timely communication and resolution for routine incidents. This position participates in on-call rotations and supports 24x7 availability of critical enterprise systems.

Key Responsibilities

Incident Management and First-Line Troubleshooting
• Serve as the first responder for production alerts and incidents across Linux, Windows, and application environments.
• Perform initial triage, validate issues, gather logs, document symptoms, and escalate when necessary.
• Classify incidents by severity and route to SRE, DevOps, Engineering, or Product teams as appropriate.
• Follow established runbooks to resolve common or known issues.
• Participate in on-call rotation for first-response incident handling.

Monitoring and System Oversight
• Monitor production and integration environments using tools such as New Relic, Datadog, CloudWatch, Graylog, PagerDuty, PRTG, and Kibana.
• Acknowledge alerts promptly and take initial troubleshooting steps before escalation.
• Perform basic administrative tasks under guidance, such as restarting services, validating connectivity, or checking logs.
• Ensure monitoring tools are reporting accurately and escalate anomalies.

Maintenance and Release Support
• Assist with maintenance windows, deployment validations, and basic post-release checks.
• Perform smoke testing and ensure system readiness before handing off to Tier 2/3 teams.
• Follow operational standards for configuration checks, connectivity validation, and version verification.

Automation and Process Improvement
• Identify repetitive Tier-1 tasks and suggest opportunities for automation.
• Execute simple scripts (Python, Bash, PowerShell) for diagnostics or routine workflows.
• Provide feedback to improve alert quality, runbooks, and support processes.

Documentation and Knowledge Management
• Maintain and update runbooks, troubleshooting steps, and knowledge base articles.
• Document shift handoffs clearly and ensure accurate communication for ongoing issues.
• Support Knowledge-Centered Support (KCS) practices.

Collaboration and Communication
• Communicate incident status, updates, and resolutions clearly to internal teams.
• Work within Jira, Salesforce, or similar ticketing systems for tracking and customer communication.
• Remain active in collaboration channels (Slack, Teams) for real-time coordination.
• Escalate systemic issues to higher-level teams with clear context and logs.

Qualifications and Experience

Required
• 4–5 years of experience in production support, NOC operations, or IT service delivery.
• Strong foundational knowledge of Linux and Windows environments.
• Basic troubleshooting skills in Java, .NET, or Python-based applications.
• Experience with monitoring tools such as New Relic, Datadog, CloudWatch, Graylog, PagerDuty, or PRTG.
• Proficiency with ticketing tools (Jira, Salesforce).
• Clear communication skills with the ability to follow structured escalation processes.
• Availability for 24x7 support rotations and after-hours response when needed.

Preferred
• Certifications in Linux, AWS, CompTIA, or CCNA.
• Exposure to scripting/automation (Python, Bash, PowerShell).
• Familiarity with cloud platforms such as AWS.
• Knowledge of Incident, Change, and Problem Management (ITIL).
• Experience supporting CI/CD pipelines or modern application stacks.

Technical Skills (familiarity with a subset preferred)
• Cloud: AWS (basic understanding)
• Languages and Scripting: Python, Bash, PowerShell (basic to intermediate)
• Application Servers: Apache Tomcat, IIS, JBoss (basic troubleshooting)
... (truncated, view full listing at source)