Analyst Site Reliability Operations Engineer
Salesforce
- Dublin
- Permanent
- Full-time
- Lead incident response for high-severity incidents affecting internal business operations, acting as Incident Commander — establishing impact, coordinating Subject Matter Experts, making decisions under pressure, and driving resolution.
- Monitor enterprise systems including infrastructure, applications, and network components. Escalate anomalies and potential issues to senior team members before they impact users.
- Maintain clear and timely incident communications, including status updates for technical teams and stakeholders. Translate technical details into business-friendly language for leadership and the wider business.
- Actively participate in problem management activities — help investigate recurring incidents, document root cause analyses, and track known errors to support long-term resolution efforts.
- Follow and contribute to incident management processes including playbooks and SOPs. Identify opportunities to improve documentation and drive process improvements.
- Support coordination of changes and infrastructure updates during incident resolution. Work with cross-functional teams to maintain business continuity.
- Assist with the analysis of incident data and KPI metrics to identify trends and patterns. Contribute to post-incident reviews and drive action items through to completion.
- Participate in the on-call rotation as part of regional coverage.
- 3–5 years in IT operations, service desk, or a related technical support role in a 24x7 high-availability environment.
- Demonstrated ability to lead high-severity incidents as Incident Commander — establishing impact, evaluating resolution scenarios, making decisions based on inputs from Subject Matter Experts, and communicating outcomes in both technical and business language.
- Strong written and verbal communication skills. Able to communicate complex technical information clearly to peers, stakeholders, and leadership.
- Good technical troubleshooting ability across Windows and/or Linux environments, common enterprise applications, and monitoring/logging tools.
- Solid understanding of ITSM concepts including incident, problem, and change management. ITIL Foundation certification (or willingness to obtain) preferred.
- Demonstrated experience in problem management, including investigation of recurring incidents and comprehensive root cause analysis documentation.
- Exposure to cloud platforms (e.g. AWS, Azure, GCP) or cloud fundamentals certifications.
- Experience with monitoring or logging tools such as New Relic, Splunk, or Grafana.
- Salesforce platform experience or certifications.
- Basic scripting knowledge in Python, Bash, or PowerShell.
- BS in Computer Science, Information Technology, or equivalent practical experience.