
Lead Site Reliability Engineer
- Cork
- Permanent
- Full-time
- Collaborate with Agile squads and developers, sustain partners, and business partners, and contribute significantly to specifications that resolve problems and address enhancement needs, focusing on logging, monitoring, and metrics for operational readiness.
- Provide continuous feedback to development teams on system stability, defect analysis, and system enhancements.
- Work with IT business and development partners to gather input for new capabilities to display, monitor, and alert on key performance indicators (KPIs) by tracking business transactions (BTs) in real time.
- Plan for validation and verification of changes deployed by infrastructure and development teams.
- Establish and maintain good relationships with team members, Product Development, Product Management, Customer Service, Client Management, and other cross-functional teams.
- Requires rotating shift work as needed.
- On-call rotation is required to provide 7x24x365 support.
- Deep understanding of Linux systems
- Hands-on experience with cloud infrastructure; Google Cloud, AWS, or Azure is a plus.
- Experience with PaaS technologies such as Cloud Foundry, Kubernetes, or BOSH.
- Experience with continuous delivery tools such as Ansible, Rundeck, or Argo CD to set up automated pipelines as needed.
- Experience supporting middleware technologies such as Apache, Tomcat, and Spring.
- Experience with at least one scripting language, such as shell, Perl, Python, or JavaScript.
- Experience with APM tools such as New Relic, Dynatrace, or AppDynamics.
- Experience with monitoring tools such as Zabbix or check_mk.
- Proven problem solving and analytical ability.