
Senior Site Reliability Engineer, Observability
- Ireland
- Permanent
- Full-time
- Define standards and vision for the mission-critical observability platform leveraged by all parts of the engineering organization
- Design, architect, build and deliver core pieces of our observability services in collaboration with other vested parties
- Design, implement, and troubleshoot monitoring for services that span the globe and run across several cloud providers
- Build for reliability, making services and infrastructure available, resilient, fault tolerant and self-healing
- Identify and configure key metrics to detect incidents and quantify service health, availability and performance
- Participate in a week-long on-call rotation and blameless post-mortem process
- Improve our observability capabilities, optimizing for cost, ease of use, and maintainability
- Experience running mission-critical services at scale
- Experience with observability of large-scale distributed systems
- An understanding of information security issues
- Firm grasp of at least one modern programming language, beyond basic scripting
- Solid understanding of web and network protocols and standards (HTTP, TLS, DNS, etc.)
- Bachelor's degree in Computer Science or equivalent experience
- Experience with at least one of the major cloud providers (Amazon Web Services, Google Cloud, Microsoft Azure)
- Experience working in a Kubernetes-based environment
- Generous compensation package
- Opportunities to learn on the job (time to upskill in new technologies)
- High level of independence in your day-to-day work