Site Reliability Engineer
StepStone Group
- Dublin
- Permanent
- Full-time
- Design, build, and maintain scalable cloud infrastructure on AWS, GCP, or Azure, with a focus on high availability and fault tolerance.
- Collaborate with software engineers to embed reliability best practices into the software development lifecycle.
- Participate in on-call rotations, lead incident response, and conduct thorough post-mortems to prevent recurrence.
- Develop and maintain infrastructure-as-code (IaC) using tools such as Terraform and CloudFormation.
- Optimize cloud resource utilization and manage costs across multi-cloud or hybrid environments.
- Contribute to the design and improvement of CI/CD pipelines and deployment automation.
- Ensure cloud environments adhere to financial-industry security and compliance standards.
- Document systems, runbooks, and processes to support team knowledge sharing.
- 3-5 years of experience in a Site Reliability Engineering, DevOps, or Cloud Infrastructure role.
- Hands-on experience with one or more major cloud providers: AWS, GCP, or Azure.
- Proficiency in at least one scripting or programming language (Python, Go, Bash, etc.).
- Experience with infrastructure-as-code tools (Terraform, CloudFormation, Bicep).
- Strong understanding of networking fundamentals (DNS, TCP/IP, load balancing, VPNs).
- Familiarity with containerization and orchestration technologies (Docker, Kubernetes).
- Experience with observability tools such as Datadog, Prometheus, Grafana, or equivalent.
- Solid understanding of Linux/Unix systems administration.
- Experience in the financial services industry or other highly regulated environments.
- Familiarity with compliance frameworks such as SOC 2 or ISO 27001.
- Cloud certifications (AWS Solutions Architect, GCP Professional Cloud Architect, Azure Administrator, etc.).