Senior Site Reliability Engineer
CyberSentriq
- Galway
- Permanent
- Full-time
- Design, implement, and maintain resilient and scalable infrastructure solutions for our microservice-based SaaS platform using Kubernetes, AWS, Terraform, and other relevant technologies.
- Automate deployment, configuration, and management of services and applications using infrastructure as code principles.
- Develop and maintain CI/CD pipelines to enable efficient and reliable software delivery.
- Implement monitoring, alerting, and logging solutions to ensure proactive identification and resolution of issues.
- Collaborate with cross-functional teams to optimize system performance, reliability, and security.
- Participate in incident response and post-incident analysis to identify root causes and implement preventive measures.
- Participate in on-call rotations for the SaaS product.
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- Relevant previous experience in a similar role, with a strong focus on managing microservice-based SaaS platforms.
- Proficiency in Kubernetes for container orchestration and management.
- Previous experience with AWS services (or the equivalents from another cloud provider), including EC2, S3, EKS, and IAM.
- Solid understanding of infrastructure as code principles and hands-on experience with Terraform for provisioning and managing cloud resources.
- Experience deploying and supporting a Large Language Model (LLM) in production is a plus.
- Experience with GitOps practices and tools, particularly ArgoCD.
- Programming skills in Go (Golang) for developing tools that integrate with Go-based microservices are a big plus.
- Familiarity with CockroachDB or other distributed databases is a plus.
- Excellent problem-solving and troubleshooting skills, with a proactive and results-oriented approach.
- Strong communication, collaboration, and documentation skills.