
Senior Observability Engineer
- Southside Dublin
- Permanent
- Full-time
Position Title: Senior Observability Engineer (Splunk + OSS)
Location: Dublin, Ireland
Experience: 12 - 15 years
Employment Type: Full-Time
Role Summary
We are looking for a high-impact Senior Observability Engineer to lead the design, implementation, and operationalization of end-to-end observability practices across a complex, enterprise-scale environment. You will be responsible for defining telemetry standards, building data pipelines, integrating observability into CI/CD and runtime environments, and enabling engineering and SRE teams with actionable insights.
This role demands deep expertise in both Splunk Observability Cloud (including SignalFx, APM, Log Observer) and open source tools such as OpenTelemetry, Prometheus, Grafana, and Jaeger. The ideal candidate thrives in fast-paced, high-expectation environments and is passionate about instrumenting everything.
Key Responsibilities
- Design & Implementation
- Architect and deploy observability pipelines from scratch, across metrics, logs, traces, and events
- Define observability standards (naming conventions, tags, golden signals, telemetry SLIs/SLOs)
- Design telemetry ingestion strategies using OpenTelemetry Collectors, Splunk Forwarders, Fluent Bit, etc.
- Tooling & Integration
- Configure and maintain Splunk Observability Cloud:
  - SignalFx (Infrastructure Monitoring)
  - APM (distributed tracing)
  - Log Observer & RUM
- Implement and tune OSS tools like:
  - Prometheus (metrics)
  - Grafana (dashboards)
  - Loki / ELK (logs)
  - Jaeger/Zipkin (tracing)
- Integrate observability into CI/CD pipelines for real-time feedback on performance and reliability
- Alerting & Dashboards
- Build actionable detectors, dashboards, and alert rules that reduce noise and improve MTTD/MTTR
- Develop single-pane-of-glass (SPoG) dashboards for leadership, engineering, and SRE stakeholders
- Correlate telemetry signals to enable deep root cause analysis
- Operationalization & Automation
- Automate deployment and updates of telemetry agents across fleets using Ansible, Helm, Terraform, etc.
- Implement observability-as-code practices using GitOps principles
- Maintain robust documentation and reusable templates/playbooks
- Maturity Enablement
- Help establish and measure observability maturity models across teams
- Conduct enablement workshops and onboarding sessions, and evangelize observability-first practices
- Partner with platform, DevOps, and SRE teams to embed telemetry into every service lifecycle
Required Skills & Experience
- 7+ years of experience in SRE/DevOps/Platform Engineering roles with 3+ years specializing in observability
- Hands-on expertise in Splunk Observability Cloud (SignalFx, APM, RUM, Log Observer)
- Deep familiarity with OSS observability tools:
  - OpenTelemetry SDK/Collector
  - Prometheus, Grafana, Loki
  - Jaeger, Zipkin, Fluent Bit
- Strong skills in SPL (Splunk Processing Language), Splunk ITSI, alert tuning, detector rules, and dashboarding
- Knowledge of cloud-native environments: Kubernetes, Docker, AWS/GCP/Azure
- Programming/scripting skills in Python, Go, or Shell
- Experience deploying observability components via Terraform, Helm, Ansible, etc.
- Splunk certifications (e.g., Observability Specialist, Admin, Consultant)
- Experience working in high-regulation industries (banking, healthcare, etc.)
- Familiarity with Service Mesh (Istio, Linkerd), Kafka, and event-driven observability
- Exposure to AIOps tools and anomaly detection models
- Strong ownership and a builder mindset
- Excellent communication and stakeholder alignment skills
- Ability to work under pressure and deliver on aggressive timelines
- Passion for enabling self-service observability across the org