Staff Software Engineer - AWS Cloud Architect + AI (MLOps)
Analog Devices
- Limerick
- Permanent
- Full-time
- Lead cross-team initiatives that elevate operational excellence across the entire ML/AI ecosystem.
- Define org-level MLOps standards, best practices, and architectural patterns adopted across multiple engineering groups.
- Drive long-range improvements to platform maturity through automation, standardization, and advanced operational engineering.
- Develop predictive, system-wide strategies for cloud and ML workloads that optimize cost, performance, and resource allocation across teams.
- Anticipate operational, architectural, and organizational risks, and implement durable solutions that are adopted across teams.
- Architect multi-team AWS ML/AI infrastructure and define long-term reference architectures for platform evolution.
- Establish enterprise-level governance for model lifecycle management, provisioning, drift detection, and compliance.
- Define org-wide standards for GenAI/LLM/Agent evaluation and experimentation.
- Own Kubernetes strategy across ML/AI infrastructure, defining cluster topologies, orchestration patterns, and reliability standards.
- Build reusable IaC frameworks and Terraform/CDK modules adopted across multiple teams.
- Mentor engineers across orgs on optimization, reliability, distributed workflows, and ML system design.
- Partner with senior engineering, science, and product leadership to steer ML/AI platform roadmap and strategy.
- Recognized expert in ML/AI platforms and MLOps, with influence extending across orgs or business units.
- Proven ability to architect cross-team solutions and define long-term platform direction.
- Expertise in creating scalable model governance frameworks, model registries, and compliance standards used org-wide.
- Deep experience with ML observability frameworks and defining SLO/SLA measurement strategies.
- Expert-level distributed systems experience, including Ray, large-scale LLM/Agent pipelines, and scalable RL systems.
- Mastery of IaC/GitOps frameworks and the ability to build reusable multi-team infrastructure patterns.
- Leadership experience in defining Kubernetes and workflow orchestration strategies across orgs.
- Deep experience architecting enterprise ML/AI infrastructure that supports foundation models and multimodal AI.
- Strong ability to lead large multi-team roadmap initiatives, influence cross-functional decision-making, and create long-range architectural direction.
- Executive-level communication and influence skills.
- Strong mentoring orientation and ability to upskill teams across the organization.
- Deep experience building or scaling ML infrastructure for robotics systems, including deployments across diverse environments or tasks.
- Expertise with ROS/ROS2, including architecting reusable ROS components, integrating ML inference pipelines, and supporting heterogeneous robot fleets.
- Experience designing ML systems that support adaptive, multi-skilled robots in environments where tasks change frequently (e.g., modular, batch-of-one manufacturing).
- Experience integrating ML pipelines with robot perception, calibration, planning, or control systems.
- Strong background applying Vision-Language Models (VLMs) for industrial perception, scene understanding, or few-shot classification in robotics contexts.
- Experience designing ML infrastructure enabling low-data adaptation, task generalization, or rapid skill-transfer for robots.
- Familiarity with robotic simulation toolchains and workflows, including sim-to-real strategies and dataset generation pipelines.
- Experience influencing robotics-engineering teams and shaping system architecture that bridges manufacturing domain expertise with ML/AI capabilities.
- Demonstrated ability to architect platform-level solutions enabling scalable robotics ML deployment across multiple teams/products.