Senior / Principal DevOps Engineer: Automation
Company Overview:
Ori is setting a new standard for how AI worlds are built. We are the first AI Infrastructure provider with the native expertise, comprehensive capabilities, and end-to-end flexibility to support any model, team, or scale. As a fast-growing startup backed by leading investors, we value ambition, accessibility, and collaboration, and are committed to pushing the boundaries of what’s possible in the field of AI. Join our close-knit, global team and help us build the future of AI infrastructure!
Job Description
As a Senior DevOps & Network Automation Engineer, you’ll design, build, and scale the automation frameworks that make our infrastructure repeatable, observable, and self-healing. You’ll focus on deep networking automation — from provisioning and configuration to monitoring and optimization using modern IaC tools like Ansible, Terraform, and Python/Go-based automation frameworks.
This role blends DevOps engineering, network systems, and infrastructure reliability. It is perfect for someone who wants to shape how AI infrastructure is deployed and managed at scale.
What You’ll Do:
Infrastructure Automation & Tooling
- Build and maintain automated provisioning frameworks for compute, network, and storage systems using Ansible, Terraform, and related tooling.
- Develop reusable playbooks, roles, and modules to standardize infrastructure delivery and reduce manual configuration drift.
- Integrate automation workflows into CI/CD pipelines for consistent deployment of infrastructure and configuration changes.
Network Engineering & Automation
- Design and automate Layer 2–7 network configuration and lifecycle management (routing, VLANs, EVPN/VXLAN, firewalls, load balancers, DNS, etc.).
- Develop Ansible automation for network devices and software-defined infrastructure (SDN, SR-IOV, BGP, virtual switching).
- Implement automated network validation, testing, and telemetry collection pipelines.
- Contribute to the evolution of Ori’s multi-tenant, multi-cloud network fabric.
DevOps & Platform Reliability
- Implement CI/CD pipelines that validate and deploy infrastructure changes safely and predictably across environments.
- Develop observability pipelines for infrastructure health (metrics, logs, traces) using Prometheus, Grafana, and OpenTelemetry tooling.
- Contribute to reliability engineering practices: automated rollbacks, health checks, drift detection, and self-repair logic.
Systems Integration & Operations
- Collaborate with software engineering, platform, and product teams to design and deliver end-to-end automation across Ori’s clusters.
- Integrate infrastructure automation with Kubernetes, bare-metal provisioning, and hybrid-cloud environments.
- Maintain and evolve the operational toolchain for configuration management, monitoring, and lifecycle orchestration.
Architecture & Continuous Improvement
- Evaluate emerging DevOps and network automation tools, identifying opportunities for integration or migration.
- Drive standardization of infrastructure-as-code practices across the organization.
- Develop architectural and operational documentation for repeatability, auditability, and onboarding.
- Lead technical discussions and mentor engineers in automation, networking, and DevOps best practices.
What You Bring:
- 7+ years of experience in DevOps, infrastructure automation, or network engineering roles.
- Deep hands-on experience with Ansible, Terraform, and GitOps pipelines (ArgoCD, Flux, or similar).
- Strong scripting or development skills in Python, Go, or Bash for automation workflows.
- Proven expertise with networking concepts and tools: TCP/IP, BGP, DNS, VLANs, EVPN/VXLAN, SR-IOV, routing, and switching.
- Experience automating physical or virtual infrastructure across bare-metal and cloud environments (AWS, GCP, Azure).
- Solid understanding of Linux systems administration, CI/CD pipelines (GitHub Actions, etc.), and observability stacks.
- Familiarity with containerization (Docker) and Kubernetes (CNI plugins, ingress, service mesh).
- Comfort leading large-scale automation initiatives and mentoring teams in IaC principles.
Preferred Skills (Nice to Have):
- Proficiency in Kubernetes networking, CNIs, ingress controllers, and service meshes.
- Familiarity with operators, CRDs, and custom controllers for managing infrastructure state.
- Experience developing or extending Kubernetes CNIs or operators.
- Understanding of eBPF networking or programmable data planes.
- Familiarity with NVIDIA BlueField DPUs or SmartNIC offload architectures.
- Prior work in HPC, telco, or hyperscale network environments.
- Contributions to open-source network automation or cloud-native projects.
What you’ll bring:
- Automation-first mindset: You treat infrastructure like code and toil as a bug.
- Systems thinking: You understand how networking, compute, and storage interlock — and automate accordingly.
- Pragmatic leadership: You can design elegant solutions but aren’t afraid to get your hands dirty debugging a failed playbook.
- Collaboration: You work cross-functionally with engineers, operators, and product managers to deliver stable, scalable systems.
- Adaptability: You can quickly learn and integrate new DevOps or networking tools as technology evolves.
Qualities we look for:
- Set the standard: Every single day, you spot opportunities to constructively shake things up.
- Inspire the change: There's no blueprint for the future. You’ll embrace challenges and change.
- You’re real and you’re true to yourself: We cherish and celebrate diversity, so you’ll feel right at home whoever you are. And whoever you’re talking to, you treat everyone the same.
Why should you join us?
What sets us apart is our blend of modern technology, competitive benefits, and an open, welcoming work culture that enables our people to thrive.
Here are just some of the great things you can expect from us:
- Remote work, flexible hours: we offer a fully remote work schedule with flexible working hours and trust in your productivity. We stay in sync with your team’s general locations and time zones to foster effective and seamless collaboration.
- 30 days of annual leave: we value your peace of mind. With 30 days off (excluding public holidays) and access to mental health resources, we make sure you're as strong mentally as you are professionally.
- A culture that emphasises results over hierarchy, process & ego: we place great emphasis on the quality, ingenuity and creativity of work.
- Open communication, regular feedback: we value smooth collaboration, direct and actionable feedback, and believe that leading with empathy and a growth mindset makes us better together.
- Learning Time: we all have dedicated learning time to focus on new skills, projects, or interests that lie outside our day-to-day jobs.
- Health & Wellbeing: we want everyone to feel healthy and happy, so we offer private medical insurance via Bupa.
- Cycle to Work Scheme: we're committed to building a sustainable business, so we encourage cycling to work.
- Gympass subscription to a variety of gyms and wellbeing apps
- Participation in the company shares program
- Enhanced parental pay & leave
Diversity, Equity, Inclusion and Belonging
We are an equal opportunity employer and we strive to reduce unconscious bias throughout our hiring process. All applicants will be considered for employment without regard to ethnicity, religion, sexual orientation, gender identity, family or parental status, national origin, veteran status, neurodiversity, or disability status. To ensure our recruitment processes provide an equal opportunity for all applicants to succeed, we encourage you to let us know if there are any adjustments that we can make.
- Department: Engineering
- Locations: UK Remote Working, London, Europe
- Remote status: Fully Remote
- Employment type: Full-time