Senior ML Ops Engineer (Remote – Anywhere)
Overview
The client seeks a Senior ML Ops Engineer to drive the development and deployment of an AI inferencing platform designed for EU-compliant infrastructure. The role begins with a proof of concept that validates managing Nvidia Enterprise AI through GCP while provisioning compute resources in EU-based data centers (e.g., Genesis Cloud). Longer term, the position will extend to building billing infrastructure, supporting 24/7 operations, and enabling customer success workflows.
Responsibilities
- Lead the design, deployment, and optimization of an AI inferencing platform on Nvidia Enterprise AI.
- Build and validate a proof of concept integrating GCP orchestration with EU-based compute providers.
- Implement and oversee monitoring, alerting, and performance optimization for large language model workloads.
- Contribute to the development of API-driven billing systems.
- Establish best practices for hybrid EU infrastructure deployments.
- Provide leadership and mentorship to junior engineers in ML Ops practices.
- Collaborate with stakeholders to design processes for 24/7 operations (NOC) and customer success workflows.
Required Skills & Experience
- Proven expertise in Nvidia Enterprise AI and DGX systems.
- Hands-on experience with TensorFlow and PyTorch for large-scale inferencing.
- Strong background in ML Ops orchestration and performance monitoring.
- Familiarity with GCP and EU-based infrastructure providers (e.g., Genesis Cloud).
- Experience implementing API billing frameworks.
- Track record of supporting or designing always-on systems (24/7 operations).
- Ability to operate as a senior ML Ops lead, guiding junior engineers and aligning infrastructure with product goals.
Tech Stack Exposure
- Nvidia Enterprise AI
- DGX systems
- GCP
- Genesis Cloud
- TensorFlow & PyTorch
- Monitoring/alerting tools
- API billing frameworks
Contract Details
- Duration: Proof of concept (4–6 weeks), with potential to transition into an ongoing SLA-based engagement for platform scaling and support
- Location: Remote (anywhere)
About Tribes
Tribes partners with forward-thinking companies to bring transformative technical projects to life. We are committed to fostering a workplace culture rooted in diversity, equity, and inclusion, ensuring opportunities for talent from all backgrounds to thrive.