MLOps Support Engineer
CloudFactory
Customer Service
Medellín, Antioquia, Colombia
Posted on Jan 16, 2026
About the role:
The MLOps Support Engineer is an operations-first role, focused on ensuring AI/ML systems remain stable, observable, and supportable in production environments. This is not a data science or feature development role.
The primary objective is to maintain continuous performance of ML models and associated pipelines with minimal disruption to both internal and client-facing services. You will provide Tier 1 and Tier 2 support, escalating to Tier 3 Engineering as needed.
What you’ll do:
- Provide Tier 1 / Tier 2 operational support for AI/ML solutions.
- Identify failed jobs, degraded pipelines, or performance anomalies.
- Triage incidents, investigate issues, and coordinate escalation to Tier 3 Engineering.
- Participate in on-call rotas once established.
- Validate that pipelines and jobs complete successfully.
- Monitor data pipeline health, model execution, and basic performance metrics.
- Identify operational issues before they impact customers.
- Respond to or alert customers when there has been an outage or issue with one of their models.
- Support incident management, rollback, and recovery activities.
- Use and maintain runbooks and operational documentation.
- Work with Engineering to improve supportability and observability.
- Contribute to knowledge sharing to reduce single points of failure.
- Work within defined SLAs and support processes as the service matures.
- Prepare quarterly business reviews to provide updates on the health of the ML models.
- Evaluate champion/challenger models to determine whether a new model should be promoted.
- Monitor for model drift and performance degradation, while validating that updates (new champion models or added data) do not introduce bias.
Essential
- Experience in operations, DevOps, SRE, or platform support roles.
- Strong troubleshooting skills in production environments.
- Proficiency in SQL and scripting (Python, Bash) for developing and automating ML workflows.
- Familiarity with cloud platforms (AWS, GCP, Azure) hosting ML services.
- Git: Solid understanding of version control, particularly in collaborative development environments.
- Comfortable working from runbooks and structured processes.
Desirable
- Exposure to AI/ML systems in production.
- Familiarity with monitoring and observability tools (Grafana, Power BI, New Relic).
- Knowledge of MLOps tooling and data platforms (MLflow, Databricks).
- Experience supporting customer-facing platforms.
- Knowledge of containerization (Kubernetes) is a plus.
- Experience with LLM prompt engineering and troubleshooting.
- Eagerness to learn about complex predictive models.
- Background in computer science, informatics, or related fields.
- Passion for Machine Learning and AI: An eager learner who is excited about working with cutting-edge ML technologies and about optimizing and maintaining ML models in production environments.
- Early Career in MLOps or ML Engineering: Ideally a junior ML engineer with a strong desire to grow in the field of MLOps and AI operations.
- A Collaborative Mindset: You thrive in a team setting and are ready to contribute to model improvement, A/B testing, and iterative development.
- Attention to Detail: A focus on model performance, bias prevention, and ensuring optimal model behavior as new data and models are introduced.
Additional information:
Nepal
- This role provides MLOps coverage from 07:45 – 15:45* NPT for US-based customers. You will be required to work during these hours and potentially outside of them if a model has issues.
- Rotational On-Call work will also be required.
Colombia
- This role provides MLOps coverage from 11am to 9pm* Colombia time for a US-based customer. You will work on a shift rota covering 8-hour blocks within this window, and potentially outside of it if a model has issues.
- Rotational On-Call work will also be required.
*Note that these hours are subject to change upon review.