MLOps Engineer
Montreal, Quebec, Canada - Permanent
Job Description
Are you a systems-minded professional who thrives at the intersection of machine learning and infrastructure?
We're looking for an MLOps Engineer to design, build, and operate scalable machine learning pipelines and deployment workflows. You'll play a key role in enabling fast, reliable, and automated releases of ML-based services, helping our teams move from experimentation to production with confidence and efficiency. The platform you'll contribute to empowers software engineers by automating tasks such as code review, issue detection, and quality feedback, enabling development teams to ship better software, faster.
What You’ll Do:
End-to-End LLM Workflows
-Prototype, harden, and productize LLM-centric features with the ML team
-Embed automated tests and validation at each stage
Data-Ops Tooling
-Create and iterate on internal tools for dataset annotation, dynamic dataset expansion, and prompt evaluation
-Integrate data-change triggers into CI/CD pipelines to launch re-evaluation workflows
Prompt/Component Versioning & Drift Detection
-Define robust schemas for tracking prompt revisions and model artifacts
-Deploy automated checks to surface deviations in output distributions
Observability & Incident Management
-Maintain on-call readiness: codify runbooks, escalation paths, and post-mortems
-Refine dashboards and playbooks to reduce detection and recovery times
CI/CD Automation
-Build and manage pipelines (e.g., GitHub Actions) to automate testing, deployment, and releases
-Support zero-downtime rollouts and validation workflows
Infrastructure as Code
-Declare, version, and manage infrastructure with tools like Terraform
-Ensure modular, peer-reviewed, and auditable infrastructure practices
Real-Time Monitoring & SLA Compliance
-Instrument systems with heartbeats, synthetic checks, and log-based metrics
-Set up dashboards and alerts using modern monitoring tools to meet SLA targets
Collaboration & Knowledge Sharing
-Co-author internal documentation and share insights at team reviews
-Promote engineering standards and operational excellence across teams
Must-Have Skills:
-3+ years in MLOps or DevOps roles focused on ML workloads.
-Proven experience with LLM prompting and LLM tools like LlamaIndex.
-Strong cross-functional collaboration and best-practice documentation skills.
-Familiarity with monitoring/alerting tools and ML governance practices.