Site Reliability Engineer (SRE)

Remote / Toronto, Ontario, Canada - Permanent
This job allows you to work remotely.

Job Description

Our client is passionate about building software that solves real-world challenges in the manufacturing industry. They depend on their SRE team to give users a highly performant, highly available platform with a rich feature set that augments the users' own quality control processes. They're seeking an experienced SRE to help deliver excellence through their software development and software delivery processes. Specifically, they are searching for someone who brings fresh and creative ideas, demonstrates a unique and informed point of view, and enjoys collaborating with a cross-functional team to develop real-world solutions and deliver positive, measurable user experiences at every interaction.


• Manage our Production, Development and other environments and infrastructure through monitoring, automation, and other methods
• Create, manage, and operate CI/CD Pipelines across a suite of services and applications
• Drive improvements in reliability, quality, and time-to-market for our various platforms
• Measure and optimize system performance, continually contributing to innovation and improvement
• Provide primary operational support and engineering for multiple distributed software components
• Contribute to the improvement of our development, testing, and deployment processes

Must Have Skills:

• Bachelor’s degree in Computer Science or other highly technical, scientific, or engineering discipline
• Ability to write code (structured and object-oriented) in one or more high-level languages such as Python, Java, Ruby, or JavaScript
• Ability to work in a fast-paced agile environment
• A passion for identifying reusable patterns and automating them
• An appropriate respect for Best Practices, Documentation, and Process
• Experience in a disciplined production environment with good knowledge of Azure cloud services
• Experience in deployment and developer workflows using Docker and Kubernetes
• Deployment, logging, monitoring, security, and automatic failover experience with container orchestration platforms on Azure, AWS, or GCP
• Experience in microservices architecture and service mesh
• Hands-on expertise in configuration management and infrastructure deployment tools such as Terraform
• Detail-oriented with excellent analytical skills
• Flexibility to adjust to changing priorities, requirements, and schedules

Daily and Monthly Responsibilities:

• Partner with development teams to improve CI/CD services with a focus on developer enablement and successful deployments
• Gather and analyze metrics to assist in performance tuning and fault finding
• Participate in system design consulting, platform management, and capacity planning discussions
• Create sustainable systems and services through automation
• Balance feature development and delivery with reliability and user experience
• Participate in an on-call rotation to provide rapid response to critical issues in production


Starting: ASAP
Travel: 0%
Dress Code: Casual
