Site Reliability Engineer (SRE)

Toronto, Ontario, Canada - Permanent

Job Description

The SRE role will support Development Operations and Production activities for The Company. You will be a core member of the Production Support team, working with the Product, Engineering, and Marketing teams to deliver our Customer Application. The SRE will report to the DevOps leader, will have direct input into shaping the infrastructure architecture and strategy, and will be responsible for understanding and delivering the vision defined and agreed upon by the engineering and executive teams.

Specifically, the SRE will have direct input into shaping new product and service offerings and will be responsible for understanding the vision defined and agreed upon by the Product and Engineering teams and the executive team, and for delivering it to market.

You will work closely with engineering team members during planning and review meetings and the daily scrum to manage the roadmap, feature prioritization, and development. You will be responsible for delivery from development to production and for operations support.

This position will work closely with team members in engineering, user experience, marketing, sales, and customer support. To be successful, the SRE must be an expert in their domain and possess excellent communication and development skills to understand, document, develop, and implement key features and functionality.

What You Will Do

Collaborate
• Collaborate closely with Product, Marketing, and Support to define functional requirements, develop the product vision, strategy, and roadmap, and drive adoption and usage
• Clearly communicate task estimates, ETAs, and work breakdown structures to management and the client side via Atlassian Jira
• Support users by developing documentation and assistance tools
• Keep colleagues informed of developments and work collaboratively

• Prepare for and participate in roadmap planning, Sprint planning and reviews, retrospectives, and daily stand-ups
• Evangelize and defend the vision of the product with the development team on a day-to-day basis to ensure customer satisfaction and acceptance
• Assist the Scrum Master in driving the Agile software development process and helping the team deliver software that meets business requirements
• Assist with tracking and reporting team velocity and with Sprint capacity planning

Duties and responsibilities
• Work with the CTO and the team at large to define the appropriate product architecture on AWS based on the current system and future business goals.
• Build and maintain highly available development, test and production systems.
• Design and configure continuous integration and deployment systems
• Build out development, test and production environments as appropriate
• Create and enhance application and infrastructure monitoring
• Develop cost optimization strategies
• Lead the AWS public cloud build and assist in its architecture, design, and implementation (connectivity, network, security, containerization, monitoring)
• Automate infrastructure provisioning to stand up servers and install software and applications
• Manage storage, compute efficiency, and optimization activities, including evaluating the configuration of compute size, storage solutions, and other services (network services, automation, and load balancing)
• Develop DevOps processes and tools that support agile application development teams in adopting continuous integration, testing, and deployment.
• Assist with application integration and troubleshooting across this infrastructure in a complex application environment, including managing dependencies on services, platforms, and other applications within the cloud infrastructure.
• Create DevOps process automation and tooling that implements standards and boundaries while empowering our application development teams to self-serve their infrastructure and deployment needs.
• Build independent web-based tools, microservices and solutions
• Write scripts and automation using Perl/Python/Groovy/Java/Bash
• Configure and manage data stores such as MySQL, Neo4j, MongoDB, Elasticsearch, and Redis
• Provide production support

Required Skills
• Experience in a production environment
• Deep knowledge of AWS
• Knowledge of deployment and developer workflows using Docker and Kubernetes (Helm chart experience is nice to have)
• Strong experience in Linux environments, especially networking technologies such as iptables and OpenVPN
• Deployment, logging, monitoring, security and automatic failover experience with container orchestration platforms on AWS
• In-depth knowledge of security best practices, policy, access management, and cryptography
• 5 years of experience managing Linux-based infrastructure
• 5 years of hands-on experience with at least one scripting language
• 5 years of hands-on experience with databases, including MySQL and Elasticsearch
• Hands-on expertise in configuration management and deployment tools like Puppet, Chef, Ansible, etc.
• Experience with Terraform for infrastructure as code is nice to have
• Sense of ownership and pride in your performance and its impact on the company’s success
• Critical thinking and problem-solving skills
• Team player

Technical Leadership
• Lead, coach, and mentor team members and more junior engineers, including task management and technical planning
• Work with remote client resources and coordinate development activities
• Work with cross-functional and geographically distributed teams
• Define and implement performance improvement strategies
• Train and mentor development teams in leading technologies


Starting: ASAP
