Site Reliability Engineer (SRE)
Toronto, Ontario, Canada - Permanent
The SRE role supports Development Operations and Production activities for The Company. You will be a core member of the Production Support team, working with the Product, Engineering, and Marketing teams to deliver our Customer Application. The SRE reports to the DevOps leader, has direct input into shaping the infrastructure architecture and strategy, and is responsible for understanding and delivering the vision as defined and agreed by the engineering and executive teams.
Specifically, the SRE has direct input into shaping new product service offerings and is responsible for understanding the vision, as defined and agreed by the Product and Engineering teams and executives, and delivering it to market.
You will work closely with engineering team members during planning and review meetings and the daily scrum to manage the roadmap, feature prioritization, and development. You will be responsible for Dev to Prod delivery and Operations support.
This position will work closely with team members in engineering, user experience, marketing, sales, and customer support. To be successful, the SRE must be an expert in their domain and possess excellent communication and development skills to understand, document, develop, and implement key features and functionality.
What You Will Do
Collaborate
Collaborate closely with Product, Marketing and Support to define functional requirements, develop the product vision, strategy, and roadmap, and to drive adoption and usage
Clearly communicate task estimates, ETAs, and work breakdown structure in Atlassian Jira, both to management and on the client side
Support users by developing documentation and assistance tools
Keep colleagues informed of developments and work collaboratively
Prepare and participate in roadmap planning, Sprint planning and reviews, retrospectives, and daily stand-ups
Evangelize and defend the vision of the product with the development team on a day-to-day basis to ensure customer satisfaction and acceptance
Assist the Scrum Master in driving the Agile software development process so that the team delivers software that meets business requirements
Assist with tracking and reporting team velocity and Sprint capacity planning
Duties and responsibilities
Work with the CTO and the wider team to define the product architecture on AWS, based on the current system and future business goals.
Build and maintain highly available development, test and production systems.
Design and configure continuous integration and deployment systems
Build out development, test and production environments as appropriate
Create and enhance application and infrastructure monitoring
Develop cost optimization strategies
Lead the AWS public cloud build and assist with its architecture, design, and implementation (connectivity, network, security, containerization, monitoring)
Automate infrastructure provisioning to stand up servers and install software and applications
Manage storage, compute efficiency, and optimization activities, including evaluating the configuration of compute size, storage solutions, and other services (network services, automation, and load balancing)
Work on DevOps processes and tools that support agile application development teams and lead to continuous integration, testing, and deployment.
Assist with application integration and troubleshooting in this infrastructure for a complex application environment, including management of dependencies on services, platforms, and other applications within the cloud infrastructure.
Create DevOps process automation and tooling that implements standards and boundaries while empowering our application development teams to serve their own infrastructure and deployment needs.
Build independent web-based tools, microservices and solutions
Write scripts and automation using Perl/Python/Groovy/Java/Bash
Configure and manage data sources like MySQL, Neo4j, MongoDB, Elasticsearch, Redis, etc.
Must Have Skills:
Experience in a production environment
Deep knowledge of AWS
Knowledge of deployment and developer workflow using Docker and Kubernetes (Helm charts nice to have)
Strong experience in Linux environments, especially networking technologies such as iptables and OpenVPN
Deployment, logging, monitoring, security and automatic failover experience with container orchestration platforms on AWS
In-depth knowledge of security best practices, policy, access management and cryptography
5 years of experience in managing Linux-based infrastructure
5 years of hands-on experience in at least one scripting language
5 years of hands-on experience with databases, including MySQL and Elasticsearch
Hands-on expertise in configuration management and deployment tools such as Puppet, Chef, or Ansible
Infrastructure as code with Terraform is nice to have
Sense of ownership and pride in your performance and its impact on the company's success
Critical thinking and problem-solving skills
Lead, coach, and mentor team members and more junior engineers, including task management and technical planning
Work with remote client resources on development activities
Work with cross-functional and geographically distributed teams
Define and implement performance improvement strategies
Train and mentor development teams in leading technologies