Senior Data Engineer
Toronto, Ontario, Canada - Permanent
This Senior Data Engineer role focuses heavily on backend application development: designing pipelines, modeling the data that flows through them, and ensuring the timeliness and quality of that data. This individual will also be expected to instantiate, observe and maintain infrastructure, including both AWS-managed and open source solutions.
- Work with our analytics, marketing and data science teams to understand our data processing needs.
- Be a key hands-on contributor to the design and implementation of our data platform solutions from the infrastructure layer up to the API.
- Model and architect our data in a way that will scale with the increasingly complex ways we’re analyzing it.
- Build robust pipelines that make sure data is where it needs to be, when it needs to be there.
- Build frameworks and tools to help our software engineers, data analysts, and data scientists design and build their own data pipelines in a self-service manner.
- Carry out performance testing and engineering to ensure that our systems always scale to meet our needs.
- Be a key member of the team focused on pure hands-on contribution to the implementation and operation of our data platform.
- Design data models with a broader understanding of underlying systems.
- Identify and implement appropriate abstractions for immediate requirements.
- Build performant models with quality in mind, and keep them consistent with their accompanying documentation.
- Consult with stakeholders on the best practices for creation and deployment of data models and data flows.
- Define and enforce service level agreements between the products you own and their stakeholders, including configuring monitoring and alerting.
- Good understanding of data lineage and dependencies between data pipelines.
- You are able to maintain existing ETLs and develop simple ETL processes from scratch. You are able to meet the requirements laid out for you. You start to see the bigger picture.
Working with Data Processing Frameworks:
- Capable of determining the best architecture, batch or streaming, for applications being built.
- Evaluation of various frameworks and documentation of pros/cons for a wide audience.
Working with Infrastructure:
- Proficient at provisioning new infrastructure across environments.
- Capable of managing and integrating autoscaling, logging, monitoring and alerting for the system. Your infrastructure as code is environment agnostic.
Must Have Skills:
- You have at least 7 years of hands-on experience as an Engineer across multiple environments on complex distributed polyglot systems using Java, Scala, Clojure, Python, Go and/or C++.
- Strong SQL skills and data modeling experience.
- You can go up and down the stack from deep in the infrastructure layer all the way up to the client libraries.
- Deep understanding of object-oriented and/or functional programming patterns and paradigms.
- Hands-on experience with multiple data platforms and tools (e.g. S3, Redshift, Airflow, Spark, Presto, Hive).
- Minimum of 2-3 years of stakeholder management/enablement experience that cuts across multiple teams.
- Passion for ensuring the timeliness, availability and quality of our highest-value data sets, in line with established SLOs.
- Ability to support the parts of the codebase you own, and to understand the codebase with normal direction from peers and data engineers.
- Demonstrated experience with small teams that move fast - all members are expected to be able to achieve maximum results with minimal direction.
- Demonstrated experience measuring the impact of technical products across multiple domains through experimentation and statistical analysis.
- Strong ability to communicate on both business and technical subjects.
Nice to Have Skills:
- Kubernetes and/or Docker experience.
- Message driven or streaming architectures, such as those with Kafka, Spark, Flink.
- Postgres, MySQL, or other RDBMS experience.
- AWS, GCP and/or Azure experience.
- Redshift, Presto, or other MPP database experience.
- Cassandra, Elastic, Redis and/or Couchbase experience.
- Airflow, Luigi, or other ETL scheduling tool experience.
- Open source contributions to a few major projects.