Senior Data Engineer
Toronto, Ontario, Canada - Permanent
Job Description
Our client is on a mission to provide useful technological solutions that enrich and empower millions of people in their daily lives. They apply big data, artificial intelligence, and machine learning to bring the next generation of financial products and services to the Indian, Japanese, and Canadian markets.
They have created the only fraud and risk management platform that orchestrates data from the entire customer journey, fighting fraud more effectively with configurable risk models in a single, easy-to-use platform.
If working with billions of events and petabytes of data, and optimizing for the last millisecond, is something that excites you, then read on! We are looking for Data Engineers who have seen their fair share of messy data sets and have been able to structure them for fraud detection and prevention, anomaly detection, and other AI products.
You will be writing frameworks for real-time and batch pipelines that ingest and transform events from hundreds of applications every day. These events are consumed by both machines and people: the client's ML and software engineers use them to build new models and optimize existing ones to detect and fight new fraud patterns. You will also help optimize the feature pipelines for fast execution and work with software engineers to build event-driven microservices.
You will get to put cutting-edge tech into production, with the freedom to experiment with new frameworks and new ways to optimize, and the resources to build the next big thing in fintech using data!
What this role includes:
● Work directly with the Platform Engineering Team to create reusable experimental and production data pipelines and centralize the data store
● Understand, tune, and master the processing engines used day-to-day
● Keep the data whole, safe, and flowing with expertise in high-volume data ingestion and streaming platforms (such as Spark Streaming and Kafka)
● Make the data available for online and offline consumption by machines and humans
● Maintain and optimize the underlying storage systems so they meet the agreed SLAs
● Shepherd and shape the data by developing efficient structures and schemas for data in storage and in transit
● Explore as many new technology options for data processing and storage as possible, and share your findings with the team
● Develop tools and contribute to open source wherever possible
● Adopt problem-solving as a way of life – always go to the root cause
Must-Have Skills:
● Degree in Computer Science, Engineering, or a related field
● You have previously built large-scale data pipelines ingesting and transforming more than 10^6 (one million) events per minute and terabytes of data per day
● You are passionate about producing clean, maintainable, and testable code as part of a real-time data pipeline
● You understand how microservices work and are familiar with concepts of data modeling
● You can connect different services and processes together, even ones you have not worked with before, and can follow the flow of data through various pipelines to debug data issues
● You have worked with Spark and Kafka, have experimented with (or at least know of) Flink, Spark Streaming, and Kafka Streams, and understand when to use one over another
● You have experience implementing offline and online data processing flows and understand how to choose and optimize the underlying storage technologies. You have worked with, or experimented with, Cassandra, DynamoDB, Druid, Ignite, Presto, or Athena
● On a bad day, maintaining a ZooKeeper ensemble and bringing up a cluster doesn’t bother you
● You may not be a networking expert, but you understand the issues involved in ingesting data from applications running in multiple data centers across geographies, both on-premises and in the cloud, and you will find a way to solve them
● Proficient in Java/Scala/Python/Spark