Toronto, Ontario, Canada - Permanent
Our client, an advanced R&D division of a global fintech firm, recently launched an ML-powered fraud risk management (FRM) platform for fintechs, banks, and eCommerce marketplaces with high transaction volumes. The product was built from scratch at the firm's innovation lab because existing fraud management systems were neither fast nor robust enough to handle the hundreds of millions of decisions the parent company requires each day.
If working with billions of events, petabytes of data, and optimizing for the last millisecond excites you, read on! Our client is looking for Data Engineers who have seen their fair share of messy data sets and know how to structure them for building useful AI products.
You will write frameworks for real-time and batch pipelines that ingest and transform events from hundreds of applications every day. Our client's ML and software engineers consume these pipelines to build data products such as personalization and fraud detection. You will also help optimize feature pipelines for fast execution and work with software engineers to build event-driven microservices.
You will get to put cutting-edge tech into production, with the freedom to experiment with new frameworks, try new ways to optimize, and the resources to build the next big thing in fintech using data!
Responsibilities:
Work directly with Machine Learning Engineers and the Platform Engineering team to create reusable experimental and production data pipelines
Understand, tune, and master the processing engines (e.g., Spark, Hive, Samza) used day-to-day
Keep the data whole, safe, and flowing with expertise in high-volume data ingest and streaming platforms (e.g., Spark Streaming, Kafka)
Shepherd and shape the data by developing efficient structures and schemas for data in storage and in transit
Explore new technology options for data processing and storage, and share them with the team
Develop tools and contribute to open source wherever possible
Adopt problem-solving as a way of life: always go to the root cause
Must Have Skills:
Degree in Computer Science, Engineering, or a related field
Proficient in Java/Scala/Python/Spark
You have previously built serious data pipelines, ingesting and transforming more than 10^6 events per minute and terabytes of data per day
You are passionate about producing clean, maintainable, and testable code as part of a real-time data pipeline
You understand how microservices work and are familiar with concepts of data modeling
You can connect different services and processes together, even ones you have not worked with before, and follow the flow of data through various pipelines to debug data issues
You have worked with Spark and Kafka before and have experimented with or heard about Flink/Druid/Ignite/Presto/Athena and understand when to use one over the other
You may not be a networking expert, but you understand the issues with ingesting data from applications in multiple data centers across geographies, on-premises and in the cloud, and will find a way to solve them
Perks & Benefits:
Flexibility to work from home
Certified as a "Great Place to Work" for the fourth time
Flexible hours outside of core working hours
Enrolment in the Group Health Benefits plan right from day 1, no waiting period
Team building events
Fuel for the day: weekly delivery of groceries and all types of snacks
Daily fun in the office with competitive games of ping pong and pool, plus board games and video games