Dean Morin

Funding Circle

Talk abstract: One-by-one Is No Fun: Lessons learned writing Kafka ETL jobs

I’ve been writing ETL jobs using Kafka for a couple of years now. In that time, I’ve done just about everything wrong before figuring out what actually works. This talk will cover:

- What Kafka is

- What the major frameworks are, and how they steer you towards one-by-one message processing

- Why you shouldn’t do that, including performance measurements for different methods of loading data into a Postgres data warehouse

- How to avoid one-by-one processing
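The abstract does not say which techniques the talk recommends, so the following is only a minimal sketch of the general contrast between one-by-one and batched loading, not the speaker's method. It assumes an in-memory sqlite3 database as a stand-in for the Postgres warehouse, and a hypothetical fake_messages() generator in place of a real Kafka consumer; the table name events and the batch size of 500 are likewise illustrative. The one-by-one loader pays for a statement and a commit per message, while the batched loader groups messages and issues one executemany and one commit per batch.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator

INSERT_SQL = "INSERT INTO events (id, payload) VALUES (?, ?)"


def fake_messages(n: int) -> Iterator[tuple]:
    """Hypothetical stand-in for messages read from a Kafka topic."""
    for i in range(n):
        yield (i, f"event-{i}")


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Group an iterable into lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def load_one_by_one(conn: sqlite3.Connection, messages: Iterable) -> int:
    """One INSERT and one commit per message; returns the number of commits."""
    commits = 0
    for msg in messages:
        conn.execute(INSERT_SQL, msg)
        conn.commit()
        commits += 1
    return commits


def load_in_batches(conn: sqlite3.Connection, messages: Iterable,
                    batch_size: int = 500) -> int:
    """One executemany and one commit per batch; returns the number of commits."""
    commits = 0
    for batch in batched(messages, batch_size):
        conn.executemany(INSERT_SQL, batch)
        conn.commit()
        commits += 1
    return commits


def new_db() -> sqlite3.Connection:
    """Create a fresh in-memory database with a hypothetical events table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    return conn


if __name__ == "__main__":
    print(load_one_by_one(new_db(), fake_messages(2000)))
    print(load_in_batches(new_db(), fake_messages(2000)))
```

Both loaders write the same 2000 rows, but the first commits 2000 times and the second only 4 times. In a real consumer, a common choice is to commit Kafka offsets only after the batch's database commit succeeds, so a crash replays the unfinished batch rather than losing it.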

Bio: Dean is a Data Engineer originally from Vancouver, now living in San Francisco. His general fussiness and paranoia make him well suited to the hairball that is the data world. He’s especially interested in writing metadata-driven ETL systems. He spends much of his spare time rock climbing, and is planning to sneak away to the Peak District while in the UK, so if you have any good info on the area, track him down!


Saturday, April 29th, 2017
Auditorium
9:00 am to 6:00 pm
Data Science Festival Mainstage (ballot ticket only)
Please register for a ballot ticket here: https://www.eventbrite.co.uk/e/ballot-ticket-data-science-festival-mainstage-day-tickets-32356469070
Due to the popularity of Data Science Festival events, we are now allocating event tickets via a random ballot. Registering here enters you into the ticket ballot for the Data Science Festival Mainstage day on…