System integration is a complex process, but near real-time data is crucial for making quick business decisions. Kafka and Kafka Connect handle this kind of data processing efficiently: with ready-to-use connectors, we can retrieve data from various databases and stream it directly into a data lake. One of the biggest advantages of this approach is support for Change Data Capture (CDC), which tracks row-level modifications and facilitates data historization. During the session, we will set up a simple pipeline that transfers data from a SQL Server database directly to Azure Data Lake Storage using Kafka and Kafka Connect.
Agenda:
1. Introduction to the problem
2. Set up CDC on SQL Server
3. Short intro to Kafka and Kafka Connect
4. Set up Kafka and Kafka Connect with Docker Compose
5. Set up the connectors and run the pipeline
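
Concretely, the connector setup in the last step amounts to registering a source connector with Kafka Connect. As a rough sketch, a Debezium SQL Server source connector could be configured along these lines (property names follow Debezium 2.x, which renamed several 1.x properties; the hostnames, credentials, database, and table names here are placeholder assumptions, not values from the session):

```properties
# Hypothetical Debezium SQL Server source connector (standalone .properties style).
name=sqlserver-source
connector.class=io.debezium.connector.sqlserver.SqlServerConnector
database.hostname=sqlserver
database.port=1433
database.user=sa
database.password=<password>
# Database(s) to capture; requires CDC to be enabled on the database and tables.
database.names=demo
# Topics are named <topic.prefix>.<schema>.<table>, e.g. demo.dbo.orders.
topic.prefix=demo
table.include.list=dbo.orders
# Debezium stores the source schema history in its own Kafka topic.
schema.history.internal.kafka.bootstrap.servers=kafka:9092
schema.history.internal.kafka.topic=schema-history.demo
```

A corresponding sink connector (for example, an Azure Data Lake Storage Gen2 sink) would then subscribe to these topics and write the change events into the lake.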