The main focus of this repo is to help everyone who is looking to learn Kafka with Kafka Streams (KStreams) and Spark Streaming. We take away all the complexity of setting up the resources and running the different services, and we provide use cases and examples that can be run individually as well as together as a group.
This repo comprises the following sections:
- Kafka
- KStreams
- KsqlDB
- Kafka Connect
- Kafka Schema Registry
- Kafka REST
- Spark Streaming
- Connect KStreams with Spark Streaming
- Connect Spark Streaming with KStreams with Avro Schema
- Kafka UI to monitor and manage Kafka easily
_(Image placeholder)_
To help you use these projects, we bring up the following set of containers.
Sub Projects | Details |
---|---|
ZooKeeper | ZooKeeper is used by Kafka brokers to determine which broker is the leader of a given partition and topic, and to perform leader elections. |
Broker | A Kafka cluster is a group of multiple Kafka brokers. |
Schema Registry | Schema Registry provides a centralized repository for managing and validating schemas for topic message data, and for serialization and deserialization of the data over the network. |
Rest Proxy | The Confluent REST Proxy provides a RESTful interface to an Apache Kafka® cluster, making it easy to produce and consume messages, view the state of the cluster, and perform administrative actions without using the native Kafka protocol or clients. |
Spark Master | Apache Spark™ is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters. |
Spark worker-1 | Worker node that executes tasks assigned by the Spark master. |
Spark worker-2 | Worker node that executes tasks assigned by the Spark master. |
Kafka UI | UI for Apache Kafka is a free, open-source web UI to monitor and manage Apache Kafka clusters. |
KsqlDB Server | ksqlDB is an event streaming database for Apache Kafka. It is distributed, scalable, reliable, and real-time. |
KsqlDB CLI | The ksqlDB CLI connects to one ksqlDB server per cluster. |
Datagen | Generates sample data for the Kafka topics. |
Kafka Connect | Kafka Connect is a free, open-source component of Apache Kafka® that serves as a centralized data hub for simple data integration between databases, key-value stores, search indexes, and file systems |
MinIO | MinIO is a high-performance, S3-compatible object store. It is built for large-scale AI/ML, data lake, and database workloads. It runs on-prem and on any cloud (public or private), from the data center to the edge. |
For easy setup, we have a fully ready docker-compose.yaml file that brings up the necessary containers.
All the default configurations are defined in `src/main/resources/streams.properties`.
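As a minimal sketch of how these defaults can be consumed (the object name `StreamsConfigLoader` is illustrative, not an actual module in this repo), an example module might load the file like this in Scala:

```scala
import java.io.FileInputStream
import java.util.Properties

object StreamsConfigLoader {
  // Load the shared default configuration used by the examples.
  // The path below is the file referenced above; adjust it if you run from a different directory.
  def load(path: String = "src/main/resources/streams.properties"): Properties = {
    val props = new Properties()
    val in = new FileInputStream(path)
    try props.load(in) finally in.close()
    props
  }
}
```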
Run the below script to start the containers:

`sh start_containers.sh`
This section is cloned from learn-kafka-courses, and we will only be working on the kafka-streams submodule (run with `./gradlew runStreams -Pargs=`).
We have extended the above repo to suit our specific requirements and added a few extra modules.
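For orientation, here is a minimal Kafka Streams topology in Scala in the spirit of those modules. It is only a sketch: the topic names, application id, and broker address are placeholders, not the actual values used in this repo.

```scala
import java.util.Properties
import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.StreamsBuilder
import org.apache.kafka.streams.scala.serialization.Serdes._

object BasicStreamsExample extends App {
  val props = new Properties()
  props.put(StreamsConfig.APPLICATION_ID_CONFIG, "basic-streams-example") // placeholder id
  props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")     // assumed broker address

  val builder = new StreamsBuilder()
  // Read from an input topic, upper-case each value, and write to an output topic.
  builder.stream[String, String]("input-topic")
    .mapValues(_.toUpperCase)
    .to("output-topic")

  val streams = new KafkaStreams(builder.build(), props)
  streams.start()
  sys.addShutdownHook(streams.close())
}
```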
We have multiple examples and use cases with Spark Streaming.
Sub Projects |
---|
spark-python |
spark-scala |
The results can be viewed in Kafka UI. More details to follow.
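As a hedged example of what a Spark Streaming sub-project looks like, the Scala sketch below reads a Kafka topic and prints it to the console. The broker address and topic name are placeholders, and it assumes the spark-sql-kafka connector is on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object KafkaToConsole extends App {
  val spark = SparkSession.builder()
    .appName("kafka-to-console")
    .master("local[*]") // use the Spark master URL when submitting to the cluster
    .getOrCreate()

  // Subscribe to a Kafka topic and stream its records to the console.
  val records = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker address
    .option("subscribe", "input-topic")                  // placeholder topic name
    .load()

  records.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
    .writeStream
    .format("console")
    .outputMode("append")
    .start()
    .awaitTermination()
}
```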
We also have examples and use cases that connect KStreams with Spark Streaming (a rough sketch of the enrichment pattern follows the table below).
Sub Projects |
---|
spark-scala-enrich-topic |
spark-scala-functions |
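A rough sketch of the enrichment pattern these modules demonstrate: a Spark Structured Streaming job reads a topic, joins it with static lookup data, and writes the enriched records back to Kafka. The topic names, join column, and broker address below are placeholders, not the actual values used by `spark-scala-enrich-topic`.

```scala
import org.apache.spark.sql.SparkSession

object EnrichTopicSketch extends App {
  val spark = SparkSession.builder().appName("enrich-topic").master("local[*]").getOrCreate()
  import spark.implicits._

  // Static lookup data used to enrich the incoming stream (placeholder values).
  val countries = Seq(("US", "United States"), ("IN", "India")).toDF("code", "country")

  val source = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker address
    .option("subscribe", "events")                       // placeholder source topic
    .load()
    .selectExpr("CAST(value AS STRING) AS code")

  // Stream-static join, then write the enriched records back to a Kafka topic.
  source.join(countries, "code")
    .selectExpr("code AS key", "country AS value")
    .writeStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("topic", "events-enriched")                  // placeholder target topic
    .option("checkpointLocation", "/tmp/enrich-topic-checkpoint")
    .start()
    .awaitTermination()
}
```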
This project helps Kafka developers manage their clusters with a UI. Using Kafka UI, you can take advantage of all of Kafka's maintenance and monitoring capabilities at the click of a button.