The main focus of this repo is to help everyone who is looking to learn Kafka with Kafka Streams (KStreams) and Spark Streaming. We take away all the complexity of setting up the resources and running the different services, and we provide use cases and examples that can be run individually as well as together as a group.
This repo comprises the following sections:
- Kafka
- KStreams
- KsqlDB
- Kafka Connect
- Kafka Schema Registry
- Kafka REST
- Spark Streaming
- Connect KStreams with Spark Streaming
- Connect Spark Streaming with KStreams with Avro Schema
- Kafka UI to monitor and manage Kafka easily
_(Image placeholder)_
To help you use these projects, we bring up the following set of containers.
Sub Projects | Details |
---|---|
ZooKeeper | ZooKeeper is used by Kafka brokers to determine which broker is the leader of a given partition and topic, and to perform leader elections. |
Broker | A Kafka cluster is a group of multiple Kafka brokers. |
Schema Registry | Schema Registry provides a centralized repository for managing and validating schemas for topic message data, and for serialization and deserialization of the data over the network. |
Rest Proxy | The Confluent REST Proxy provides a RESTful interface to an Apache Kafka® cluster, making it easy to produce and consume messages, view the state of the cluster, and perform administrative actions without using the native Kafka protocol or clients. |
Spark Master | Apache Spark™ is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters. |
Spark worker-1 | Worker node that executes tasks assigned by the Spark master. |
Spark worker-2 | Worker node that executes tasks assigned by the Spark master. |
Kafka UI | UI for Apache Kafka is a free, open-source web UI to monitor and manage Apache Kafka clusters. |
KsqlDB Server | ksqlDB is an event streaming database for Apache Kafka. It is distributed, scalable, reliable, and real-time. |
KsqlDB CLI | The ksqlDB CLI connects to one ksqlDB server per cluster. |
Datagen | Generates sample data for the Kafka topics. |
Kafka Connect | Kafka Connect is a free, open-source component of Apache Kafka® that serves as a centralized data hub for simple data integration between databases, key-value stores, search indexes, and file systems |
MinIO | MinIO is a high-performance, S3-compatible object store. It is built for large-scale AI/ML, data lake, and database workloads. It runs on-prem and on any cloud (public or private), from the data center to the edge. |
For easy setup, we have a fully ready docker-compose.yaml file that brings up the necessary containers.
All the default configurations are defined in `src/main/resources/streams.properties`.
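As a minimal sketch of how these defaults can be consumed (the object name `StreamsConfigLoader` is illustrative, not an actual module in this repo), an example module might load the file like this in Scala:

```scala
import java.io.FileInputStream
import java.util.Properties

object StreamsConfigLoader {
  // Load the shared default configuration used by the examples.
  // The path below is the file referenced above; adjust it if you run from a different directory.
  def load(path: String = "src/main/resources/streams.properties"): Properties = {
    val props = new Properties()
    val in = new FileInputStream(path)
    try props.load(in) finally in.close()
    props
  }
}
```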
Run the below script to start the containers:

`sh start_containers.sh`
This section is cloned from learn-kafka-courses, and we will only be working on the kafka-streams submodule (run with `./gradlew runStreams -Pargs=`).
We have extended the above repo to suit our specific requirements and added a few extra modules.
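For orientation, here is a minimal Kafka Streams topology in Scala in the spirit of those modules. It is only a sketch: the topic names, application id, and broker address are placeholders, not the actual values used in this repo.

```scala
import java.util.Properties
import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.StreamsBuilder
import org.apache.kafka.streams.scala.serialization.Serdes._

object BasicStreamsExample extends App {
  val props = new Properties()
  props.put(StreamsConfig.APPLICATION_ID_CONFIG, "basic-streams-example") // placeholder id
  props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")     // assumed broker address

  val builder = new StreamsBuilder()
  // Read from an input topic, upper-case each value, and write to an output topic.
  builder.stream[String, String]("input-topic")
    .mapValues(_.toUpperCase)
    .to("output-topic")

  val streams = new KafkaStreams(builder.build(), props)
  streams.start()
  sys.addShutdownHook(streams.close())
}
```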
We have multiple examples and use cases with Spark Streaming.
Sub Projects |
---|
spark-python |
spark-scala |
The results can be viewed in Kafka UI. More details to follow.
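As a hedged example of what a Spark Streaming sub-project looks like, the Scala sketch below reads a Kafka topic and prints it to the console. The broker address and topic name are placeholders, and it assumes the spark-sql-kafka connector is on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object KafkaToConsole extends App {
  val spark = SparkSession.builder()
    .appName("kafka-to-console")
    .master("local[*]") // use the Spark master URL when submitting to the cluster
    .getOrCreate()

  // Subscribe to a Kafka topic and stream its records to the console.
  val records = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker address
    .option("subscribe", "input-topic")                  // placeholder topic name
    .load()

  records.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
    .writeStream
    .format("console")
    .outputMode("append")
    .start()
    .awaitTermination()
}
```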
We also have examples and use cases that connect KStreams with Spark Streaming (a rough sketch of the enrichment pattern follows the table below).
Sub Projects |
---|
spark-scala-enrich-topic |
spark-scala-functions |
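A rough sketch of the enrichment pattern these modules demonstrate: a Spark Structured Streaming job reads a topic, joins it with static lookup data, and writes the enriched records back to Kafka. The topic names, join column, and broker address below are placeholders, not the actual values used by `spark-scala-enrich-topic`.

```scala
import org.apache.spark.sql.SparkSession

object EnrichTopicSketch extends App {
  val spark = SparkSession.builder().appName("enrich-topic").master("local[*]").getOrCreate()
  import spark.implicits._

  // Static lookup data used to enrich the incoming stream (placeholder values).
  val countries = Seq(("US", "United States"), ("IN", "India")).toDF("code", "country")

  val source = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker address
    .option("subscribe", "events")                       // placeholder source topic
    .load()
    .selectExpr("CAST(value AS STRING) AS code")

  // Stream-static join, then write the enriched records back to a Kafka topic.
  source.join(countries, "code")
    .selectExpr("code AS key", "country AS value")
    .writeStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("topic", "events-enriched")                  // placeholder target topic
    .option("checkpointLocation", "/tmp/enrich-topic-checkpoint")
    .start()
    .awaitTermination()
}
```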
This project helps Kafka developers manage their clusters with a UI. Using Kafka UI, you can take advantage of all of Kafka's maintenance and monitoring capabilities at the click of a button.