Data Engineering Example Project

The purpose of this project is to develop the data engineering skills listed below.

Batch Processing
Stream Processing

The diagram above shows the end-to-end data pipeline, which consists of the Data Producer, Kafka, Flink, and Spark applications.

Requirements:

Write batch Spark application(s) that process data stored in the file system.
Write a Flink application that consumes the Kafka topics to which the DataProducer application writes data.
Develop the project using Scala.

Problem Details

Data

Data is produced from orders and products. Here are their schemas:

orders
 |-- customer_id: string
 |-- location:    string
 |-- seller_id:   string
 |-- order_date:  string
 |-- order_id:    string
 |-- price:       double
 |-- product_id:  string
 |-- status:      string

products
 |-- brandname:    string
 |-- categoryname: string
 |-- productid:    string
 |-- productname:  string

The join key is shown below:

orders.product_id = products.productid
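
As a rough illustration of this join key, here is a minimal sketch in plain Scala collections, without Spark or Flink. The case classes `Order` and `Product`, the object `JoinSketch`, and the method `joinOrdersWithProducts` are hypothetical names introduced here and are not taken from the project. The field names follow the schemas above. The sketch assumes `productid` is unique within products, which the source does not state.

```scala
// Hypothetical case classes mirroring the orders and products schemas above.
final case class Order(
  customer_id: String,
  location: String,
  seller_id: String,
  order_date: String,
  order_id: String,
  price: Double,
  product_id: String,
  status: String
)

final case class Product(
  brandname: String,
  categoryname: String,
  productid: String,
  productname: String
)

object JoinSketch {
  // Inner join on orders.product_id = products.productid.
  // Assumption: productid is unique; if it is not, the last product
  // with a given id wins when the lookup map is built.
  def joinOrdersWithProducts(
      orders: Seq[Order],
      products: Seq[Product]
  ): Seq[(Order, Product)] = {
    // Index products by their key for constant-time lookup.
    val productsById: Map[String, Product] =
      products.map(p => p.productid -> p).toMap

    // Keep only orders whose product_id has a matching product.
    orders.flatMap(o => productsById.get(o.product_id).map(p => (o, p)))
  }
}
```

In a Spark batch job the same condition would typically be written as an equality join on these two columns, for example `orders.join(products, orders("product_id") === products("productid"))`, assuming both datasets have been loaded as DataFrames.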
