GitHub - jeff1191/flink-exercises: Flink Scala-Exercises (Basic Use)

Apache Flink - Exercises (Basic Use)

Batch (DataSet API)

Exercises

For the batch exercises I used a CSV file of Oscar-winning films, downloaded from https://cs.uwaterloo.ca/~s255khan/files/pictures.csv.

The CSV is modelled with the following case class:

case class Film(name: String, year: String, nominations: String, rating: String, duration: String, genre1: String, genre2: String, release: String, metacritic: String, synopsis: String)

Average number of nominations per film.
Average Metacritic score, grouped by film genre.
Average duration of winning films by era.
How many winning films include their title in the synopsis?
How many winning films include all the words of the title in the synopsis?
What is the standard deviation of the ratings of winning films in the 21st century?
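
As a rough illustration of the arithmetic behind a few of these exercises, here is a minimal sketch using ordinary Scala collections instead of the DataSet API. It is not the repository's solution. The sample rows are hypothetical (not taken from pictures.csv), "21st century" is assumed to mean year >= 2001, the standard deviation is the population one, and genres are grouped on genre1 only as a simplification.

```scala
import scala.util.Try

object BatchSketch {
  // Same shape as the README's case class: every CSV column arrives as a String.
  case class Film(name: String, year: String, nominations: String, rating: String,
                  duration: String, genre1: String, genre2: String, release: String,
                  metacritic: String, synopsis: String)

  // Read one numeric column, silently dropping empty or malformed values.
  def numbers(films: Seq[Film])(column: Film => String): Seq[Double] =
    films.flatMap(f => Try(column(f).trim.toDouble).toOption)

  // Arithmetic mean; NaN for an empty input.
  def mean(xs: Seq[Double]): Double =
    if (xs.isEmpty) Double.NaN else xs.sum / xs.size

  // Population standard deviation (divides by n, not n - 1).
  def stdDev(xs: Seq[Double]): Double = {
    val m = mean(xs)
    math.sqrt(mean(xs.map(x => (x - m) * (x - m))))
  }

  // Assumption: "21st century" means year >= 2001.
  def in21stCentury(f: Film): Boolean =
    Try(f.year.trim.toInt).toOption.exists(_ >= 2001)

  // Hypothetical rows for illustration only.
  val sample: Seq[Film] = Seq(
    Film("A", "1999", "8", "8.0", "120", "Drama", "", "", "80", "placeholder"),
    Film("B", "2005", "6", "7.0", "100", "Drama", "", "", "70", "placeholder"),
    Film("C", "2010", "10", "9.0", "140", "Action", "", "", "", "placeholder")
  )

  // Average number of nominations per film.
  val avgNominations: Double = mean(numbers(sample)(_.nominations))

  // Average Metacritic score per genre (genre1 only; NaN when no score is present).
  val metacriticByGenre: Map[String, Double] =
    sample.groupBy(_.genre1).map { case (genre, films) =>
      genre -> mean(numbers(films)(_.metacritic))
    }

  // Standard deviation of ratings for films from 2001 onwards.
  val ratingStdDev21: Double =
    stdDev(numbers(sample.filter(in21stCentury))(_.rating))
}
```

Parsing each column with Try keeps a single bad cell from failing the whole computation, which matters here because the case class stores every field as a String.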

Streaming (DataStream API)

For these exercises I used a WebSocket that generates events from Meetup (http://meetup.com). The source captures people signing up for events in real time. The WebSocket source is implemented using a RichFunction (Flink API).

Remove malformed objects.
Number of users who have confirmed the event in the last 10 seconds.
Number of users who have confirmed the event in the last 20 seconds, updated every 5 seconds.
Number of users by country every 5 seconds.
Compute trending topics over the last minute of data, updating the result every 10 seconds.
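
"The last 20 seconds, updated every 5 seconds" reads like a sliding window, and "the last 10 seconds" like a tumbling one. The following sketch shows that window arithmetic with plain Scala collections rather than the DataStream API, so it is an illustration of the idea and not the repository's implementation. The Rsvp type, its fields, and the sample timestamps are hypothetical. Aligning window starts to multiples of the slide is an assumption about how time windows are typically laid out.

```scala
object WindowSketch {
  // Hypothetical event: one confirmation with a timestamp in milliseconds.
  case class Rsvp(user: String, country: String, timestampMs: Long)

  // Start times of every window of length sizeMs, spaced slideMs apart,
  // that contains the timestamp ts. Windows are [start, start + sizeMs).
  def windowStarts(ts: Long, sizeMs: Long, slideMs: Long): List[Long] = {
    val lastStart = ts - (ts % slideMs)
    Iterator.iterate(lastStart)(_ - slideMs).takeWhile(_ > ts - sizeMs).toList
  }

  // Number of events falling into each window, keyed by window start.
  def countPerWindow(events: Seq[Rsvp], sizeMs: Long, slideMs: Long): Map[Long, Int] =
    events
      .flatMap(e => windowStarts(e.timestampMs, sizeMs, slideMs))
      .groupBy(identity)
      .map { case (start, hits) => start -> hits.size }

  // Hypothetical sample events.
  val events: Seq[Rsvp] = Seq(
    Rsvp("u1", "ES", 21000L),
    Rsvp("u2", "ES", 26000L),
    Rsvp("u3", "US", 32000L)
  )

  // Sliding: 20-second windows, one starting every 5 seconds.
  val sliding: Map[Long, Int] = countPerWindow(events, 20000L, 5000L)

  // Tumbling: size equals slide, so each event lands in exactly one window.
  val tumbling: Map[Long, Int] = countPerWindow(events, 10000L, 10000L)
}
```

With a 20-second size and a 5-second slide, every event belongs to four overlapping windows, which is why a sliding count can be refreshed more often than the window is long. A per-country count follows the same pattern with the country added to the grouping key.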

Documentation

All exercises were implemented using Flink v1.2.0: https://ci.apache.org/projects/flink/flink-docs-release-1.2/

The Apache Flink documentation is available at http://flink.apache.org

