For the batch exercises I used a CSV file of Oscar-winning films, downloaded from this link
Each CSV row is modelled with the following case class:
case class Film(name: String, year: String, nominations: String, rating: String, duration: String, genre1: String, genre2: String, release: String, metacritic: String, synopsis: String)
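As a rough illustration of how a CSV row could map onto this case class, here is a minimal plain-Scala sketch. `FilmParser` and `parseFilm` are hypothetical names, not part of the exercises, and the sketch assumes comma-separated columns in the same order as the case class fields, with no quoted commas anywhere except possibly in the trailing synopsis.

```scala
// Same fields as the case class above; every column is kept as a String.
case class Film(name: String, year: String, nominations: String, rating: String,
                duration: String, genre1: String, genre2: String, release: String,
                metacritic: String, synopsis: String)

object FilmParser {
  // Hypothetical helper: split into at most 10 fields so that commas
  // inside the last column (the synopsis) are preserved.
  def parseFilm(line: String): Option[Film] =
    line.split(",", 10).map(_.trim) match {
      case Array(n, y, nom, r, d, g1, g2, rel, m, s) =>
        Some(Film(n, y, nom, r, d, g1, g2, rel, m, s))
      case _ =>
        None // fewer than 10 columns: treat the row as malformed
    }
}
```

Returning an `Option` lets the caller drop malformed rows with `flatMap` instead of failing on the first bad line.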
Average number of nominations per film.
Average Metacritic score, grouped by film genre.
Average duration of winning films by time period.
How many winning films include at least part of their title in the synopsis?
How many winning films include all the words of their title in the synopsis?
What is the standard deviation of the ratings of winning films in the 21st century?
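The aggregations behind several of these exercises can be sketched with plain Scala collections. This is not the Flink implementation used in the exercises; it only shows the arithmetic. All names in `BatchSketch` are hypothetical, the standard deviation is the population form, and starting the 21st century at 2001 is an assumption exposed as a parameter.

```scala
import scala.util.Try

case class Film(name: String, year: String, nominations: String, rating: String,
                duration: String, genre1: String, genre2: String, release: String,
                metacritic: String, synopsis: String)

object BatchSketch {
  // Parse a numeric column; values that are not valid numbers are skipped.
  private def num(s: String): Option[Double] = Try(s.trim.toDouble).toOption

  // Arithmetic mean; defined as 0.0 for an empty input.
  def mean(xs: Seq[Double]): Double = if (xs.isEmpty) 0.0 else xs.sum / xs.size

  // Population standard deviation: square root of the mean squared deviation.
  def stdDev(xs: Seq[Double]): Double = {
    val m = mean(xs)
    math.sqrt(mean(xs.map(x => (x - m) * (x - m))))
  }

  def avgNominations(films: Seq[Film]): Double =
    mean(films.flatMap(f => num(f.nominations)))

  // A film counts towards each of its genres; empty genre columns are ignored.
  def metacriticByGenre(films: Seq[Film]): Map[String, Double] = {
    val pairs = for {
      f <- films
      g <- Seq(f.genre1, f.genre2).map(_.trim).filter(_.nonEmpty).distinct
      m <- num(f.metacritic).toSeq
    } yield (g, m)
    pairs.groupBy(_._1).map { case (g, ps) => g -> mean(ps.map(_._2)) }
  }

  // Lower-cased words of a text, split on anything that is not a letter or digit.
  private def words(s: String): Set[String] =
    s.toLowerCase.split("[^\\p{L}\\p{N}]+").filter(_.nonEmpty).toSet

  // True when every word of the title also appears in the synopsis.
  def titleFullyInSynopsis(f: Film): Boolean = {
    val title = words(f.name)
    title.nonEmpty && title.subsetOf(words(f.synopsis))
  }

  // Standard deviation of ratings for films released in or after `fromYear`.
  def stdDevRating(films: Seq[Film], fromYear: Int = 2001): Double =
    stdDev(films.filter(f => Try(f.year.trim.toInt).toOption.exists(_ >= fromYear))
      .flatMap(f => num(f.rating)))
}
```

Counting the films that match the title check is then `films.count(BatchSketch.titleFullyInSynopsis)`. Swapping `subsetOf` for a non-empty intersection would give the "at least part of the title" variant.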
For these exercises I used a WebSocket that generates events about Meetup. The source captures, in real time, people who sign up for events. The WebSocket source is implemented using a RichFunction from the Flink API.
Remove malformed objects.
Number of users who have confirmed attendance at an event in the last 10 seconds.
Number of users who have confirmed attendance at an event in the last 20 seconds, updated every 5 seconds.
Number of users by country every 5 seconds.
Calculate trending topics over the last minute, updating the result every 10 seconds.
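The difference between "the last 10 seconds" (a tumbling window) and "the last 20 seconds, updated every 5 seconds" (a sliding window) can be illustrated in plain Scala. This is only a sketch of the window arithmetic, not the Flink operators used in the exercises. `WindowSketch` and its functions are hypothetical names, and it assumes non-negative event timestamps in milliseconds.

```scala
object WindowSketch {
  // Start timestamps (ms) of every window of length `sizeMs`, advancing by
  // `slideMs`, that contains the event time `ts`. With sizeMs == slideMs the
  // windows do not overlap (tumbling) and exactly one start is returned.
  def windowStarts(ts: Long, sizeMs: Long, slideMs: Long): Seq[Long] = {
    val lastStart = ts - (ts % slideMs) // latest window that has already opened
    Iterator.iterate(lastStart)(_ - slideMs).takeWhile(_ > ts - sizeMs).toList
  }

  // Number of events that fall in each window, keyed by window start.
  def countPerWindow(timestamps: Seq[Long], sizeMs: Long, slideMs: Long): Map[Long, Int] =
    timestamps
      .flatMap(ts => windowStarts(ts, sizeMs, slideMs))
      .groupBy(identity)
      .map { case (start, hits) => start -> hits.size }
}
```

With a 20-second size and a 5-second slide, each event is counted in four overlapping windows, which is why the sliding result refreshes every 5 seconds while still covering 20 seconds of data. Grouping by a key such as country before counting would give the per-country variant.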
All exercises have been implemented using Flink v1.2.0.
The documentation of Apache Flink is located on the website: