For the batch exercises I used a CSV of Oscar-winning films, downloaded from https://cs.uwaterloo.ca/~s255khan/files/pictures.csv.
The CSV is modelled with the following case class:
case class Film(name: String, year: String, nominations: String, rating: String, duration: String, genre1: String, genre2: String, release: String, metacritic: String, synopsis: String)
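A minimal sketch of how one CSV line could be turned into a `Film`. It assumes comma-separated fields in the same order as the case class, with optional double quotes around fields that contain commas; the real layout of pictures.csv has not been checked here. `FilmParser`, `splitCsv` and `parse` are hypothetical names, not part of the exercises or of Flink.

```scala
case class Film(name: String, year: String, nominations: String, rating: String,
                duration: String, genre1: String, genre2: String, release: String,
                metacritic: String, synopsis: String)

object FilmParser {
  /** Splits one CSV line on commas, keeping commas that sit inside double quotes. */
  def splitCsv(line: String): Vector[String] = {
    val fields = Vector.newBuilder[String]
    val current = new StringBuilder
    var inQuotes = false
    var i = 0
    while (i < line.length) {
      val c = line.charAt(i)
      if (c == '"') {
        if (inQuotes && i + 1 < line.length && line.charAt(i + 1) == '"') {
          current.append('"') // a doubled quote inside a quoted field is a literal quote
          i += 1
        } else {
          inQuotes = !inQuotes
        }
      } else if (c == ',' && !inQuotes) {
        fields += current.toString
        current.clear()
      } else {
        current.append(c)
      }
      i += 1
    }
    fields += current.toString
    fields.result()
  }

  /** Returns None when the line does not have exactly ten fields (assumed layout). */
  def parse(line: String): Option[Film] = splitCsv(line) match {
    case Seq(n, y, nom, r, d, g1, g2, rel, m, s) =>
      Some(Film(n, y, nom, r, d, g1, g2, rel, m, s))
    case _ => None
  }
}
```

Returning an `Option` lets the caller drop lines that do not fit the assumed layout (a header row would still need to be skipped separately) rather than failing the whole job.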
- Average number of nominations per film.
- Average Metacritic score, grouped by film genre.
- Average duration of winning films by period.
- How many winning films include their title in the synopsis?
- How many winning films include every word of their title in the synopsis?
- What is the standard deviation of the ratings of winning films from the 21st century?
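The computations behind several of these exercises can be sketched with plain Scala collections. This is not the Flink job itself, only an illustration of the arithmetic. It assumes that the numeric columns parse as decimal numbers, that `year` is a plain four-digit year, and that the 21st century starts in 2001. `BatchSketch` and its helper names are hypothetical, and the standard deviation shown is the population one.

```scala
import scala.util.Try

case class Film(name: String, year: String, nominations: String, rating: String,
                duration: String, genre1: String, genre2: String, release: String,
                metacritic: String, synopsis: String)

object BatchSketch {
  // Columns are Strings in the case class, so unparsable values are skipped.
  private def num(s: String): Option[Double] = Try(s.trim.toDouble).toOption

  private def words(s: String): Set[String] =
    s.toLowerCase.split("\\W+").filter(_.nonEmpty).toSet

  def mean(xs: Seq[Double]): Option[Double] =
    if (xs.isEmpty) None else Some(xs.sum / xs.size)

  /** Population standard deviation (divides by n, not n - 1). */
  def stdDev(xs: Seq[Double]): Option[Double] =
    mean(xs).map(m => math.sqrt(xs.map(x => (x - m) * (x - m)).sum / xs.size))

  def avgNominations(films: Seq[Film]): Option[Double] =
    mean(films.flatMap(f => num(f.nominations)))

  /** A film with two genres contributes its score to both of them. */
  def metacriticByGenre(films: Seq[Film]): Map[String, Double] =
    films.flatMap { f =>
      for {
        score <- num(f.metacritic).toList
        genre <- List(f.genre1, f.genre2).map(_.trim).filter(_.nonEmpty).distinct
      } yield genre -> score
    }.groupBy(_._1).map { case (g, pairs) => g -> pairs.map(_._2).sum / pairs.size }

  /** Interpretation 1: the whole title appears verbatim (case-insensitive). */
  def titleInSynopsis(films: Seq[Film]): Int =
    films.count(f => f.name.nonEmpty && f.synopsis.toLowerCase.contains(f.name.toLowerCase))

  /** Interpretation 2: every word of the title appears somewhere in the synopsis. */
  def allTitleWordsInSynopsis(films: Seq[Film]): Int =
    films.count { f =>
      val titleWords = words(f.name)
      titleWords.nonEmpty && titleWords.subsetOf(words(f.synopsis))
    }

  def ratingStdDev21stCentury(films: Seq[Film]): Option[Double] =
    stdDev(films.filter(f => num(f.year).exists(_ >= 2001)).flatMap(f => num(f.rating)))
}
```

In a Flink batch job the same logic would be expressed as transformations on a `DataSet[Film]`; the collection version keeps the example independent of any cluster setup.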
For the streaming exercises I used a websocket that generates events from Meetup (http://meetup.com). The source captures people signing up for events in real time. The websocket source is implemented as a RichFunction (Flink API).
- Discard malformed objects.
- Number of users who confirmed attendance in the last 10 seconds.
- Number of users who confirmed attendance in the last 20 seconds, updated every 5 seconds.
- Number of users by country every 5 seconds.
- Trending topics over the last minute, updated every 10 seconds.
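The window semantics used in these exercises can be illustrated without Flink. The sketch below assigns events to tumbling windows (fixed, non-overlapping) and sliding windows (overlapping, one new window per slide step) with plain Scala collections. `Rsvp` is a hypothetical, simplified event, not the real Meetup payload; timestamps are assumed to be non-negative milliseconds; and the counts are per event, not per distinct user. In the Flink DataStream API, windows like these are normally declared with `timeWindow` or `timeWindowAll` rather than computed by hand.

```scala
// Hypothetical, simplified event: the real Meetup payload has many more fields.
case class Rsvp(timestampMs: Long, user: String, country: String, confirmed: Boolean)

object WindowSketch {
  /** Start of the tumbling window of length `sizeMs` that contains `ts`. */
  def tumblingStart(ts: Long, sizeMs: Long): Long = ts - (ts % sizeMs)

  /** Starts of every sliding window (length `sizeMs`, step `slideMs`) that contains `ts`. */
  def slidingStarts(ts: Long, sizeMs: Long, slideMs: Long): Seq[Long] = {
    val lastStart = ts - (ts % slideMs)
    Iterator.iterate(lastStart)(_ - slideMs).takeWhile(_ > ts - sizeMs).toVector
  }

  /** Confirmed RSVPs per tumbling window, keyed by window start. */
  def confirmedPerTumbling(events: Seq[Rsvp], sizeMs: Long): Map[Long, Int] =
    events.filter(_.confirmed)
      .groupBy(e => tumblingStart(e.timestampMs, sizeMs))
      .map { case (start, es) => start -> es.size }

  /** Confirmed RSVPs per sliding window; one event is counted in several windows. */
  def confirmedPerSliding(events: Seq[Rsvp], sizeMs: Long, slideMs: Long): Map[Long, Int] =
    events.filter(_.confirmed)
      .flatMap(e => slidingStarts(e.timestampMs, sizeMs, slideMs))
      .groupBy(identity)
      .map { case (start, hits) => start -> hits.size }

  /** RSVPs per (tumbling window start, country). */
  def byCountry(events: Seq[Rsvp], sizeMs: Long): Map[(Long, String), Int] =
    events.groupBy(e => (tumblingStart(e.timestampMs, sizeMs), e.country))
      .map { case (key, es) => key -> es.size }
}
```

With a 20-second window sliding every 5 seconds, each event belongs to four windows, which is why the sliding count refreshes every 5 seconds while still covering the last 20.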
All exercises were implemented with Apache Flink v1.2.0 (https://ci.apache.org/projects/flink/flink-docs-release-1.2/).
The Apache Flink documentation is available at http://flink.apache.org