Mining of Massive Datasets

The implementation of data mining algorithms

Description:

The assignments in this repository implement algorithms for mining massive datasets using Python and Spark.

Assignment	Topic	Framework	Related Algorithms
Assignment 1	Yelp Rating Counting	Spark	Spark Operations (Transformations, Actions)
Assignment 2	Finding Frequent Itemsets	Spark	A-Priori, SON
Assignment 3	Collaborative Filtering	Spark	LSH, User/Item-based CF
Assignment 4	Community Detection	Spark	Girvan-Newman
Assignment 5	Clustering	Spark	BFR
Assignment 6	Streaming Data Algorithms	Spark	Bloom Filtering, Flajolet-Martin, Fixed-Size Sampling
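
As a rough illustration of one of the streaming techniques listed for Assignment 6, here is a minimal Bloom filter sketch in plain Python. It is not taken from the assignment code: the class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, the salted-MD5 hashing scheme, and the sample city names are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a bit array plus k salted hash functions (hypothetical sketch)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive k bit positions by salting a single hash with the hash index.
        for i in range(self.num_hashes):
            salted = "{}:{}".format(i, item).encode("utf-8")
            digest = hashlib.md5(salted).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        # Set every bit position that the item hashes to.
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False means the item was definitely never added;
        # True means it was possibly added (false positives can occur).
        return all(self.bits[pos] for pos in self._positions(item))


# Sample values are illustrative only, not drawn from the dataset.
seen = BloomFilter()
for city in ["Las Vegas", "Phoenix", "Toronto"]:
    seen.add(city)
```

Calling `seen.might_contain("Phoenix")` returns `True` because a Bloom filter never produces false negatives. For a value that was never added, the result is usually `False`, but it can be `True` when the item's bit positions happen to collide with bits already set.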

Environment: Python 3.6 and Spark 2.3.2.

You can find each assignment's description and source code in the corresponding assignment folder.

The data comes from the Yelp Dataset Challenge and was obtained from the Yelp Dataset website.

Reference: Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman, Mining of Massive Datasets.
