
Applying PageRank algorithm on the dblp dataset using spark


parth-code/SparkPageRank


PageRank on the DBLP dataset using Spark

Installation requirements:

  1. Spark 2.4.x

  2. Hadoop 3.x.x

  3. Java 8

  4. Scala 2.1x.x

  5. Sbt version 2.1x.x

I installed Hadoop and Spark directly on Windows, so I did not need a Cloudera or Hortonworks distribution to run the jars.

The algorithm is modified from the implementation at https://github.com/abbas-taher/pagerank-example-spark2.0-deep-dive/blob/master/SparkPageRank.scala
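
To make the iteration concrete, here is a minimal sketch of the PageRank update in plain Scala, without Spark. It assumes the formula commonly used in Spark's PageRank example (every node starts at rank 1.0, and each new rank is 0.15 + 0.85 times the sum of incoming contributions). The names `PageRankSketch` and `pageRank` are hypothetical and are not taken from this repository.

```scala
object PageRankSketch {
  // links: node -> outgoing neighbours (assumed input shape for this sketch).
  def pageRank(links: Map[String, Seq[String]],
               iterations: Int,
               damping: Double = 0.85): Map[String, Double] = {
    // Every node that has outgoing links starts with rank 1.0.
    var ranks: Map[String, Double] = links.map { case (node, _) => node -> 1.0 }

    for (_ <- 1 to iterations) {
      // Each ranked node splits its current rank evenly among its neighbours.
      // Nodes without a current rank contribute nothing (like an inner join).
      val contribs: Seq[(String, Double)] = for {
        (node, neighbours) <- links.toSeq
        rank <- ranks.get(node).toSeq
        n <- neighbours
      } yield n -> rank / neighbours.size

      // New rank = (1 - damping) + damping * sum of received contributions.
      ranks = contribs.groupBy(_._1).map { case (node, cs) =>
        node -> ((1 - damping) + damping * cs.map(_._2).sum)
      }
    }
    ranks
  }
}
```

Because ranks start at 1.0 and are not normalised, the resulting values are centred around 1 rather than summing to 1, which matches the scale of the sample output in this README.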

How to run:

1) Download dblp.xml and dblp.dtd from https://dblp.uni-trier.de/ and place both files in the same folder.

2) Go to the project folder and run: `sbt clean compile`

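3) Method 1: Without jar

1) Run `sbt "run <input-file-location> <output-file-location>"` in a terminal (cmd or the IntelliJ terminal). The quotes are needed so that sbt passes both paths to `run` as arguments instead of treating them as separate commands.

Use absolute paths and, on Windows, '\\' instead of '\'.

If running from IntelliJ, add the argument -Xmx6000m under Run -> Edit Configurations -> VM options.

This raises the maximum heap size of the JVM to about 6 GB.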
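
To confirm that the heap setting is actually picked up, a small check such as the following can be run with the same VM options. This is only a sketch; `HeapCheck` is a hypothetical helper and not part of this project.

```scala
object HeapCheck {
  // Maximum heap the JVM will attempt to use, in megabytes.
  def maxHeapMb: Long = Runtime.getRuntime.maxMemory / (1024L * 1024L)

  def main(args: Array[String]): Unit =
    println(s"Max heap: $maxHeapMb MB")
}
```

With -Xmx6000m applied, the reported value should be close to 6000; the exact number can be slightly lower depending on the JVM and garbage collector.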

Method 2: Using jar

1) Create the fat jar using
  `sbt assembly`

2) Run
  `spark-submit --class prtest pagerank.jar <input-file-location> <output-file-location>`

The output folder will contain files named part-xxxxx. Open them as text files; they hold the required result.

Tests can be run using `sbt clean compile test`

Output Format:

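The output consists of (name,rank) tuples of the form shown below. Names may themselves contain commas, so the rank is the value after the last comma.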

(University of Paris-Sud, Orsay, France,0.8154981739701659)

(Elena,0.9850243302878131)

(John Bell,1.4621033282930214)

(University of Nice Sophia Antipolis, France,0.5678480111313514)

(Joseph Fourier University, Grenoble, France,0.8154981739701659)

(Elena Zheleva,1.3690036520596678)

(Acta Inf.,0.9850243302878131)
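
Since names can contain commas, splitting an output line on every comma would break entries such as the university names above. The following is a minimal sketch of parsing one line by splitting at the last comma; `RankLines` and `parseLine` are hypothetical names, and the line format is assumed to match the samples shown here.

```scala
import scala.util.Try

object RankLines {
  // Parses a line like "(Acta Inf.,0.98)" into (name, rank).
  // Returns None if the line does not have the expected shape.
  def parseLine(line: String): Option[(String, Double)] = {
    val t = line.trim
    if (t.length < 2 || !t.startsWith("(") || !t.endsWith(")")) None
    else {
      val inner = t.substring(1, t.length - 1)
      // The rank is everything after the LAST comma; the rest is the name.
      val idx = inner.lastIndexOf(',')
      if (idx < 0) None
      else Try(inner.substring(idx + 1).trim.toDouble).toOption
        .map(rank => (inner.substring(0, idx), rank))
    }
  }
}
```

For example, `RankLines.parseLine("(Acta Inf.,0.9850243302878131)")` yields the name `Acta Inf.` paired with its rank, and a malformed line yields `None`.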
