GitHub - zerohhy/warp_jacobi: SCC13 graphlab task2

SCC13 graphlab task2 Boston Team GraphLab README

================

Input Datasets

LiveJournal: http://snap.stanford.edu/data/soc-LiveJournal1.txt.gz
Internet topology: http://snap.stanford.edu/data/as-skitter.txt.gz
Stiffness matrix:

http://www.cise.ufl.edu/research/sparse/MM/GHS_psdef/bmw7st_1.tar.gz
YouTube ground-truth communities:

http://snap.stanford.edu/data/bigdata/communities/com-youtube.ungraph.txt.gz
Amazon product co-purchasing network:

http://snap.stanford.edu/data/bigdata/communities/com-amazon.ungraph.txt.gz
Friendster:

http://snap.stanford.edu/data/bigdata/communities/com-friendster.ungraph.txt.gz

TASK 0: Ranking graph nodes on LiveJournal social network data

Goal: Use pagerank to compute network ranking on the LiveJournal data set
- Part 1: Install GraphLab as instructed in http://graphlab.org/download
- Use GraphLab's pagerank to compute network ranking
What to submit?
- Console output of the pagerank execution
- output files containing the pagerank vector
- The graph node with the highest pagerank value
Useful links:
- PageRank method: http://en.wikipedia.org/wiki/PageRank
- Network Ranking:
  
  http://docs.graphlab.org/graph_analytics.html#graph_analytics_pagerank
Tips:
- The LiveJournal dataset format is tsv, remember to set the correct file
  
  format in the command line
- Remember to save the results by setting the right command line flag

TASK 1: Spectral Clustering

Goal: Perform non-trivial analytics by utilizing GraphLab SVD and k-means
- Part 1a: run GraphLab SVD on an internet network for performing dimensionality
  
  reduction of the AS map. The result is the top 10 singular value triplets
- Part 1b: using the singular vectors of the matrix U, deploy GraphLab k-means
  
  clustering to compute clusters of Internet AS systems. The result is a
  
  grouping of related Internet AS systems together
- Part 2a: repeat Part 1a, but this time using the LiveJournal dataset
- Part 2b: repeat Part 1b, but this time using the LiveJournal dataset
- Part 3a: repeat Part 1a, but this time using the YouTube dataset
- Part 3b: repeat Part 1b, but this time using the YouTube dataset
- Part 4a: repeat Part 1a, but this time using the Amazon dataset
- Part 4b: repeat Part 1b, but this time using the Amazon dataset
What to submit?
- SVD singular value and error estimation for each of the singular values
  
  Create a file with the "Singular value ..." lines that appear in the
  
  output.
- k-means run output, including the cost per iteration
Tips:
- You need to supply rows and columns of the matrix (graph) size. Use
  
  --rows=1696416 --cols=1696417 for the AS dataset
- Use --save_vectors=1 to save SVD output vectors
- Since the network is binary (namely link weights are all one), use
  
  --binary=true flag
- You need to fine tune parameters nsv, nv, max_iter, and tol to get good
  
  accuracy. When the algorithm is finished you get an estimation of the accuracy
  
  for each singular value triplet. The nsv parameter is the number of required
  
  singular values (10 in this case). However, nv should be much bigger to allow
  
  for a good accuracy. You will need to play with different values of nv until you
  
  get the required accuracy (1e-12)
- For k-means, you first want to parse the SVD output and create one file
  
  containing the second column for each output file of the U vectors.
  
  These columns can then be merged into a single input file that will be
  
  used for k-means.
  
  If multiple MPI ranks are used, need to first to merge and sort the
  
  results.

TASK 2: GraphLab Programming

Goal: Implement the Jacobi algorithm using GraphLab's warp engine. The Jacobi

method is a simple parallel algorithm for solving a linear system of

equations.
- Part 1: Algorithm implementation
- Part 2: Algorithm verification on 3x3 matrix (mat3x3.txt)
- Part 3: Run Jacobi on the Stiffness matrix (bmw7st_1.mtx)
- Part 4: Run Jacobi on the Friendster data set
Implementation Requirements:
- The program should use GraphLab's warp engine.
- A residual calculation should be performed after each iteration. The
  
  residual is defined as the second norm squared (|| A*x - b||_2)^2,
  
  where x is the solution at the end of each iteration.
- You should support the following command line flags:
  - Number of matrix rows: --rows
  - Number of matrix columns: --cols
  - Max. number of iterations: --max_iter
  - Tolerance threshold: --tol
Tips:
- It is advised to follow the warp engine tutorial listed in 'Useful links'
  
  during implementation
- The easiest format to work with is the matrix market format
- Note that the header for the original matrix file must be removed to
  
  allow you to use GraphLab's graph_loader().
- You must scale the input matrix, such that the main diagonal will be equal
  
  to the absolute sum of the row entries +1. The vector b should be all
  
  ones. See Diagonally dominant matrix in 'Useful links'
- For an example of how to use clopts, see the svd.cpp source code in the
  
  collaborative_filtering toolkit
- The Stiffness matrix is symmetric. You will need to create a GraphLab
  
  graph out of the sparse symmetric matrix
- You can use IS_POD_TYPE super class to avoid implementing the save() and
  
  load() operations which are required when transmitting the vertex data
  
  over the network. For example:
```
  struct vertex_data : IS_POD_TYPE { 

      int val1; 

      int val2; 

      vertex_data(int val1, int val2) : val1(val1), val2(val2) { }; 

  }



  typedef vertex_data vertex_data_type;
```
- Don't forget to edit the CMakeLists.txt file and add your application. See
  
  toolkits/linear_solvers/CMakeLists.txt
- The Jacobi method starts with a random starting vector. It is advised to
  
  initialize it using the drand48() function. Be careful not to set it to zero!

RECOMMENDATIONS:

- The Application Specialist used GraphLab commitID 734e9b6f9b to compute

  the results.



- Task 0 and Task 2 can be completed with the GraphLab v2.2 release



- It is strongly recommended to use GraphLab commitID 734e9b6f9b for Task

  1, as it contains changes that support binary graphs, and displays cost

  metrics to measure progress and performance.

  The source for that commitID is available in graphlab-734e9b6f9b/ but

  you are welcome to download directly from GitHub



  It is possible to use GraphLab v2.2 for Task 1, but not recommended.



- A template file has been provided for warp_jacobi_template.cpp to help

  you get started, but not that you are NOT REQUIRED to use it.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
README.md		README.md
warp_jacobi.cpp		warp_jacobi.cpp

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

zerohhy/warp_jacobi

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages