Skip to content

zerohhy/warp_jacobi

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 

Repository files navigation

SCC13 graphlab task2 Boston Team GraphLab README

================

Input Datasets


TASK 0: Ranking graph nodes on LiveJournal social network data


  • Goal: Use pagerank to compute network ranking on the LiveJournal data set

  • What to submit?

    • Console output of the pagerank execution

    • output files containing the pagerank vector

    • The graph node with the highest pagerank value

  • Useful links:

  • Tips:

    • The LiveJournal dataset format is tsv, remember to set the correct file

      format in the command line

    • Remember to save the results by setting the right command line flag

TASK 1: Spectral Clustering


  • Goal: Perform non-trivial analytics by utilizing GraphLab SVD and k-means

    • Part 1a: run GraphLab SVD on an internet network for performing dimensionality

      reduction of the AS map. The result is the top 10 singular value triplets

    • Part 1b: using the singular vectors of the matrix U, deploy GraphLab k-means

      clustering to compute clusters of Internet AS systems. The result is a

      grouping of related Internet AS systems together

    • Part 2a: repeat Part 1a, but this time using the LiveJournal dataset

    • Part 2b: repeat Part 1b, but this time using the LiveJournal dataset

    • Part 3a: repeat Part 1a, but this time using the YouTube dataset

    • Part 3b: repeat Part 1b, but this time using the YouTube dataset

    • Part 4a: repeat Part 1a, but this time using the Amazon dataset

    • Part 4b: repeat Part 1b, but this time using the Amazon dataset

  • What to submit?

    • SVD singular value and error estimation for each of the singular values

      Create a file with the "Singular value ..." lines that appear in the

      output.

    • k-means run output, including the cost per iteration

  • Tips:

    • You need to supply rows and columns of the matrix (graph) size. Use

      --rows=1696416 --cols=1696417 for the AS dataset

    • Use --save_vectors=1 to save SVD output vectors

    • Since the network is binary (namely link weights are all one), use

      --binary=true flag

    • You need to fine tune parameters nsv, nv, max_iter, and tol to get good

      accuracy. When the algorithm is finished you get an estimation of the accuracy

      for each singular value triplet. The nsv parameter is the number of required

      singular values (10 in this case). However, nv should be much bigger to allow

      for a good accuracy. You will need to play with different values of nv until you

      get the required accuracy (1e-12)

    • For k-means, you first want to parse the SVD output and create one file

      containing the second column for each output file of the U vectors.

      These columns can then be merged into a single input file that will be

      used for k-means.

      If multiple MPI ranks are used, need to first to merge and sort the

      results.

TASK 2: GraphLab Programming


  • Goal: Implement the Jacobi algorithm using GraphLab's warp engine. The Jacobi

    method is a simple parallel algorithm for solving a linear system of

    equations.

    • Part 1: Algorithm implementation

    • Part 2: Algorithm verification on 3x3 matrix (mat3x3.txt)

    • Part 3: Run Jacobi on the Stiffness matrix (bmw7st_1.mtx)

    • Part 4: Run Jacobi on the Friendster data set

  • Implementation Requirements:

    • The program should use GraphLab's warp engine.

    • A residual calculation should be performed after each iteration. The

      residual is defined as the second norm squared (|| A*x - b||_2)^2,

      where x is the solution at the end of each iteration.

    • You should support the following command line flags:

      • Number of matrix rows: --rows

      • Number of matrix columns: --cols

      • Max. number of iterations: --max_iter

      • Tolerance threshold: --tol

  • Tips:

    • It is advised to follow the warp engine tutorial listed in 'Useful links'

      during implementation

    • The easiest format to work with is the matrix market format

    • Note that the header for the original matrix file must be removed to

      allow you to use GraphLab's graph_loader().

    • You must scale the input matrix, such that the main diagonal will be equal

      to the absolute sum of the row entries +1. The vector b should be all

      ones. See Diagonally dominant matrix in 'Useful links'

    • For an example of how to use clopts, see the svd.cpp source code in the

      collaborative_filtering toolkit

    • The Stiffness matrix is symmetric. You will need to create a GraphLab

      graph out of the sparse symmetric matrix

    • You can use IS_POD_TYPE super class to avoid implementing the save() and

      load() operations which are required when transmitting the vertex data

      over the network. For example:

        struct vertex_data : IS_POD_TYPE { 
      
            int val1; 
      
            int val2; 
      
            vertex_data(int val1, int val2) : val1(val1), val2(val2) { }; 
      
        }
      
      
      
        typedef vertex_data vertex_data_type;
      
    • Don't forget to edit the CMakeLists.txt file and add your application. See

      toolkits/linear_solvers/CMakeLists.txt

    • The Jacobi method starts with a random starting vector. It is advised to

      initialize it using the drand48() function. Be careful not to set it to zero!

RECOMMENDATIONS:

- The Application Specialist used GraphLab commitID 734e9b6f9b to compute

  the results.



- Task 0 and Task 2 can be completed with the GraphLab v2.2 release



- It is strongly recommended to use GraphLab commitID 734e9b6f9b for Task

  1, as it contains changes that support binary graphs, and displays cost

  metrics to measure progress and performance.

  The source for that commitID is available in graphlab-734e9b6f9b/ but

  you are welcome to download directly from GitHub



  It is possible to use GraphLab v2.2 for Task 1, but not recommended.



- A template file has been provided for warp_jacobi_template.cpp to help

  you get started, but not that you are NOT REQUIRED to use it.

About

SCC13 graphlab task2

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published