1 parent a4ae6d1, commit b6526e6
Showing 1 changed file with 21 additions and 11 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,27 +1,37 @@ | ||
# parallpairs | ||
# Parallel all-pairs similarity search algorithms in OCaml | ||
|
||
## Parallel all-pairs similarity search algorithms in OCaml | ||
##Sources | ||
|
||
The repository contains the 1.0 sources, a release will be made soon. | ||
|
||
##Citation | ||
|
||
If you use this code, please cite the following paper. It is currently under review at IJPP. | ||
|
||
https://arxiv.org/abs/1402.3010 | ||
|
||
You may cite it as: | ||
Title: 1-D and 2-D Parallel Algorithms for All-Pairs Similarity Problem | ||
|
||
Eray Özkural, Cevdet Aykanat: 1-D and 2-D Parallel Algorithms for All-Pairs Similarity Problem. CoRR abs/1402.3010 (2014) | ||
http://dblp.org/rec/html/journals/corr/OzkuralA14 | ||
Authors: Eray Özkural, Cevdet Aykanat (Submitted on 13 Feb 2014) | ||
|
||
1-D and 2-D Parallel Algorithms for All-Pairs Similarity Problem | ||
Abstract: All-pairs similarity problem asks to find all vector pairs in a set of vectors the similarities of which surpass a given similarity threshold, and it is a computational kernel in data mining and information retrieval for several tasks. We investigate the parallelization of a recent fast sequential algorithm. We propose effective 1-D and 2-D data distribution strategies that preserve the essential optimizations in the fast algorithm. 1-D parallel algorithms distribute either dimensions or vectors, whereas the 2-D parallel algorithm distributes data both ways. Additional contributions to the 1-D vertical distribution include a local pruning strategy to reduce the number of candidates, a recursive pruning algorithm, and block processing to reduce imbalance. The parallel algorithms were programmed in OCaml which affords much convenience. Our experiments indicate that the performance depends on the dataset, therefore a variety of parallelizations is useful. | ||
|
||
Eray Özkural, Cevdet Aykanat | ||
(Submitted on 13 Feb 2014) | ||
The paper is included in the sources. | ||
|
||
All-pairs similarity problem asks to find all vector pairs in a set of vectors the similarities of which surpass a given similarity threshold, and it is a computational kernel in data mining and information retrieval for several tasks. We investigate the parallelization of a recent fast sequential algorithm. We propose effective 1-D and 2-D data distribution strategies that preserve the essential optimizations in the fast algorithm. 1-D parallel algorithms distribute either dimensions or vectors, whereas the 2-D parallel algorithm distributes data both ways. Additional contributions to the 1-D vertical distribution include a local pruning strategy to reduce the number of candidates, a recursive pruning algorithm, and block processing to reduce imbalance. The parallel algorithms were programmed in OCaml which affords much convenience. Our experiments indicate that the performance depends on the dataset, therefore a variety of parallelizations is useful. | ||
##Comments | ||
|
||
The code is quite interesting, as it shows how to effectively use OCaml for MPI code. There is a bunch of well-written parallel functional code that I will extract from this codebase and release separately. You need the latest ocamlmpi release as that contains the patches I made to make this code work. | ||
|
||
The code is quite interesting, as it shows how to effectively use OCaml for MPI programming. There is a bunch of well-written parallel functional code that I will extract from this codebase and release separately. You need the latest ocamlmpi release as that contains the patches I made to make this code work. | ||
##Datasets | ||
|
||
You can download the datasets from: https://github.com/examachine/data/tree/master/dv | ||
|
||
##License | ||
|
||
This code is released under AGPL-3.0. Please do not ask me to release it under BSD license. If you need a commercial license, you should purchase it. | ||
|
||
Happy hacking! | ||
|
||
|
||
Eray Ozkural, PhD | ||
Eray Özkural, PhD | ||
Founder, Gök Us Sibernetik Ar&Ge Ltd. Şti. |
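
For readers new to the problem stated in the abstract above, the following is a minimal brute-force sketch in OCaml. It is not the repository's code and uses none of the paper's optimizations or data distributions. The names `sparse_vec`, `dot`, and `all_pairs` are hypothetical, and it assumes that similarity is the dot product of sparse vectors and that a pair is reported when its similarity is at least the threshold.

```ocaml
(* Brute-force all-pairs similarity: a baseline sketch only, not the
   optimized sequential or parallel algorithms from the repository. *)

(* Assumption: a sparse vector is a list of (dimension, weight) pairs
   sorted by increasing dimension. *)
type sparse_vec = (int * float) list

(* Dot product of two sparse vectors, computed by merging their
   dimension-sorted entries. *)
let rec dot (u : sparse_vec) (v : sparse_vec) : float =
  match u, v with
  | [], _ | _, [] -> 0.0
  | (i, x) :: u', (j, y) :: v' ->
    if i = j then x *. y +. dot u' v'
    else if i < j then dot u' v
    else dot u v'

(* Return every (i, j, similarity) with i < j whose similarity is at
   least [threshold]. Compares all n*(n-1)/2 pairs. *)
let all_pairs (threshold : float) (vs : sparse_vec array) =
  let n = Array.length vs in
  let matches = ref [] in
  for i = 0 to n - 1 do
    for j = i + 1 to n - 1 do
      let s = dot vs.(i) vs.(j) in
      if s >= threshold then matches := (i, j, s) :: !matches
    done
  done;
  List.rev !matches

(* Usage: print the matching pairs among three small example vectors. *)
let () =
  let vs =
    [| [(0, 1.0); (2, 0.5)]; [(0, 0.5); (1, 1.0)]; [(0, 1.0); (2, 1.0)] |]
  in
  all_pairs 0.9 vs
  |> List.iter (fun (i, j, s) -> Printf.printf "(%d, %d) sim = %.2f\n" i j s)
```

This baseline does quadratic work in the number of vectors, which is the cost that pruning and data distribution strategies such as those described in the abstract aim to reduce.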