Commit dc32b40 (1 parent: a8e0df4): 1 changed file with 26 additions and 4 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,11 +1,33 @@ | ||
# Search-Based-LDA | ||
This repository contains a few script (written in R) to automatically configure Latent Dirichlet Allocation (LDA) using meta-heuristics in a unsupervised fashion. In the current version, the scripts are customized for the problem of identify duplicated bug reports. | ||
This repository contains a few scripts (written in R) to automatically configure Latent Dirichlet Allocation (LDA) using meta-heuristics in an unsupervised fashion. In the current version, the scripts are customized for the problem of identifying duplicated bug reports. | ||
|
||
# Datasets | ||
To run the tool you need to download the projects from the Bench4BL dataset and publicly available in GitHub: https://github.com/exatoa/Bench4BL | ||
The projects used in the benchmark are available in the folder `datasets`. These projects come from the **Bench4BL** dataset, which is publicly available on GitHub: `https://github.com/exatoa/Bench4BL`. | ||
|
||
Download an Archive and unpack it like described in their readme. Then move the contained folder `data/<group>/<archive_name>/bugrepo` into the folder `datasets`. | ||
You have to take an Archive which contains duplicate bug reports (array in `duplicates.json` is not empty), for example `SHL.tar`. | ||
Notice that projects available in the `datasets` folder are smaller than their original counterparts in **Bench4BL**. This is because we provide only the bug reports, the bugs, and the duplicate pairs (the oracle). If you also need to access the source code, please download the original projects from **Bench4BL**. | ||
|
||
# Run with Docker | ||
I warmly recommend using **Docker**, as the scripts have been tested on a specific `R` version (v3.6.2). Besides, the `R` packages that I use require resolving some conflicts between dependencies and versions. | ||
|
||
To facilitate installing and running the script, please use the **Docker File** available in the `docker` folder. Follow these instructions: | ||
|
||
1. Open the terminal | ||
|
||
2. Build the docker image using the command: | ||
`docker build -t sbse-lda -f docker/Dockerfile .` | ||
|
||
3. Run the docker container using the command: | ||
`docker run -it -v <local-folder>:/home/SSBSE-LDA/Results sbse-lda:latest bash`. | ||
In the command above, replace `<local-folder>` with the folder on your local machine where the results should be saved. The option `-v` maps a host directory to a directory in the Docker container. | ||
|
||
4. Once steps 1-3 are completed, you can run the `R` script inside the container using the commands: | ||
4.1 Start `R` with the command: `R` | ||
4.2 Run the main script inside `R` using the command: `source('MainScript.R')` | ||
4.3 Follow the instructions on the computer screen | ||
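
The steps above can be sketched as a small shell script that prepares the results folder and prints the two Docker commands with the host path filled in. This is only a sketch under stated assumptions: the folder name `lda-results` is a placeholder chosen for illustration and is not part of the repository, while the image tag and the container path are copied from the commands above. The host path is resolved to an absolute path because `docker run -v` treats a bare name as a named volume rather than a host directory.

```shell
#!/bin/sh
# Sketch: prepare a host folder for the results and print the docker
# commands from steps 2-3 with that folder filled in.
# ASSUMPTION: "lda-results" is a placeholder name, not part of the repository.
RESULTS_DIR="${RESULTS_DIR:-lda-results}"

# Create the folder if needed and resolve it to an absolute path,
# since -v expects an absolute host path for a bind mount.
mkdir -p "$RESULTS_DIR"
ABS_RESULTS="$(cd "$RESULTS_DIR" && pwd)"

# Commands taken from steps 2 and 3; run them from the repository root.
BUILD_CMD="docker build -t sbse-lda -f docker/Dockerfile ."
RUN_CMD="docker run -it -v \"$ABS_RESULTS:/home/SSBSE-LDA/Results\" sbse-lda:latest bash"

# Print the commands for review instead of executing them directly.
echo "$BUILD_CMD"
echo "$RUN_CMD"
```

Printing the commands first makes it easy to check the mapped folder before starting the long image build; once they look right, they can be pasted into the terminal as they are.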
|
||
Note that building the Docker image takes some time (20 minutes or more). | ||
|
||
# Publication | ||
A. Panichella, "A Systematic Comparison of Search Algorithms for Topic Modelling - A Study on Duplicate Bug Report Identification", The 11th Symposium on Search-Based Software Engineering (SSBSE), Tallinn, Estonia, 31 Aug - 1 Sep, 2019. | ||
|
||
A. Panichella, "A Systematic Comparison of Search-Based Approaches for LDA Hyperparameter Tuning", Information and Software Technology, Under Review |