Skip to content

Commit

Permalink
Update README.md
Browse files Browse the repository at this point in the history
  • Loading branch information
apanichella authored Sep 6, 2020
1 parent a8e0df4 commit dc32b40
Showing 1 changed file with 26 additions and 4 deletions.
30 changes: 26 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,11 +1,33 @@
# Search-Based-LDA
This repository contains a few script (written in R) to automatically configure Latent Dirichlet Allocation (LDA) using meta-heuristics in a unsupervised fashion. In the current version, the scripts are customized for the problem of identify duplicated bug reports.
This repository contains a few scripts (written in R) to automatically configure Latent Dirichlet Allocation (LDA) using meta-heuristics in an unsupervised fashion. In the current version, the hands are customized for the problem of identify duplicated bug reports.

# Datasets
To run the tool you need to download the projects from the Bench4BL dataset and publicly available in GitHub: https://github.com/exatoa/Bench4BL
The projects used in the benchmark are available in the folder `datasets`. These projects come from the **Bench4BL** dataset and are publicly available in GitHub: `https://github.com/exatoa/Bench4BL`.

Download an Archive and unpack it like described in their readme. Then move the contained folder `data/<group>/<archive_name>/bugrepo` into the folder `datasets`.
You have to take an Archive which contains duplicate bug reports (array in `duplicates.json` is not empty), for example `SHL.tar`.
Notice that projects available in the `datasets` folder are smaller than their original counterparts in **Bench4BL**. This is because we provide only the bug reports, the bugs, and the duplicate pairs (the oracle). If you also need to access the source code, please download the original projects from **Bench4BL**.

# Run with Docker
I warmly recommend using **Docker** as the scripts have been tested on a specific `R` version (v3.6.2). Besides, the `R` packages that I use require to fix some conflicts between dependencies and versions.

To facilitate installing and running the script, please use the **Docker File** available in the `docker` folder. Follow these instructions:

1. Open the terminal

2. Build the docker image using the command:
`docker build -t sbse-lda -f docker/Dockerfile .`

3. Run the docker container using the command:
`docker run -it -v <local-folder>:/home/SSBSE-LDA/Results sbse-lda:latest bash`.
In the command above, replace `<local-folder>` with the folder in your local machine to save the results. The option `-v` maps a host directory to a directory in docker container.

4. Once steps 1-2 are completed, you can run the `R` script inside the container using the commands:
4.1 Start `R` with the command: `R`
4.2 Run the main script inside `R`using the command: `source('MainScript.R')`
4.3 Follow the instructions on the computer screen

Notice that building the docker image will require some time (around 20+ minutes).

# Publication
A. Panichella, "A Systematic Comparison of Search Algorithms for Topic Modelling - A Study on Duplicate Bug Report Identification", The 11th Symposium on Search-Based Software Engineering (SSBSE), Tallinn Estonia, 31 Aug - 1 Sep, 2019.

A. Panichella, "A Systematic Comparison of Search-Based Approaches for LDA Hyperparameter Tuning", Information and Software Technology, Under Review

0 comments on commit dc32b40

Please sign in to comment.