This project, written in Python, compares the accuracy of two of the most widely used Python packages for language detection: Langdetect and Langid.
The program takes a plain txt file as input, goes through each line of that file, and predicts the language in which the line is written. The predicted languages (and their probabilities) are then written to a new csv file called lang_detection_results.csv.
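The per-line flow described above could be sketched roughly as follows. This is a minimal illustration and not the code in language_detection.py: the function names, the CSV column names and the rounding are assumptions made for the example. The library calls used are `detect_langs` and `DetectorFactory.seed` from langdetect, and `LanguageIdentifier.from_modelstring(model, norm_probs=True)` from langid, which returns scores normalized to the 0 to 1 range.

```python
import csv
from functools import lru_cache
from typing import Callable, Dict, Iterable, List, Tuple

Prediction = Tuple[str, float]  # (language code, probability)


def detect_with_langdetect(text: str) -> Prediction:
    """Return langdetect's top guess and its probability."""
    from langdetect import DetectorFactory, detect_langs

    DetectorFactory.seed = 0  # langdetect is non-deterministic unless seeded
    best = detect_langs(text)[0]  # candidates come sorted by probability
    return best.lang, best.prob


@lru_cache(maxsize=1)
def _langid_identifier():
    """Build the langid identifier once; norm_probs=True yields 0..1 scores."""
    from langid.langid import LanguageIdentifier, model

    return LanguageIdentifier.from_modelstring(model, norm_probs=True)


def detect_with_langid(text: str) -> Prediction:
    """Return langid's top guess and its normalized probability."""
    lang, prob = _langid_identifier().classify(text)
    return lang, float(prob)


def detect_lines(
    lines: Iterable[str], detectors: Dict[str, Callable[[str], Prediction]]
) -> List[list]:
    """Run every detector on every non-blank line; one output row per line."""
    rows = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # blank lines carry no language signal
        row = [text]
        for detect in detectors.values():
            lang, prob = detect(text)
            row += [lang, round(prob, 4)]
        rows.append(row)
    return rows


def write_results(rows, detector_names, path="lang_detection_results.csv"):
    """Write the rows to CSV. The column names are hypothetical."""
    header = ["text"]
    for name in detector_names:
        header += [f"{name}_lang", f"{name}_prob"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
```

With both packages installed, the detectors would be passed in as `{"langdetect": detect_with_langdetect, "langid": detect_with_langid}`, with an open text file as `lines`. Passing the detectors as plain callables keeps the loop independent of either library, which makes it easy to try with stub functions.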
The program also outputs the prediction performance of each algorithm in various charts.
NOTE: This is strictly a prediction accuracy comparison, NOT a technical performance one. This means that there is no comparison of the speed, memory usage, or similar resource costs of each algorithm.
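As an illustration of what a prediction accuracy comparison involves, accuracy can be computed as the share of lines whose predicted language matches a reference label. This README does not state how the project obtains reference labels or which exact metric its charts show, so the function below and the idea of a parallel list of expected labels are assumptions made for this sketch.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of predictions that equal the expected language code."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0  # no lines to score
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)
```

For example, with made-up labels, `accuracy(["en", "es", "fr", "nl"], ["en", "es", "fr", "de"])` gives 0.75, since three of the four predictions match. Computing this once per package yields the two figures being compared.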
- Python installed. The version used for this project is Python 3.11.2
- make installed (for more information click here)
- Recommended but not mandatory: VS Code or a similar IDE
- First, clone the project from this GitHub repository:
git clone https://github.com/syordanov94/language_detection.git
- Once cloned, you will need to download and update all module dependencies. To do this, use the provided Makefile by running the following command:
make
- Once the dependencies are up to date, you can run the language_detection.py file, which performs all the functionality:
python3 language_detection.py
or
python language_detection.py
- Once run, this will produce an output like the following: