- A python setup with virtualenv
- the FRIDA dataset
Clone this repository:
git clone [email protected]:HolmesNL/author-verification.git
cd author-verification
Setup a virtual environment:
virtualenv -p python3 venv
source venv/bin/activate
pip install -r requirements.txt
To download NLTK files, start python
and run the following commands:
import nltk
nltk.download('punkt')
Copy or extract the FRIDA dataset into the frida
directory.
classify.py
SVM method
n_frequent_words=50*, tokens_per_sample=200*: cllr=0.8203017652820481 n_frequent_words=50*, tokens_per_sample=800: cllr=0.518377257507916 n_frequent_words=50*, tokens_per_sample=1400: cllr=0.3972284625209551 n_frequent_words=150, tokens_per_sample=200*: cllr=0.7941788604945595 n_frequent_words=150, tokens_per_sample=800: cllr=0.3935033064529293 n_frequent_words=150, tokens_per_sample=1400: cllr=0.312044555501085 n_frequent_words=250, tokens_per_sample=200*: cllr=0.781245633087783 n_frequent_words=250, tokens_per_sample=800: cllr=0.37812968584956114 n_frequent_words=250, tokens_per_sample=1400: cllr=0.26279502190033327