Wordfrec

This script extracts words and their frequencies from one or more XML, HTML, or TXT files and stores them in a database.

How to use

Input files must be XML, HTML, or TXT. Python 3 is required. Run the script with the input files as arguments:

python3 wordfreq.py file1 file2 file3...

The script produces a database ('wordfreq.db') containing every word form (not lemmas) and its frequency.
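
To illustrate what "word forms and their frequencies" means, here is a minimal sketch of the general technique. It is not the actual code of wordfreq.py. It assumes the database is SQLite, and the table name `words`, the helper names `count_word_forms` and `store_counts`, the regex-based tag stripping, and the lowercasing of tokens are all hypothetical choices made for this example.

```python
import re
import sqlite3
from collections import Counter

TAG_RE = re.compile(r"<[^>]+>")   # crude markup stripper (assumption, not a real parser)
WORD_RE = re.compile(r"\w+")      # a "word" is any run of word characters (assumption)


def count_word_forms(text):
    """Count surface word forms in text after removing markup tags."""
    plain = TAG_RE.sub(" ", text)
    # Lowercasing is an assumption; no lemmatization is applied.
    return Counter(w.lower() for w in WORD_RE.findall(plain))


def store_counts(conn, counts):
    """Add the counts to a hypothetical `words` table, merging with existing rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS words "
        "(word TEXT PRIMARY KEY, freq INTEGER NOT NULL)"
    )
    # Make sure every word has a row, then add the new counts to it.
    conn.executemany(
        "INSERT OR IGNORE INTO words (word, freq) VALUES (?, 0)",
        ((w,) for w in counts),
    )
    conn.executemany(
        "UPDATE words SET freq = freq + ? WHERE word = ?",
        ((n, w) for w, n in counts.items()),
    )
    conn.commit()


# Demo on an in-memory database so no file is written.
conn = sqlite3.connect(":memory:")
store_counts(conn, count_word_forms("<p>The cat saw the cats.</p>"))
rows = conn.execute(
    "SELECT word, freq FROM words ORDER BY freq DESC, word"
).fetchall()
print(rows)
```

In this sketch "cat" and "cats" are stored as separate entries, which is what counting word forms rather than lemmas implies.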

License

General Public License. See the LICENSE file.