The Poio Corpus is a freely available collection of language resources for lesser-used languages. The data is extracted from free sources such as Wikipedia, dictionaries, documents, and websites.
The official Poio Corpus website is:
Poio Corpus is part of the Poio project:
Poio Corpus source code is distributed under the Apache 2.0 License.
Poio Corpus documentation is distributed under the Creative Commons Attribution 3.0 Unported license.
Poio Corpus data packages are distributed under different licenses. For more information, please check the LICENSE files in the data packages, which are available under the Corpus menu option on the Poio Corpus website.