This repository has been archived by the owner on Oct 10, 2024. It is now read-only.

Implement Word Dictionary / BagOfWords #19

Open
aron-bordin opened this issue Oct 7, 2017 · 0 comments

aron-bordin commented Oct 7, 2017

Takes a text (or a list of texts) and a tokenizer as input, and generates a word dictionary.

  • implement with dask for a multiprocessing/threading speedup, or implement in C (to be discussed: which option is best).
  • allow limiting the number of entries in the dictionary
  • allow setting a minimum occurrence count for a term to be kept
  • can be used as a bag of words: it keeps the frequency of each term in the corpus.
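
To make the requirements above concrete, here is a minimal single-process sketch using only the standard library. It is not an agreed design: the function name `build_word_dict` and the parameters `tokenizer`, `max_size`, and `min_count` are hypothetical, and the dask/C speedup is left out since that choice is still open.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Optional, Union


def build_word_dict(
    texts: Union[str, Iterable[str]],
    tokenizer: Callable[[str], List[str]] = str.split,  # assumed default: whitespace split
    max_size: Optional[int] = None,  # hypothetical name: cap on the number of entries
    min_count: int = 1,              # hypothetical name: minimum occurrence to keep a term
) -> Dict[str, int]:
    """Return a term -> frequency mapping for the whole corpus."""
    # Accept a single text as well as a list of texts.
    if isinstance(texts, str):
        texts = [texts]

    counts: Counter = Counter()
    for text in texts:
        counts.update(tokenizer(text))

    # most_common() yields (term, freq) pairs, most frequent first.
    kept = [(term, freq) for term, freq in counts.most_common() if freq >= min_count]

    # Apply the size limit after the frequency filter, keeping the most frequent terms.
    if max_size is not None:
        kept = kept[:max_size]
    return dict(kept)


corpus = ["the cat sat", "the cat ran", "a dog ran"]
vocab = build_word_dict(corpus, min_count=2)
```

Because the returned dict keeps the frequency of each term, it can serve as the bag of words directly. If dask ends up being chosen, the per-text counting step is the part that would be parallelised, with the partial counts merged afterwards.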