totalNumberOfWords #2

patrickjae · 2011-05-04T13:50:47Z

In the mallet branch, HDPGibbsSampler, lines 71-73:
Simply adding the length of the data to totalNumberOfWords should be faster than traversing an index and adding one each time, especially for very large documents.
It might look like this:

totalNumberOfWords += ((FeatureSequence) corpus.get(d).getData()).getLength();
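
To make the comparison concrete, here is a minimal, self-contained sketch of the two ways of counting. The original lines 71-73 are not quoted in this thread, so `countByTraversal` is only a guess at their shape, and `TokenSequence` is a hypothetical stand-in for MALLET's `FeatureSequence` so the example runs without the library.

```java
import java.util.Arrays;
import java.util.List;

public class WordCountSketch {

    // Hypothetical stand-in for a MALLET FeatureSequence; only its length matters here.
    static final class TokenSequence {
        private final int[] features;

        TokenSequence(int... features) {
            this.features = features;
        }

        int getLength() {
            return features.length;
        }
    }

    // Assumed shape of the existing code: walk every position and add one each time.
    static int countByTraversal(List<TokenSequence> corpus) {
        int totalNumberOfWords = 0;
        for (TokenSequence doc : corpus) {
            for (int i = 0; i < doc.getLength(); i++) {
                totalNumberOfWords++;
            }
        }
        return totalNumberOfWords;
    }

    // Suggested change: add each document's length in a single step.
    static int countByLength(List<TokenSequence> corpus) {
        int totalNumberOfWords = 0;
        for (TokenSequence doc : corpus) {
            totalNumberOfWords += doc.getLength();
        }
        return totalNumberOfWords;
    }

    public static void main(String[] args) {
        List<TokenSequence> corpus = Arrays.asList(
                new TokenSequence(1, 2, 3),
                new TokenSequence(),
                new TokenSequence(4, 5));
        System.out.println(countByTraversal(corpus));
        System.out.println(countByLength(corpus));
    }
}
```

Both methods return the same total; the second does one addition per document instead of one per token.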

arnim · 2011-05-04T14:50:41Z

Thanks, you are right.
I will refactor these lines anyway ;)
However, adding data to the sampler is not where a large proportion of the computation is spent.

ghost assigned arnim May 4, 2011
