Statistics #2

Open
drdhaval2785 opened this issue Nov 22, 2015 · 3 comments

Comments

@drdhaval2785
Contributor

The preliminary version gives the following statistics:

Total entries without normalization are 434909 - hw1.txt
Total entries with anusvAra normalization are 421137 - hw2.txt
Total entries with duplication normalization are 418181 - hw3.txt

i.e. a total decrease of 16728 entries (13772 from anusvAra normalization and a further 2956 from duplication normalization).

I am not claiming that every one of these reductions is correct.
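For readers who want to reproduce counts of this kind, here is a minimal sketch of one possible anusvAra normalization followed by a unique-entry count. It is not the script used for hw2.txt. It assumes the headwords are in SLP1 transliteration and that normalization means replacing a class nasal before a stop of its own class with `M`. The names `CLASS_NASAL_BEFORE`, `normalize_anusvara` and `count_unique` are hypothetical.

```python
# Assumption: SLP1 transliteration, where M is the anusvAra.
# Each class nasal maps to the stops of its own class.
CLASS_NASAL_BEFORE = {
    "N": "kKgG",  # velar
    "Y": "cCjJ",  # palatal
    "R": "wWqQ",  # retroflex
    "n": "tTdD",  # dental
    "m": "pPbB",  # labial
}


def normalize_anusvara(word):
    """Replace a class nasal with M when a stop of the same class follows."""
    out = []
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if nxt and nxt in CLASS_NASAL_BEFORE.get(ch, ""):
            out.append("M")
        else:
            out.append(ch)
    return "".join(out)


def count_unique(headwords):
    """Return (raw count, count of distinct forms after normalization)."""
    normalized = {normalize_anusvara(w) for w in headwords}
    return len(headwords), len(normalized)
```

Under these assumptions, `saNga` and `saMga` collapse into a single entry, while a nasal that is not followed by a stop of its own class (as in `nara`) is left untouched.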

@gasyoun
Member

gasyoun commented Nov 22, 2015

A 16.7k decrease is an interesting one. Once @funderburkjim adds the upasarga-dhatu combinations from PW and PWG, we will get plenty of new words. The biggest reduction (circa 60k words) will come once you learn to strip M and H from the end of words.

@drdhaval2785
Contributor Author

I have learnt how to remove H and M at the end.
The question is how to do it without losing any genuine candidate.
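One conservative way to approach this, sketched below under stated assumptions, is to strip a final `M` or `H` only when the stripped form already exists in the headword list, and to set every other case aside for manual review. This is an illustration, not the method actually used in this repository. It assumes SLP1 headwords, and the names `strip_final` and `merge_final_m_h` are hypothetical.

```python
def strip_final(word):
    """Drop one trailing M (anusvAra) or H (visarga), if present."""
    return word[:-1] if word.endswith(("M", "H")) else word


def merge_final_m_h(headwords):
    """Merge forms ending in M/H into an existing bare form.

    Returns (kept, review): 'kept' is the reduced headword set and
    'review' holds the M/H-final forms whose stripped form was not
    found, so nothing is dropped without a human looking at it.
    """
    present = set(headwords)
    kept, review = set(), set()
    for w in headwords:
        stem = strip_final(w)
        if stem == w or stem in present:
            # Nothing to strip, or the bare form is already attested.
            kept.add(stem)
        else:
            # Bare form not attested: keep the original and flag it.
            kept.add(w)
            review.add(w)
    return kept, review
```

With this rule, `rAmaH` merges into `rAma` when both occur, while a form such as `svayaM` with no bare counterpart stays in the list and is flagged for review. The trade-off is a smaller reduction in exchange for not losing genuine candidates silently.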

@gasyoun
Member

gasyoun commented Nov 23, 2015

If a single one is lost, we can handle it, can't we?
