Progress in alternate headword extraction #7
latest stats
Well done, well done. I love it when such practical work is done from India. There has been so much theory but so little effort to present good old books in the form they deserve. @funderburkjim please see https://docs.google.com/document/d/1YYTM2hlDYKPzKv322Cfq0Oa92RohvMzvF5jf8eqIO7w/edit# with my VCP classification.
Work similar to the VCP work was done on SCD and AP. Human validation is not finalized, but still.
@gasyoun raised a question on Skype. I then explored the possibility of finding some mathematical way of measuring the nearness of letters based on the pratyAhAra sutras.
That is, 16 new parses were added in this manner, and many suggestions were made for entries that had earlier been left as they were.
Tailoring the notion of edit distance to take into account knowledge of the Sanskrit alphabet sounds like an interesting idea. Maybe the Sanskrit edit distance could somehow use the varga matrix of characters. Another possibility might take into account the similarity of the glyphs of the Devanagari characters. Yet another might deal with the distance between consonant-vowel glyphs (e.g., the distance between the Devanagari representations of 'XA' and 'Xo', X being some consonant, might be considered less than the Sanskrit alphabetical distance between 'A' and 'o'). Lots of interesting possibilities to explore.
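To make the varga-matrix idea concrete, here is a minimal sketch of a substitution cost for SLP1 characters. It is only an illustration: the function name `varga_cost` and the cost values 0.5 and 1.0 are assumptions chosen for the example, not anything taken from hw1_dhaval.py.

```python
# The five vargas (rows) of stops and nasals in SLP1 transliteration.
# Columns: unvoiced, unvoiced aspirate, voiced, voiced aspirate, nasal.
VARGA = ["kKgGN", "cCjJY", "wWqQR", "tTdDn", "pPbBm"]

# Map each consonant to its (row, column) position in the matrix.
POSITION = {ch: (r, c) for r, row in enumerate(VARGA) for c, ch in enumerate(row)}


def varga_cost(a, b):
    """Hypothetical substitution cost for two SLP1 characters."""
    if a == b:
        return 0.0
    if a not in POSITION or b not in POSITION:
        # Assumption: characters outside the matrix cost a full edit.
        return 1.0
    (r1, c1), (r2, c2) = POSITION[a], POSITION[b]
    if r1 == r2 or c1 == c2:
        # Assumption: same varga (row) or same column counts as "near".
        return 0.5
    return 1.0


print(varga_cost("B", "m"))  # same varga (labials)
print(varga_cost("k", "b"))  # different row and column
```

Such a cost could replace the fixed substitution cost of 1 in an ordinary edit distance. Glyph similarity would need a separate table, which is not attempted here.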
That's a wishing well - but a deep one.
भ म will do, @drdhaval2785 ?
Not sure I get it on real-life examples.
The alphabet order used is purely that of the shiva sUtras: aAiIuUfFxeoEOhyvrlYmNRnJBGQDjbgqdKPCWTcwtkpSzs. So I guess it takes care of nearness based on the varga matrices. But glyph nearness remains to be explored.
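One way such an ordering could feed into an edit distance is sketched below. This is an assumption about the approach, not a copy of what hw1_dhaval.py does: the names `sub_cost` and `weighted_distance` are hypothetical, and scaling the substitution cost by the index gap in the ordering is just one possible choice.

```python
# Alphabet order quoted above (SLP1, shiva sUtra order).
ORDER = "aAiIuUfFxeoEOhyvrlYmNRnJBGQDjbgqdKPCWTcwtkpSzs"
INDEX = {ch: i for i, ch in enumerate(ORDER)}


def sub_cost(a, b):
    """Hypothetical cost: 0 for equal characters, up to 1 for the most distant pair."""
    if a == b:
        return 0.0
    if a not in INDEX or b not in INDEX:
        return 1.0  # assumption: unknown characters cost a full edit
    return abs(INDEX[a] - INDEX[b]) / (len(ORDER) - 1)


def weighted_distance(s, t):
    """Levenshtein distance with unit insert/delete cost and sub_cost substitutions."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [float(i)]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1.0,               # delete a from s
                curr[j - 1] + 1.0,           # insert b into s
                prev[j - 1] + sub_cost(a, b),  # substitute a by b
            ))
        prev = curr
    return prev[-1]


# Neighbours in the ordering (a/A) are cheaper to swap than distant letters (a/s).
print(weighted_distance("rAma", "rAmA"))
print(weighted_distance("rAma", "rAms"))
```

With this weighting, candidates that differ only by neighbouring letters rank ahead of candidates that differ by distant ones, which is the behaviour the ordering is meant to capture.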
Dear all,
https://github.com/sanskrit-lexicon/VCP/tree/master/alternateheadword is the repository where I have been experimenting with ejf's hw1.py, renamed as hw1_dhaval.py.
With the latest commit, Levenshtein logic has been added to the hw1_dhaval.py code.
Logic -
The replacement methods are marked with codes "1" through "6", which can help us locate, at a later stage, the part of hw1_dhaval.py that made each suggestion.
If there are no suggested headwords, or no decision could be made regarding the position of the string to be substituted by the bracket string, I have put the code "404". (See nonvalidated.txt)
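The coding scheme can be pictured roughly as follows. This is a simplified sketch and not the actual hw1_dhaval.py code: the function `suggest`, the two toy methods, and the sample strings are all invented for illustration.

```python
def suggest(headword, bracket, methods):
    """Try each (code, method) pair in order and return (suggestions, code).

    `methods` is a hypothetical list of (code, function) pairs. Each function
    takes (headword, bracket) and returns a list of candidate headwords.
    """
    for code, method in methods:
        candidates = method(headword, bracket)
        if candidates:
            return candidates, code
    return [], "404"  # no method produced a suggestion


# Two toy replacement methods, invented for illustration only.
def never_matches(headword, bracket):
    return []


def replace_suffix(headword, bracket):
    # Replace the last len(bracket) characters of the headword with the bracket string.
    if 0 < len(bracket) <= len(headword):
        return [headword[:-len(bracket)] + bracket]
    return []


METHODS = [("1", never_matches), ("2", replace_suffix)]

print(suggest("kawa", "wI", METHODS))      # (['kawI'], '2')
print(suggest("ka", "longer", METHODS))    # ([], '404')
```

Keeping the code next to each suggestion makes it possible to trace a wrong suggestion back to the method that produced it.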
Plan ahead -