Indentation / data error for headword categories in MW #1617
Comments
So it's out of place in the list, indented lower than where it belongs.
Yes. sAMvittika should be changed to Here are details. In mw dictionary,
In the list display, the
Correction made, but there seem to be many such cases. We should find some programmatic way to detect such errors. One way would be to locate the old digitization of monier.xml and see for Same with other categories. @funderburkjim may like to shed some light on this.
We need to recall MW's description of H1-4: https://sanskrit-lexicon.uni-koeln.de/scans/csldev/csldoc/build/dictionaries/prefaces/mwpref/mwpref11.html Based on MW's criteria:
No NLP-type accuracy test for the H-values comes to mind. Are there several other examples of such errors, in addition to the one found at sAMvittika?
I submitted many such errors through the following webpage: https://www.sanskrit-lexicon.uni-koeln.de/scans/csl-corrections/app/correction_form.php?dict=MW I labelled all such errors as "Hierarchy" or "Hierarchical" errors.
Just for the record, I checked the MONIER.ALL file (a very early digitization), and there too the data matches mw.txt. So it is not possible to find a pattern by which such errors can be fetched programmatically. It seems to be manual work ahead.
@drdhaval2785 / @funderburkjim I would just like to bring to your attention these two lines of text from mw_orig_utf8.txt-- The only issue I see is that the later mw.txt has been split further (as per @gasyoun's request), to a level a bit too fine to correlate with this old data. [Still, I can see a way ahead, which Jim might come up with quite soon if he puts his mind to the issue.]
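If the hierarchy levels can be extracted from both the old and the current digitization, a comparison along these lines might surface candidate errors for manual review. This is only a sketch in Python: the function names and the sample values are invented for illustration, and it assumes each source has already been reduced to (headword, level) pairs. It does not decide which source is right, only where they disagree.

```python
from collections import defaultdict

def levels_by_key(pairs):
    """Group hierarchy levels by headword, preserving file order."""
    grouped = defaultdict(list)
    for key, level in pairs:
        grouped[key].append(level)
    return grouped

def level_mismatches(old_pairs, new_pairs):
    """Return (headword, old_levels, new_levels) where the two sources disagree.

    Headwords present in only one source are skipped, since entries that
    were split further in the later file cannot be compared directly.
    """
    old, new = levels_by_key(old_pairs), levels_by_key(new_pairs)
    return [
        (key, old[key], new[key])
        for key in old
        if key in new and old[key] != new[key]
    ]

# Invented sample values, for illustration only (not taken from either file).
old_pairs = [("sAMvAhika", 1), ("sAMvittika", 1)]
new_pairs = [("sAMvAhika", 1), ("sAMvittika", 2)]
print(level_mismatches(old_pairs, new_pairs))
```

Every reported headword would still need checking against the print, since neither digitization can be taken as the reference.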
And it is not out of context here for me to say that, in my current work, I am reverting this MW99 split-up to the 'theme' that I had envisaged as applicable to almost all the CDSL works (PWG, pwk, Apte, MW72, MD, MW99, ...).
mw_correctionform.txt in the csl-corrections repo shows about 11 of these hierarchy errors. Thanks for mentioning, @aumsanskrit.
On a closer look, even mw_orig_utf8.txt does NOT match the print (so far as Hn numbering is concerned) in too many places.
Beyond the H1 and H2 marking differences discussed above, it has now been found that quite a few H2 entries were marked as H3!
I have now come across cases of H3 entries marked as H2 (the reverse of the above)!
A user @aumsanskrit reported the following error
QUOTE
I have included two screenshots for the first Hierarchy Error that I referenced. In the selected word online, you will see in the left column that the word is “indented” indicating that the online dictionary has placed this word “underneath” the word listed above (as if there is an etymological relationship). However no such direct etymological relation exists in the printed dictionary because the word is actually listed “independently”. Certainly the word is listed directly below on the printed page, but the listing is “independent” and not “Etymologically” related. In the Online version the “indentation” of this particular word in the left-hand column suggests an “Etymological” relationship to the above word which actually does not exist.
UNQUOTE
Website (List display)
Print
Data
I am not sure why sAMvAhika bears <e>1 and sAMvittika bears <e>2. This seems to be leading to the error in indentation in the list display.
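To make the link between the <e> value and the indentation concrete, here is a minimal Python sketch. Only the <e>1 and <e>2 values come from the data quoted above. The `<k1>` headword field, the shape of the sample lines, and the function names are assumptions made for illustration, and this is not the actual list-display code.

```python
import re

# Assumed metaline shape: "...<k1>HEADWORD...<e>LEVEL..." (hypothetical; only
# the <e>1 / <e>2 values are taken from the issue text).
META_RE = re.compile(r"<k1>([^<]+).*?<e>([1-4])")

def parse_levels(lines):
    """Return (headword, level) pairs for lines that look like metalines."""
    pairs = []
    for line in lines:
        m = META_RE.search(line)
        if m:
            pairs.append((m.group(1), int(m.group(2))))
    return pairs

def render_list(pairs, indent="  "):
    """Indent each headword by (level - 1) steps, mimicking a list display."""
    return [indent * (level - 1) + key for key, level in pairs]

sample = [
    "<k1>sAMvAhika<e>1",
    "<k1>sAMvittika<e>2",
]
for row in render_list(parse_levels(sample)):
    print(row)
```

With these sample lines, sAMvittika is printed one step to the right of sAMvAhika, which is the indentation reported in the list display. Changing its value to <e>1 would place both headwords at the same level.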