alphabetizing errors in skd headwords #2
Comments
795 is not a small number. Could you provide the list of them, or are you sure they are exactly digitization errors? I would implement the change for sure, but would love to see the changelog.txt file on github here as well, thanks. |
795 might be a big number, but finding corrections for them is not a big deal once the list is made. I myself have seen many such instances in the list display, but never noticed why they happened. This time it was that very word, so it got caught. I always use the list display, for the sake of Devanagari. Please provide me the list. |
Re: "Could you provide the list of them?" This is the file 'skdhw2_chksort.txt' in this SKD repository. To find the corrections will likely require consulting the scans. That's why 795 is not a small number. If you imagine 1 minute per instance, that would be 13 hours. If anyone wants to volunteer to find these corrections, I might be able to assist by providing some custom displays to make the workflow for the task efficient. |
Shalu is a PhD student as I am. My PhD should be over on 13 Nov 2014. As Shalu is even less github savvy, I guess a link https://github.com/sanskrit-lexicon/SKD/blob/master/skdhw2_chksort.txt is better, than just the name of it :) |
"Shalu is even less github savvy.." |
Glad to see Dhaval's useful work on skd alphabetizing errors. I'll probably wait til Dhaval is finished before implementing changes on Cologne site. Let's put that work under this 'alphabetizing errors in skd headwords' issue. I've copied Dhaval's last two comments (posted under issue #3) to here. |
This is a copy of Dhaval's first alphabetization errors list: Correcting the wrong sorted words. 1-001:aH:41,47 !< 1-001:afRI:48,55 |
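For anyone who wants to process such lines by script, here is a minimal Python sketch of a parser. It assumes each record has the shape `page:headword:line1,line2`, that `!<` separates the two adjacent records that are out of order, and that an optional parenthesized remark may follow. The field meanings and the names `parse_pair` and `RECORD` are my own guesses for illustration, not something documented in skdhw2_chksort.txt.

```python
import re

# Assumed record shape: page:headword:line1,line2 (field meanings are a guess).
RECORD = re.compile(r"(?P<page>[^:\s]+):(?P<key>[^:\s]+):(?P<l1>\d+),(?P<l2>\d+)")


def _entry(match):
    """Turn one regex match into a plain dict."""
    return {
        "page": match["page"],
        "headword": match["key"],
        "lines": (int(match["l1"]), int(match["l2"])),
    }


def parse_pair(line):
    """Parse 'record !< record (optional note)' into (left, right, note)."""
    left, sep, right = line.partition("!<")
    m1, m2 = RECORD.search(left), RECORD.search(right)
    if not sep or m1 is None or m2 is None:
        raise ValueError(f"unrecognized line: {line!r}")
    note = right[m2.end():].strip()  # any trailing remark, e.g. "(Wrong split ...)"
    return _entry(m1), _entry(m2), note


left, right, note = parse_pair("1-001:aH:41,47 !< 1-001:afRI:48,55")
print(left["headword"], right["headword"], repr(note))
```

Keeping the trailing remark as a separate field means the same parser can read both the raw checker output and a hand-annotated list.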
This is a copy of Dhaval's first comment on alphabetization conventions in SKD: Few observations on the wrong sorting. SKD places visarga before anusvAra. e.g. 1-005:akzaraH:725,730 !< 1-005:akzaraM:731,750 |
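To illustrate the visarga/anusvAra observation, here is a small Python sketch of a sort key with a configurable alphabet. It assumes the headwords are spelled in SLP1 and that the checker used the common order in which anusvAra (M) precedes visarga (H). Both points, and the names `make_key`, `STANDARD` and `SKD_ORDER`, are assumptions for illustration only.

```python
# Assumed standard SLP1 alphabetical order: vowels, then M, H, then consonants.
STANDARD = "aAiIuUfFxXeEoOMHkKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzsh"
# Hypothetical SKD variant: visarga (H) placed before anusvAra (M).
SKD_ORDER = STANDARD.replace("MH", "HM")


def make_key(order):
    """Build a sort-key function for the given alphabet string."""
    rank = {ch: i for i, ch in enumerate(order)}
    # Characters outside the alphabet sort after every known letter.
    return lambda word: [rank.get(ch, len(order)) for ch in word]


std_key = make_key(STANDARD)
skd_key = make_key(SKD_ORDER)

# Under the standard order akzaraH sorts after akzaraM, so the pair is flagged.
print(std_key("akzaraH") > std_key("akzaraM"))
# Under the SKD-style order the same pair is in sequence.
print(skd_key("akzaraH") < skd_key("akzaraM"))
```

With a key like `skd_key`, pairs that differ only by this convention would no longer be reported, while a pair such as aH / afRI would still be flagged under either order.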
These SKD conventions Dhaval is observing are good. When the task is finished, it should be possible Then, this explanation might be added as a Here is a link to the full list of keys for skd (as of today) in case it should be needed https://dl.dropboxusercontent.com/u/29859999/skdkeys-20140815.zip. This list is normally part of |
It's great to see, that lessons learned at https://groups.google.com/forum/#!topic/sanskrit-programmers/HTyINaNbvUQ are not lost and php sorting software can be used in several ways. Dhaval's work is of huge interest and importance. |
Dhaval - could you remind me where your work stands on this issue of alphabetization errors in SKD? As I understand it, there is still work to be done here? Do you think that this approach will find some headword spelling errors in SKD? The reason for the question is, that Sampada has a paper SKD but in a month or so she is moving and may not have the paper SKD. So, this might be a good project for her now. What do you think? |
re 1-029:ayamarTaH:4481,4643 !< 1-030:atisArakI:4644,4646 (Wrong split in definition of ati. ayamarTaH is not a separate word.) I can't confirm that, so for now I am not changing it. The Sanskrit is too difficult for me to understand the context. But, assuming 'ayamarTaH' means "This is the meaning", perhaps what follows ayamarTaH is an explanation of the quote(s) preceding ayamarTaH. If such is the case, then I'll make the correction. The other 5 corrections in the list above (the list starting with '1-001:aH:41,47 !< 1-001:afRI:48,55' ) have been made today. |
@funderburkjim I could try to make a fuzzy list of possible list of SKD in a week. Could compare it to VCP or MW. If Sampada is up to SKD - it's great, because otherwise the Indian origin dictionaries are even in a worse condition, than the others. |
Sampada may be working on something else for Peter, as I haven't heard from her in several days. If she finishes the mis-alphabetization cases in SKD, I was thinking about asking her to check my work on a rather long list of textual 'db' corrections, so that the task begun in July could be brought to a close. I presume your 'fuzzy list of possible list of SKD' means generating a list of possible spelling errors in headwords in SKD? Perhaps it would be better to focus on headword corrections before getting into the 'db' textual corrections. What do others think regarding priority (headwords v. 'db')? |
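One possible reading of such a 'fuzzy list', sketched in Python under stated assumptions: take each SKD headword that does not occur in a reference headword list (say from MW or VCP) and report near matches from that list. The function name `spelling_candidates`, the 0.8 cutoff and the tiny sample lists are all hypothetical. Only `difflib.get_close_matches` is a real standard-library call.

```python
import difflib


def spelling_candidates(headwords, reference, cutoff=0.8):
    """Map each headword missing from `reference` to its close matches there."""
    known = set(reference)
    ref_list = sorted(known)
    candidates = {}
    for hw in headwords:
        if hw in known:
            continue  # exact match in the reference list: nothing to report
        close = difflib.get_close_matches(hw, ref_list, n=3, cutoff=cutoff)
        if close:
            candidates[hw] = close
    return candidates


# Made-up sample data in SLP1; 'tanukzira' stands in for a digitization slip.
skd_sample = ["tanu", "tanukzira"]
reference_sample = ["akzara", "tanu", "tanukzIra"]
print(spelling_candidates(skd_sample, reference_sample))
```

A headword with no close match is simply omitted here. It may be a genuine SKD-only entry, so the output is a list of things to check against the scans, not a list of errors.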
@funderburkjim I'm an advocate of headword priority. Am I the only one? |
Headwords are the main gateway to users of a traditional paper dictionary. Similarly, to users of the Cologne displays (with the exception of the Advanced Search 'Text' searches). This observation makes a strong case for the priority of getting the headwords right. However, once the 'major' headwords are right (major in terms of estimated frequency of user inquiry), then further correction of 'obscure' headwords probably has no higher priority than corrections of egregious spelling errors in the text. For example, if a user of CCS looks up vidyAvid and sees as definition the misspelled 'wssenskundig', surely that experience is at least slightly unpleasant. So, text errors are important, too. So, that's my little two-step dance (one step forward, one step back) on the issue. |
I would like to cleanup CCS only when 98% of the headwords are right. When do I know they are right? When I do not find new issues for a few weeks. So I would not worry much about |
Dhaval - I think I misinterpreted your alphabetization list of corrections (see my comment above of Aug. 15) . I had thought that only the words with parentheses needed correction. HOWEVER, I now think that the list above contains all records from skdhw2_chksort.txt up through arawuH, which is the last record with page number less than 100. And, that the corrections have been made 'silently', in 15 lines (in addition to the 6 lines with parentheses). Does this sound right? |
Sampada is now working on the alphabetization errors in SKD. The first 65 of these cases correspond to ones Dhaval has already checked. A comparison of Sampada's and Dhaval's solutions for these 65 resulted in agreement except for 2 cases:
One meta observation is that it was good to have two sources of correction, as in 5 cases Sampada revised her corrections based on Dhaval's. Next are some observations made by Sampada, Peter, and me regarding these two cases.
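A comparison of two independent correction lists like this can be scripted. Below is a minimal Python sketch that assumes each list is a mapping from some record identifier to the proposed headword. The identifiers, the reviewer labels and the sample values are invented for illustration and do not say who proposed which spelling.

```python
def compare_corrections(first, second):
    """Split the records both reviewers handled into agreements and differences."""
    shared = sorted(first.keys() & second.keys())
    agree = [k for k in shared if first[k] == second[k]]
    differ = {k: (first[k], second[k]) for k in shared if first[k] != second[k]}
    return agree, differ


# Hypothetical sample data: record id -> proposed headword (SLP1).
reviewer_a = {"rec1": "tanu", "rec2": "akzIbaH", "rec3": "afRI"}
reviewer_b = {"rec1": "tanu", "rec2": "akzIvaH"}

agree, differ = compare_corrections(reviewer_a, reviewer_b)
print(agree)   # records where both proposals match
print(differ)  # records needing discussion, with both proposals side by side
```

Records handled by only one reviewer (like `rec3` above) are left out of both results, so they would need a separate pass.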
As per |
But, haven't we already made some changes with the 'db' headwords that are analogous to changing 'parbb' to 'parvv'? And, I have a large list of textual 'db' changes that I would like to make (like 'dbAdaSa' -> 'dvAdaSa'). As long as we document such changes, such as in the history file for SKD, it seems to me ok to do this. However, I am also agreeable to deferring such non-headword changes while there is still obvious work to do on headword changes. |
The fact is that at least in the ayurvedic field several common terms even inside a single critical edition text occur with both |
Peter reviewed my comments on akzIbaH v. akzIvaH, and reaffirmed akzIbaH:
Since Peter is strongly in favor of akzIbaH and that is what Dhaval also suggested, I'm changing my mind and using akzIbaH. |
So no more hard cases left out for now? Sounds great, let's see what Sampada will find next. |
Sampada (with some input from ejf and Peter) examined all the headword alphabetical misorderings and made 269 headword changes. For full details, see the files in the https://github.com/sanskrit-lexicon/CORRECTIONS/tree/master/dictionaries/SKD directory. This substantially extends the work begun by Dhaval. |
@gasyoun re 'let's see what Sampada will find next' : Sampada's working on the alphabetical misorderings in VCP now. re 'Dhaval's multisorter' Don't know what this is. |
@funderburkjim and @gasyoun |
Shalu posted a correction where two headwords 'nu' should be 'tanu'.
The list display (before correction) showed tatu, nu, nu, tanukṣīra. These two have
been corrected (to tanu).
Clearly, the 'nu' headwords here were out of alphabetical order.
At some point in working with SKD, I wrote a program to check for errors in the alphabetical ordering of headwords in SKD. The file 'skdhw2_chksort.txt' (uploaded to this repository) identifies 795 cases of headwords out of alphabetical order. Probably most of these are due to digitization errors. There remains the task 😓 of finding corrections for these.
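For readers curious how such a check could work, here is a minimal Python sketch. It is not the program that produced skdhw2_chksort.txt. It assumes headwords are in SLP1, uses an assumed standard SLP1 alphabet, and flags every adjacent pair where the first word sorts after the second. The names `SLP1_ORDER`, `sort_key` and `out_of_order` are hypothetical.

```python
# Assumed SLP1 alphabetical order: vowels, anusvAra/visarga, then consonants.
SLP1_ORDER = "aAiIuUfFxXeEoOMHkKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzsh"
RANK = {ch: i for i, ch in enumerate(SLP1_ORDER)}


def sort_key(word):
    """Rank each letter; unknown characters sort after the whole alphabet."""
    return [RANK.get(ch, len(SLP1_ORDER)) for ch in word]


def out_of_order(headwords):
    """Return (index, word, next_word) for each adjacent pair that is misordered."""
    pairs = zip(headwords, headwords[1:])
    return [(i, a, b) for i, (a, b) in enumerate(pairs) if sort_key(a) > sort_key(b)]


# The sequence from the list display above, with tanukṣīra written in SLP1.
print(out_of_order(["tatu", "nu", "nu", "tanukzIra"]))
# The same sequence after the correction to tanu.
print(out_of_order(["tatu", "tanu", "tanu", "tanukzIra"]))
```

A pairwise check like this only notices the point where the sequence drops back, so it reports the second 'nu' against 'tanukzIra' and not the first 'nu' itself. Deciding which of the neighbouring words is actually wrong still requires looking at the scans.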