Corrections in digitisation: Andhrabharati #13

Open
Andhrabharati opened this issue Aug 5, 2021 · 14 comments

@Andhrabharati

Andhrabharati commented Aug 5, 2021

Mismatched '[' and ']' cases

Opening '[' cases: 9 nos.
Line 42424: <>vfttiH .. [kAlaH . iti hemacandraH ..
The text after '[' belongs to the next entry (next line in print).

Line 54248: <>rAjanirGaRwaH .. [hemacandraH ..
The text after '[' belongs to the next entry (next line in print).

Line 63129: <>pAM 4 . 1 . 41 .) pippalI . SroRideSaH . [iti
The '[' is to be deleted here.

Line 92578: <>nAqIvraRaM vraRaM duzwamupadaMSaM vicarccikAm . [RAn .
The text after '[' belongs to the next line.

Line 106031: kzIRAzwakarmmA¦, [n) puM, (kzIRAni azwakarmmARi
Here the '[n)' is to be corrected as '[n]'.

Line 306359: bahvASI¦, [n) tri, (bahu aSnAtIti . bahu +
Here the '[n)' is to be corrected as '[n]'.

Line 507383: SrI¦, Ya ga pAke . iti kavikalpadrumaH .. [kryA0-
'[kryA0-' to be corrected as '(kryA0-'.

Line 507415: <>haraRam . “SrIste . [sAstAm ..”)
The '[' is to be deleted here.

Line 507429: <>mAlaSroH . [tasyAH sampUrRajAtiH . asyAH
The '[' is to be deleted here.
-----------------
Closing ']' cases: 5 nos.
Line 105215: kzarI¦, (n] (kzaraH kzaraRaM varzaRaM astvasmin kAle
Here the '(n]' is to be corrected as '[n]'.

Line 117098: <>13 . 17 . 79 .skd2-298-b+ 52]
.skd2-298-b+ 52] to be corrected as . [Page2-298-b+ 52].

Line 306359: bahvASI¦, [n) tri, (bahu aSnAtIti . bahu +
Here the '(n]' is to be corrected as '[n]'.

Line 508964: <>DaraH .. (yaTA, mAGe . 7 . 62 .]
The ']' is to be deleted here.

Line 578433: hUravaH¦, puM, (hU iti ravo'sya .] SfgAlaH . iti
Here the ']' is to be corrected as ')'.
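
A check of this kind can be sketched in a few lines of Python. This is only an illustration, not the method actually used for the lists above: the function name is made up, and it assumes that a bracketed span never crosses a line break in the digitisation, so every line can be tested on its own.

```python
def find_unbalanced_square_brackets(lines):
    """Return (line_number, kind, text) for lines whose '[' and ']' do not pair up.

    Assumption (not confirmed by the source): a '[...]' span never
    continues onto the next line, so each line is checked in isolation.
    """
    problems = []
    for number, text in enumerate(lines, start=1):
        depth = 0
        for ch in text:
            if ch == "[":
                depth += 1
            elif ch == "]":
                if depth == 0:
                    # A closing bracket with nothing open before it.
                    problems.append((number, "unmatched ]", text))
                    break
                depth -= 1
        else:
            # The loop finished without an early exit; report any '[' left open.
            if depth > 0:
                problems.append((number, "unmatched [", text))
    return problems


if __name__ == "__main__":
    sample = [
        "kzIRAzwakarmmA¦, [n) puM, (kzIRAni azwakarmmARi",
        "kzarI¦, (n] (kzaraH kzaraRaM varzaRaM astvasmin kAle",
        "<>13 . 17 . 79 . [Page2-298-b+ 52]",
    ]
    for number, kind, text in find_unbalanced_square_brackets(sample):
        print(number, kind, text)
```

Each hit still needs a look at the print, since the right fix differs from case to case (delete the bracket, change it to a parenthesis, or move text to the next entry).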

@Andhrabharati
Author

Andhrabharati commented Aug 5, 2021

After converting the SKD file to Devanagari and glancing through it, I found some errors in the word endings (given as alternative forms of the HW entries in the book).

The list is as follows:
[ca] [च]: should be [c] [च्] Typing error.
[da] [द]: should be [d] [द्] Typing error.
[ja] [ज]: should be [j] [ज्] Typing error.
[kza] [क्ष]: should be [kz] [क्ष्] Typing error.
[na] [न]: should be [n] [न्] Typing error.
[nca] [न्च]: should be [nc] [न्च्] Typing error.
[sa] [स]: should be [s] [स्] Typing error.
[za] [ष]: should be [z] [ष्] Typing error.
[zwu] [ष्टु]: should be [zwf] [ष्टृ] Typing error.
[E] [ऐ]: should be [rE] [रै] Print error; or it could be left as [ऐ]कारान्तः for this particular monosyllabic word, just like [ऋ]कारान्तः etc.
[Yca] [ञ्च]: should be [Yc] [ञ्च्] Typing error.

Some of these occur just once or twice, and some run into a few hundred.
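
A batch correction for the typing errors in the list could look roughly like the sketch below. It is a minimal illustration, not an agreed procedure: the function name is hypothetical, the replacements work on the SLP1 forms, and it assumes every exact occurrence of a listed bracketed ending is an error, which should be verified against the print before anything is committed. The [E] case is left out on purpose, since it is a print error that may be kept as it is.

```python
import re

# SLP1 endings from the list above (typing errors only; [E] is omitted).
ENDING_FIXES = {
    "[ca]": "[c]",
    "[da]": "[d]",
    "[ja]": "[j]",
    "[kza]": "[kz]",
    "[na]": "[n]",
    "[nca]": "[nc]",
    "[sa]": "[s]",
    "[za]": "[z]",
    "[zwu]": "[zwf]",
    "[Yca]": "[Yc]",
}

# One alternation over the exact bracketed tokens, brackets included.
_PATTERN = re.compile("|".join(re.escape(key) for key in ENDING_FIXES))


def fix_endings(text):
    """Return (corrected_text, counts) where counts maps each wrong form to its hits.

    Assumption: every exact match is an error; the counts are there so the
    result can be compared with a manual tally before it is accepted.
    """
    counts = {}

    def replace(match):
        wrong = match.group(0)
        counts[wrong] = counts.get(wrong, 0) + 1
        return ENDING_FIXES[wrong]

    return _PATTERN.sub(replace, text), counts


if __name__ == "__main__":
    fixed, counts = fix_endings("bahvASI¦, [na] tri, (bahu aSnAtIti . bahu +")
    print(fixed)
    print(counts)
```

Returning the per-form counts makes it easy to see which endings occur once or twice and which occur in the hundreds.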

Like any other dictionary, SKD also has multiple (grouped) words as HWs.
Necessary action may be taken to "do" them!!

Andhrabharati changed the title from "Corrections for mismatched [ & ] cases" to "Corrections in digitisation: Andhrabharati" on Aug 5, 2021
@gasyoun
Member

gasyoun commented Aug 5, 2021

> some errors in the word endings

@funderburkjim sounds like a batch correction for our master.

@Andhrabharati
Author

I have resumed looking into SKD, with the latest file (now using Jim's transcoding).

Corrected the file for the points in the 2 posts above.

Found that 4 metalines needed correction either in k1 or k2 fields.

There are 354 entries where k1 & k2 are not matching--
345 cases of (variants) [with braces]
5 cases having avagraha
2 cases with ending ';' in k2 to be removed [L-17250 & L-18724]
1 case with ending ':' in k2 to be removed [L-3648]
1 case with ending 'M' in k1 to be removed [L-41771] [though the print has M, as an error!]
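
A comparison of the two fields can be sketched as below. This is a hedged illustration only: the regular expression assumes the `<L>…<pc>…<k1>…<k2>…` metaline layout, the function name and the category labels are invented for the example, the classification rules are a guess at how the breakdown above could be reproduced, and the sample metalines are made up.

```python
import re

# Assumed metaline layout: <L>id<pc>page<k1>key1<k2>key2 (further fields ignored).
META = re.compile(
    r"<L>(?P<L>[^<]+)<pc>(?P<pc>[^<]+)<k1>(?P<k1>[^<]*)<k2>(?P<k2>[^<]*)"
)


def classify_mismatch(metaline):
    """Return a rough label for how k1 and k2 differ, or None if not a metaline."""
    m = META.match(metaline)
    if m is None:
        return None
    k1, k2 = m["k1"], m["k2"]
    if k1 == k2:
        return "match"
    if "(" in k2 or ")" in k2:
        return "variant in brackets"
    # Avagraha is ' in SLP1 and ऽ in Devanagari.
    if "'" in k2 or "ऽ" in k2:
        return "avagraha"
    if k2[-1:] in {";", ":"}:
        return "trailing punctuation in k2"
    return "other"


if __name__ == "__main__":
    # Made-up metalines, for illustration only.
    samples = [
        "<L>1<pc>1-001-a<k1>aMSa<k2>aMSa",
        "<L>2<pc>1-001-b<k1>kzIba<k2>kzIba(va)",
        "<L>3<pc>1-002-a<k1>abc<k2>abc;",
    ]
    for line in samples:
        print(classify_mismatch(line), line)
```

Anything labelled "other" would need a manual look, as with the stray 'M' in k1 noted above.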

@Andhrabharati
Author

Andhrabharati commented Nov 3, 2023

There are 11 cases of -1[abc]+, while 21 cases of -[abc]1+ are present within [Page-breaks].

image

So, changed all the 11 'minority' cases to match the 21 'majority' cases.

@Andhrabharati
Author

Andhrabharati commented Nov 3, 2023

There are ~50 sets of L-entries split in CDSL that the SKD compilers meant as variant 'groups', marked with a flower (curly) bracket:

image
image

2369, 2370
3002, 3003
3111, 3112
3132, 3133
4923, 4924
9828, 9829
12295, 12296
20225, 20226
23263, 23264
24085, 24086
24842, 24843
24994, 24995
25964, 25965, 25966
25978, 25979
27328, 27329
27868, 27869
30138, 30139, 30140
30143, 30144
30384, 30385
30444, 30445
30979, 30980
31340, 31341
31525, 31526
32293, 32294
32402, 32403
32688, 32689
32992, 32993
33015, 33016
33372, 33373
33726, 33727
33735, 33736
34853, 34854
34874, 34875
35036, 35037
35051, 35052, 35053
35124, 35125
35242, 35243
33397, 33398
36198, 36199
36449, 36450
36502, 36503
37431, 37432
39865, 39866
39989, 39990
40008, 40009
40607, 40608
40977, 40978
41130, 41131
41708, 41709
41933, 41934
42086, 42087
42175, 42176
42177, 42178

These could be appropriately merged into single entries, with the variants kept as comma-separated items in the k2 field, as in some 'recent' works.

[And there are possibly quite a few more to add to this list, missed due to some systematic differences (like with/without a terminating comma etc.)!!]
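
The merge proposed above might be sketched as follows. Everything here is an assumption made for illustration: the entry layout (a dict with `L`, `k1`, `k2` and `body` keys), the function name, and the choice to keep the first L-number and k1 of each group and to concatenate the bodies. The sample data is made up.

```python
def merge_groups(entries, groups):
    """Merge each group of L-numbers into one entry with a comma-separated k2.

    entries: list of dicts with keys "L", "k1", "k2", "body" (hypothetical layout).
    groups:  list of tuples of L-numbers, e.g. [(2369, 2370), (25964, 25965, 25966)].
    """
    by_id = {entry["L"]: entry for entry in entries}
    absorbed = set()
    for group in groups:
        head, *rest = group
        members = [by_id[number] for number in group]
        by_id[head] = {
            "L": head,
            "k1": members[0]["k1"],
            "k2": ", ".join(member["k2"] for member in members),
            "body": "\n".join(member["body"] for member in members),
        }
        absorbed.update(rest)
    # Keep the original order, dropping entries absorbed into a group head.
    return [by_id[entry["L"]] for entry in entries if entry["L"] not in absorbed]


if __name__ == "__main__":
    # Made-up entries, for illustration only.
    entries = [
        {"L": 1, "k1": "a", "k2": "a", "body": "text a"},
        {"L": 2, "k1": "b", "k2": "b", "body": "text b"},
        {"L": 3, "k1": "c", "k2": "c", "body": "text c"},
    ]
    for entry in merge_groups(entries, [(2, 3)]):
        print(entry)
```

Whether k1 should keep only the first variant, and how the merged bodies should be laid out, is left open here.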

@Andhrabharati
Author

There is one bad scan from Thomas [3-021], which has the bottom portion cut and 'mysteriously' overlapped by the top portion of another page [3-023].

image

And here is a better page from elsewhere--

image

@Andhrabharati
Author

Navigated through all the scan pages of SKD and noted the following:

Tables: 18
1-049 to 050 (continued across 2 pages)
1-076
1-114 to 119 (continued across 6 pages)
2-048 to 052 (continued across 5 pages)
2-268 [2 nos.]
2-269 to 270 (continued across 2 pages)
2-270
2-271
2-377
2-467 to 486 (continued across 20 pages)
2-831 to 832 (continued across 2 pages)
2-930 to 932 (continued across 3 pages)
3-321 to 333 (continued across 13 pages)
3-333 to 364 (continued across 32 pages)
4-200
4-200 to 201 (continued across 2 pages)
5-093 to 094 (continued across 2 pages)

Tables better rendered as pictures: 5
3-532
3-617 [2 nos.]
3-618 [2 nos.]
5-306 [2 nos.]

Pictures: 99
2-212 [5 nos.]
2-251 to 2-258 [67 nos.]
2-281 [4 nos.]
2-282
2-413
2-447
2-491 [3 nos.]
2-493
3-022
3-041
3-379
3-618 [2 nos.]
4-157 [2 nos.]
4-158 [5 nos.]
4-214
5-262
5-264
5-304

@Andhrabharati
Author

In comparison with the print, the CDSL file has

  1. many tables marked as columns (with Cx notation), but with errors; and some of the tables are rendered as just running text, so their tabular structure and readability are lost.
  2. just 5 <Picture> markers, at 3-618, 4-214, 5-262, 5-304 (2 nos.), so the rest of the pictorial data is lost!

@Andhrabharati
Author

Andhrabharati commented Nov 4, 2023

Parsed the data and found some more corrections--

Additional grouped entries
5619, 5620
14664, 14665
20356, 20357
20602, 20603
22237, 22238
23958, 23959
24328, 24329
24608, 24609
24841 (with 24842, 24843)
27937, 27939
34886, 34887
37301, 37302
40352, 40353

Revised HW(s)
<L>715<pc>1-031-c<k1>अत्यन्तःसुकुमारः<k2>अत्यन्तःसुकुमारः
<L>9552<pc>2-235-a<k1>क्षीब(व)<k2>क्षीब(व)
<L>21065<pc>3-107-a<k1>पाण्डुरः<k2>पाण्डुरः
<L>23264<pc>3-302-c<k1>प्रस्तीमः<k2>प्रस्तीमः
<L>33474<pc>4-460-c<k1>विष्णुशृङ्खलः<k2>विष्णुशृङ्खलः
<L>39154<pc>5-353-c<k1>सिन्धुवारः<k2>सिन्धुवारः
<L>41578<pc>5-525-c<k1>हवङ्गः<k2>हवङ्गः

Deleted HW(s)
<L>36132<pc>5-126-a<k1>शूकडी<k2>शूकडी ;; to merge the data with previous entry

And this parsing facilitates better grasping of the structure of the work and yields many more "easy" corrections!

@Andhrabharati
Author

By mistake, I posted 2 comments in the 'wrong' issue--
#14 (comment)

#14 (comment)

@Andhrabharati
Author

Noticed, by chance, two entry words that were merged into the previous entry in the CDSL text.

This prompts me to look for all such entries as the next step in my SKD work.

Found 39 such HWs, with a plain pattern search. Probably a few more might be lying 'hidden' still.
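
The exact pattern used for this search is not given here, so the sketch below is only one possible form of it. It assumes that continuation lines start with `<>`, that a merged headword shows up as such a line beginning with a single word, a comma and a grammatical label, and that the labels of interest are the SLP1 abbreviations in `LABELS`; the function name and the second sample line are made up.

```python
import re

# Assumed grammatical labels in SLP1; extend the tuple as needed.
LABELS = ("puM", "strI", "klI", "tri", "vya")

# A continuation line that looks like the start of a new entry:
# "<>" + one word + ", " + label + ","
HIDDEN_HW = re.compile(r"^<>(\S+), (?:%s)," % "|".join(LABELS))


def find_hidden_headwords(lines):
    """Return (line_number, word) for continuation lines that look like headwords."""
    hits = []
    for number, text in enumerate(lines, start=1):
        match = HIDDEN_HW.match(text)
        if match:
            hits.append((number, match.group(1)))
    return hits


if __name__ == "__main__":
    sample = [
        "hUravaH¦, puM, (hU iti ravo'sya .) SfgAlaH . iti",
        "<>SUkaqI, strI, (deSajaH)",  # made-up line, for illustration
        "<>vfttiH .. [kAlaH . iti hemacandraH ..",
    ]
    print(find_hidden_headwords(sample))
```

A search like this only catches candidates that start a line; a headword merged into the previous entry as running text would need a different pattern.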

@funderburkjim
Contributor

@Andhrabharati I like your comprehensive notes above. They will be helpful when revising the cdsl version. Hope you will continue these notes as your skd examination continues.

@drdhaval2785 hope you will pay attention to AB's notes (You use SKD quite a bit, right?)

funderburkjim added a commit to sanskrit-lexicon-scans/skd that referenced this issue Nov 7, 2023
@funderburkjim
Contributor

3-021

@Andhrabharati : Have replaced pg3_021.pdf at Cologne, per img file above.

Also replaced at https://github.com/sanskrit-lexicon-scans/SKD

@Andhrabharati
Author

Andhrabharati commented Nov 8, 2023

> Probably a few more might be lying 'hidden' still.

The list now has another 8 added, this time including 3 dhatus, one of which is a 'really' hidden entry.

While the others all start a new line in the print, the 'hidden' dhatu is merged with the previous entry as running text!

@gasyoun , probably this is an interesting point for you!!
