
Discussion about metaline for samAnArthaka dictionaries (समानार्थक कोश) #406

Open
drdhaval2785 opened this issue Mar 12, 2023 · 11 comments
Assignees
Labels
Documentation How TXT , XML work

Comments

@drdhaval2785
Contributor

drdhaval2785 commented Mar 12, 2023

Concept of samAnArthaka dictionaries

There is no specified headword.
Synonyms are clubbed together (with or without gender information).

Explanation in mathematical terms

If the samAnArthaka relationship is denoted by f(n),
then f(n) {A, B, C} would mean that every member is a synonym of every other member:
A = B
A = C
B = C
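
As a small illustration of the relation above, a synset of n members expands into n(n-1)/2 unordered pairs. The following is a minimal Python sketch; the function name `synonym_pairs` is hypothetical and not part of any CDSL tooling.

```python
from itertools import combinations


def synonym_pairs(synset):
    """Return every unordered pair of members of one synset."""
    return list(combinations(synset, 2))


pairs = synonym_pairs(["A", "B", "C"])
print(pairs)  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```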

Sample data

From Amarakosha nAnArthavarga

देवाः सुपर्वसुरनिर्जरदेवतर्भुबर्हिर्मुखानिमिषदैवतनाकिलेखाः ।
वृन्दारकाः सुमनसस्त्रिदशा अमर्त्याः स्वाहास्वधाक्रतुसुधाभुज आदितेयाः ॥ ८८ ॥
गीर्वाणा मरुतोऽस्वप्ना विबुधा दानवारयः ।
तेषां यानं विमानोऽन्धः पीयूषममृतं सुधा ॥ ८९ ॥

Problem to be handled

We need to devise a markup standard that captures this information without any loss during encoding.
We can use this information later, for display or otherwise.
We can also generate synsets from it later.

Proposed markup (Edited per #405 (comment))

<L>1<pc>23
<syns>देव:पुं,सुपर्वन्:पुं,सुर:पुं,निर्जर:पुं,देवता:पुं,ऋभु:पुं,बर्हिर्मुख:पुं,अनिमिष:पुं,दैवत:पुं,नाकिन्:पुं,लेख:पुं,वृन्दारक:पुं,सुमनस्:पुं,त्रिदश:पुं,अमर्त्य:पुं,स्वाहाभुज्:पुं,स्वधाभुज्:पुं,क्रतुभुज्:पुं,सुधाभुज्:पुं,आदितेय:पुं,गीर्वाण:पुं,मरुत्:पुं,अस्वप्न:पुं,विबुध:पुं,दानवारि:पुं
<syns>यान:क्ली,विमान:पुं
<syns>अन्धस्:क्ली,पीयूष:क्ली,अमृत:क्ली,सुधा:स्त्री
देवाः सुपर्वसुरनिर्जरदेवतर्भुबर्हिर्मुखानिमिषदैवतनाकिलेखाः ।
वृन्दारकाः सुमनसस्त्रिदशा अमर्त्याः स्वाहास्वधाक्रतुसुधाभुज आदितेयाः ॥ ८८ ॥
गीर्वाणा मरुतोऽस्वप्ना विबुधा दानवारयः ।
तेषां यानं विमानोऽन्धः पीयूषममृतं सुधा ॥ ८९ ॥
<LEND>

If the gender information is absent or ambiguous, do not try too hard to interpret it manually; leave it blank. It is better not to encode information explicitly when we are not sure of it. In the following verse, I am not sure of the gender of पवि and भिदु, so I have left them blank. (If required later, this information can be pulled from other dictionaries.)

<syns>वज्र-पुंक्ली,कुलिश-पुंक्ली,भिदुर-क्ली,शतधारक-क्ली,व्याधाम-पुं,दम्भोलि-पुं,शतकोटि-पुं,पवि,भिदु
अस्त्रियौ वज्रकुलिशौ भिदुरं शतधारकम् ।
व्याधामः पुंसि दंभोलिश्शतकोटिः पविर्भिदुः ॥ १३ ॥
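
A rough Python sketch of how blank genders could be filled in later from another source, as suggested above. The function name `fill_missing_genders` and the `fallback` mapping are hypothetical; the single fallback value shown is only an illustration and has not been checked against any dictionary.

```python
def fill_missing_genders(members, fallback):
    """Fill blank genders from a fallback lookup.

    members  -- list of (headword, gender) pairs; gender is None when blank
    fallback -- dict mapping headword to gender, e.g. built from another dictionary
    """
    filled = []
    for headword, gender in members:
        if gender is None:
            # Stays None when the fallback has no entry either.
            gender = fallback.get(headword)
        filled.append((headword, gender))
    return filled


members = [("शतकोटि", "पुं"), ("पवि", None), ("भिदु", None)]
fallback = {"पवि": "पुं"}  # illustrative value only, not verified
print(fill_missing_genders(members, fallback))
```

Headwords that remain unresolved keep a blank gender, which matches the advice not to guess.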

Explanation of metaline

L is the lnum, which is unique for each headword:meanings pair.
pc is the page-column number, which identifies the page.
syns is a comma-separated list of headword:gender pairs for all members of the synset.
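
The fields described above could be read with a few lines of Python. This is only a sketch under assumptions: the names `parse_metaline` and `parse_syns` are hypothetical, both `:` and `-` are accepted as the headword/gender separator because both appear in this thread, and headwords are assumed not to contain either character.

```python
import re

META_RE = re.compile(r"^<L>(?P<L>[^<]+)<pc>(?P<pc>[^<]+)$")


def parse_metaline(line):
    """Split a metaline such as '<L>1<pc>23' into its L and pc fields."""
    match = META_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a metaline: {line!r}")
    return {"L": match.group("L"), "pc": match.group("pc")}


def parse_syns(line):
    """Turn a '<syns>...' line into a list of (headword, gender) pairs.

    The gender is None when it was left blank.
    """
    body = line.strip()
    if not body.startswith("<syns>"):
        raise ValueError(f"not a syns line: {line!r}")
    members = []
    for item in body[len("<syns>"):].split(","):
        item = item.strip()
        if not item:
            continue
        parts = re.split(r"[:-]", item, maxsplit=1)
        gender = parts[1] if len(parts) == 2 and parts[1] else None
        members.append((parts[0], gender))
    return members


print(parse_metaline("<L>1<pc>23"))
print(parse_syns("<syns>यान:क्ली,विमान:पुं"))
```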

@drdhaval2785
Contributor Author

Note that I have kept the syns details in Devanagari so that contributors who are not technical, or not comfortable with SLP, can fill in the data easily.
They can work in the native Devanagari script.
पुं - masculine
स्त्री - feminine
क्ली - neuter
- indeclinable
This gender information can be expanded if needed.

@drdhaval2785 drdhaval2785 added the Documentation How TXT , XML work label Mar 12, 2023
@drdhaval2785 drdhaval2785 self-assigned this Mar 12, 2023
@drdhaval2785
Contributor Author

Please refer to the discussion in #405 for small changes to the proposed format.

@funderburkjim
Contributor

What are ';c' and ';k' ?

@drdhaval2785
Contributor Author

;c for comments
;k for kAnda i.e. chapter name

@drdhaval2785
Contributor Author

;p for page and
;l for line

@drdhaval2785
Contributor Author

drdhaval2785 commented Apr 9, 2023

Dear @funderburkjim

I have prepared a full-fledged dictionary (abhidhānacintāmaṇi of Hemacandra) as an experiment for this samānārthaka kośa exercise.

abch1.txt

Kindly try to add it to kosha-dev for trial, and thereafter to CDSL.
Metadata is in the file itself.

@drdhaval2785
Contributor Author

Sample data

<L>38<pc>7
<eid>53<syns>स्वर्ग-पुं,त्रिविष्टप-पुं,द्यो-स्त्री,दिव्-स्त्री,भुवि-स्त्री,तविष-पुं,ताविष-पुं,नाक-पुं,गो-स्त्री,त्रिदिव-क्ली,ऊर्ध्वलोक-पुं,सुरालय-पुं
<eid>54<syns>अमर-पुं,देव-पुं,सुपर्वन्-पुं,सुर-पुं,निर्जर-पुं,देवता-पुं,ऋभु-पुं,बर्हिर्मुख-पुं,अनिमिष-पुं,दैवत-पुं,नाकिन्-पुं,लेख-पुं,वृन्दारक-पुं,सुमनस्-पुं,त्रिदश-पुं,अमर्त्य-पुं,स्वाहाभुज्-पुं,स्वधाभुज्-पुं,क्रतुभुज्-पुं,आदितेय-पुं,गीर्वाण-पुं,मरुत्-पुं,अस्वप्न-पुं,विबुध-पुं,दानवारि-पुं
स्वर्गस्त्रिविष्टपं द्योदिवौ भुविस्तविषताविषौ नाकः ।
गौस्त्रिदिवमूर्ध्वलोकः सुरालयस्तत्सदस्त्वमराः ॥ ८७ ॥
देवाः सुपर्वसुरनिर्जरदेवतर्भुबर्हिर्मुखानिमिषदैवतनाकिलेखाः ।
वृन्दारकाः सुमनसस्त्रिदशा अमर्त्याः स्वाहास्वधाक्रतुसुधाभुज आदितेयाः ॥ ८८ ॥
गीर्वाणा मरुतोऽस्वप्ना विबुधा दानवारयः ।
<LEND>
<L>39<pc>7
<eid>55<syns>विमान-पुं,देवयान-क्ली
<eid>56<syns>अन्धस्,पीयूष-क्ली,अमृत-क्ली,सुधा-स्त्री
तेषां यानं विमानोऽन्धः पीयूषममृतं सुधा ॥ ८९ ॥
<LEND>
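
A minimal Python sketch of how entries in this sample format might be read back into records. The function name `parse_entries` and the record layout are hypothetical; the sketch assumes the `-` separator used in this sample and treats every other line inside an entry as verse text.

```python
import re


def parse_entries(text):
    """Parse '<L>...<LEND>' blocks into a list of entry dicts."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("<L>"):
            match = re.match(r"<L>([^<]+)<pc>([^<]+)$", line)
            if match is None:
                raise ValueError(f"bad metaline: {line!r}")
            current = {"L": match.group(1), "pc": match.group(2),
                       "synsets": [], "verses": []}
        elif line == "<LEND>":
            if current is not None:
                entries.append(current)
            current = None
        elif current is not None and line.startswith(("<eid>", "<syns>")):
            match = re.match(r"(?:<eid>([^<]+))?<syns>(.*)$", line)
            if match is None:
                raise ValueError(f"bad syns line: {line!r}")
            members = []
            for item in match.group(2).split(","):
                head, _, gender = item.strip().partition("-")
                members.append((head, gender or None))  # blank gender -> None
            current["synsets"].append({"eid": match.group(1), "members": members})
        elif current is not None:
            current["verses"].append(line)
    return entries


SAMPLE = """<L>39<pc>7
<eid>55<syns>विमान-पुं,देवयान-क्ली
<eid>56<syns>अन्धस्,पीयूष-क्ली,अमृत-क्ली,सुधा-स्त्री
तेषां यानं विमानोऽन्धः पीयूषममृतं सुधा ॥ ८९ ॥
<LEND>"""

for entry in parse_entries(SAMPLE):
    print(entry["L"], entry["pc"], [s["eid"] for s in entry["synsets"]])
```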

@drdhaval2785
Contributor Author

The tag eid stands for extra id; see #409 for details.
It is a unique identifier for the synset (for future cross-dictionary referencing, commentary referencing, etc.).
We can put this eid in the tail in the XML file.
CDSL need not use it in the frontend for now if it is found superfluous.

@drdhaval2785
Contributor Author

drdhaval2785 commented Apr 9, 2023

Headwords: 13456
Headwords with gender information: 13456
[('पुं', 6960), ('क्ली', 3068), ('स्त्री', 2492), ('पुंक्ली', 378), ('अ', 176), ('पुंस्त्री', 117), ('वा', 94), ('पुंद्वि', 39), ('त्रि', 38), ('स्त्रीक्ली', 31), ('स्त्रीब', 23), ('पुंब', 15), ('स्त्रीद्वि', 12), ('स', 4), ('पुंस्त्रीब', 2), ('क्लीद्वि', 2), ('क्लीब', 2), ('पुंक्लीब', 1), ('पुंक्लीद्वि', 1), ('वापुंक्ली', 1)]
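
The list above has the shape of the output of Python's `Counter.most_common()`. A tally of this kind could be produced roughly as follows; this is a sketch under that assumption, not the script actually used, and the function name `gender_frequencies` is hypothetical.

```python
from collections import Counter


def gender_frequencies(members):
    """Count gender tags over (headword, gender) pairs, most frequent first.

    Pairs whose gender is blank (None or empty) are skipped.
    """
    counts = Counter(gender for _, gender in members if gender)
    return counts.most_common()


members = [("विमान", "पुं"), ("देवयान", "क्ली"), ("अमर", "पुं"), ("अन्धस्", None)]
print(gender_frequencies(members))
```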

Types of gender-number information (frequencies as listed above)
पुं - masculine
क्ली - neuter
स्त्री - feminine
त्रि - All three genders
अ - Indeclinable
स - सर्वनामन् - pronouns
वा - वाच्यलिङ्ग - Gender as per the noun following this adjective
द्वि - dual
ब - plural
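
Compound tags in the frequency list, such as पुंक्लीद्वि, are concatenations of the basic markers listed here. A rough Python sketch of splitting them back into their parts follows. The name `split_tag` is hypothetical. Longer markers are tried first so that स्त्री is not mistaken for स followed by unknown text.

```python
BASE_TAGS = ["पुं", "क्ली", "स्त्री", "त्रि", "अ", "स", "वा", "द्वि", "ब"]


def split_tag(tag):
    """Split a compound gender/number tag into its basic markers."""
    # Longest markers first, so a short marker never shadows a longer one.
    candidates = sorted(BASE_TAGS, key=len, reverse=True)
    parts = []
    rest = tag
    while rest:
        for marker in candidates:
            if rest.startswith(marker):
                parts.append(marker)
                rest = rest[len(marker):]
                break
        else:
            raise ValueError(f"unrecognised tag fragment: {rest!r}")
    return parts


for tag in ["पुंक्लीद्वि", "स्त्रीब", "वापुंक्ली"]:
    print(tag, split_tag(tag))
```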

@drdhaval2785
Contributor Author

Kindly use the dictionary code ABCH for this dictionary.

@funderburkjim
Contributor

@drdhaval2785 acknowledging your request re kosha-dev version of abch.


2 participants