Making a case for unique identifier for each synset / word-meaning set #409

drdhaval2785 · 2023-04-02T11:16:29Z

Dictionaries sometimes encode relationships between two synsets.
We need to capture this information.
That is possible only if each synset is uniquely identifiable; we can then link synsets internally and record the relationships between them.

e.g. शार्ङ्ग is विष्णु's चाप.
I have currently written it as विष्णुचाप, which is not a very elegant way of doing it.

Data

<syns>विष्णु-पुं,नारायण-पुं,बभ्रु-पुं,चक्रपाणि-पुं,जनार्दन-पुं,दैत्यारि-पुं,पुण्डरीकाक्ष-पुं,त्रिककुद्-पुं,विष्टरश्रवस्-पुं,पीताम्बर-पुं,हृषीकेश-पुं,विष्वक्सेन-पुं,चतुर्भुज-पुं,श्रीवत्स-पुं,श्रीपति-पुं,शार्ङ्गिन्-पुं,श्रीवत्साङ्क-पुं,अच्युत-पुं,हुण-पुं,वासुदेव-पुं,स्वभू-पुं,चक्रिन्-पुं,वैकुण्ठ-पुं,पुरुषोत्तम-पुं,अरिष्टनेमि-पुं,अजित-पुं,श्रीधर-पुं,यज्ञपूरुष-पुं,मुञ्जकेशिन्-पुं,मुररिपु-पुं,गदापाणि-पुं,अधोऽक्षज-पुं,अनन्तशायिन्-पुं,वृन्दाक-पुं,मुकुन्द-पुं,धरणीधर-पुं,शतानन्द-पुं,शतावर्त-पुं,युगावर्त-पुं,सुरोत्तम-पुं,कालकुन्थ-पुं,रन्तिदेव-पुं,केशव-पुं,गरुडध्वज-पुं,पद्मनाभ-पुं,विश्वरूप-पुं,कृष्ण-पुं,हरि-पुं,असंपुष-पुं,कैटभारि-पुं,ब्रह्मनाभ-पुं,गोविन्द-पुं,मधुसूदन-पुं
विष्णुर्नारायणो बभ्रुश्चक्रपाणिर्जनार्दनः ।
;l{0020}
दैत्यारिः पुण्डरीकाक्षस्त्रिककुद्विष्टरश्रवाः ॥ १० ॥
पीतांबरो हृषीकेशो विष्वक्सेनश्चतुर्भुजः ।
;p{0004}
श्रीवत्सश्श्रीपतिश्शार्ङ्गी श्रीवत्सांकोऽच्युतो हुणः ॥ ११ ॥
वासुदेवस्स्वभूश्चक्री वैकुण्ठः पुरुषोत्तमः ।
अरिष्टनेमिरजितश्श्रीधरो यज्ञपूरुषः ॥ १२ ॥
;l{0025}
मुञ्जकेशी मुररिपुर्गदापाणिरधोऽक्षजः ।
अनन्तशायी वृन्दाको मुकुन्दो धरणीधरः ॥ १३ ॥
शतानन्दश्शतावर्तो युगावर्तस्सुरोत्तमः ।
कालकुन्थो रन्तिदेवः केशवो गरुडध्वजः ॥ १४ ॥
पद्मनाभो विश्वरूपः कृष्णो हरिरसंपुषः ।
;l{0030}
कैटभारिर्ब्रह्मनाभो गोविन्दो मधुसूदनः ॥ १५ ॥
<LEND>
.
.
.
<L>5<pc>4
<syns>कौस्तुभ-पुं,विष्णुमणि
<syns>श्रीवत्स-पुं,विष्णुलक्ष्मन्
<syns>नन्दक-पुं,विष्ण्वसि
<syns>शार्ङ्ग-क्ली,विष्णुचाप-पुं
<syns>पाञ्चजन्य-पुं,विष्णुशङ्ख-पुं
<syns>सुदर्शन-क्ली,विष्णुचक्र-क्ली
कौस्तुभोऽस्य मणिर्लक्ष्म श्रीवत्सो नन्दकस्त्वसिः ।
चापश्शार्ङ्गं पाञ्चजन्यश्शंखश्चक्रं सुदर्शनम् ॥ १७ ॥
.
.
.
<syns>चाप,धनुष्,आस,इष्वास,धनुर्,द्रुण,कार्मुक,धन्व,कोदण्ड,आयुधाग्र्य,शरासन
अस्त्रियौ चापधनुषावासेष्वासौ धनुर्द्रुणम् ।
कार्मुकं धन्व कोदण्डमायुधाग्र्यं शरासनम् ॥ १७२ ॥


drdhaval2785 · 2023-04-02T11:20:31Z

If I have unique synset numbers like the following, I can encode the relationship between them explicitly.
If I denote the genitive / possessive relationship by '#', the relationship in शार्ङ्ग can be coded explicitly as 1#140, which would stand for all of the following headwords: विष्णुचाप, नारायणचाप, बभ्रुचाप, ..., विष्णुधनुष्, ..., विष्णुकार्मुक, नारायणकार्मुक, ..., विष्णुशरासन, ..., मधुसूदनशरासन.

<eid>1<syns>विष्णु-पुं,नारायण-पुं,बभ्रु-पुं,चक्रपाणि-पुं,जनार्दन-पुं,दैत्यारि-पुं,पुण्डरीकाक्ष-पुं,त्रिककुद्-पुं,विष्टरश्रवस्-पुं,पीताम्बर-पुं,हृषीकेश-पुं,विष्वक्सेन-पुं,चतुर्भुज-पुं,श्रीवत्स-पुं,श्रीपति-पुं,शार्ङ्गिन्-पुं,श्रीवत्साङ्क-पुं,अच्युत-पुं,हुण-पुं,वासुदेव-पुं,स्वभू-पुं,चक्रिन्-पुं,वैकुण्ठ-पुं,पुरुषोत्तम-पुं,अरिष्टनेमि-पुं,अजित-पुं,श्रीधर-पुं,यज्ञपूरुष-पुं,मुञ्जकेशिन्-पुं,मुररिपु-पुं,गदापाणि-पुं,अधोऽक्षज-पुं,अनन्तशायिन्-पुं,वृन्दाक-पुं,मुकुन्द-पुं,धरणीधर-पुं,शतानन्द-पुं,शतावर्त-पुं,युगावर्त-पुं,सुरोत्तम-पुं,कालकुन्थ-पुं,रन्तिदेव-पुं,केशव-पुं,गरुडध्वज-पुं,पद्मनाभ-पुं,विश्वरूप-पुं,कृष्ण-पुं,हरि-पुं,असंपुष-पुं,कैटभारि-पुं,ब्रह्मनाभ-पुं,गोविन्द-पुं,मधुसूदन-पुं
<eid>17<syns>शार्ङ्ग-क्ली,1#140
<eid>140<syns>चाप,धनुष्,आस,इष्वास,धनुर्,द्रुण,कार्मुक,धन्व,कोदण्ड,आयुधाग्र्य,शरासन
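
A minimal Python sketch of how a reference such as 1#140 could be expanded into headwords, assuming the `<eid>N<syns>...` line format shown above. The names `LINE_RE`, `parse_synsets` and `expand_possessive` are hypothetical, and the expansion is plain concatenation with no sandhi applied (a real compound such as विष्ण्वसि would need it), so this illustrates the idea rather than a finished tool.

```python
import re
from itertools import product

# Hypothetical pattern for one line of the proposed format: <eid>N<syns>item,item,...
LINE_RE = re.compile(r"^<eid>(\d+)<syns>(.*)$")


def parse_synsets(lines):
    """Map each eid to its members, dropping gender tags such as '-पुं'."""
    synsets = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            members = [item.split("-")[0] for item in match.group(2).split(",") if item]
            synsets[int(match.group(1))] = members
    return synsets


def expand_possessive(ref, synsets):
    """Expand 'a#b' into every possessor + possessed compound (no sandhi applied)."""
    owner, owned = (int(part) for part in ref.split("#"))
    return [a + b for a, b in product(synsets[owner], synsets[owned])]


# Shortened sample of the data above, to keep the example small.
sample = [
    "<eid>1<syns>विष्णु-पुं,नारायण-पुं,बभ्रु-पुं",
    "<eid>140<syns>चाप,धनुष्,कार्मुक",
]
synsets = parse_synsets(sample)
compounds = expand_possessive("1#140", synsets)
print(compounds)
```

With the full synsets 1 and 140, the same call would produce one compound for every pair of members, which is the list that 1#140 is meant to stand for.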

drdhaval2785 · 2023-04-02T11:23:18Z

This is possible only when a unique id (not an Lnum) is assigned to each synset in a samAnArthaka kosha and to each word-meaning set in an anekArthaka kosha.

The question is: what would be the ideal place to store this information?
In my opinion, the xxx.txt file would be the ideal place.
Hardcoding the id there would serve the same purpose that Lnums serve today.
If an error in numbering is found in future, or a new synset is added on the basis of some edition of the book, a fixed eid ensures that the relationships encoded earlier are not altered.
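
The stability argument can be sketched in a few lines of Python. This assumes relations are stored as `a#b` strings; the `resolve` helper and the "positional" numbering are hypothetical stand-ins for any id recomputed from file order, not a description of how Lnums actually behave.

```python
def resolve(ref, table):
    """Return the first member of the possessor and possessed synsets named by 'a#b'."""
    owner, owned = (int(part) for part in ref.split("#"))
    return table[owner][0], table[owned][0]


vishnu = ["विष्णु", "नारायण"]
bow = ["चाप", "धनुष्"]
gem = ["कौस्तुभ", "विष्णुमणि"]  # a synset added later, placed between the other two

# Positional numbering: ids are recomputed from the order of synsets in the file.
old_positional = dict(enumerate([vishnu, bow], start=1))
new_positional = dict(enumerate([vishnu, gem, bow], start=1))
positional_before = resolve("1#2", old_positional)
positional_after = resolve("1#2", new_positional)  # silently points at another synset

# Fixed eids: the new synset takes an unused id and the existing ids stay untouched.
fixed = {1: vishnu, 140: bow}
fixed_before = resolve("1#140", fixed)
fixed[max(fixed) + 1] = gem
fixed_after = resolve("1#140", fixed)

print(positional_before, positional_after)
print(fixed_before, fixed_after)
```

In this sketch the relation stored against fixed eids resolves to the same pair before and after the addition, while the positional one drifts to a different synset.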
What does @funderburkjim think?

drdhaval2785 · 2023-04-02T11:23:41Z

eid is an arbitrary name, shorthand for "extra id".

gasyoun · 2023-04-06T22:26:36Z

What would the extra id be here?

funderburkjim · 2023-04-08T03:02:03Z

AFAIK, WordNet (https://wordnet.princeton.edu/) is the primary example of synsets (for English).
It is possible to use NLTK (Natural Language Toolkit) with Python to explore WordNet.

In WordNet, synsets are identified by a specific word (e.g. the synset for 'dog').

Bing Chat says:

Synsets are linked with each other to form various kinds of relations. These relations can be semantic or lexical. Semantic relations include hypernymy (a more general concept), hyponymy (a more specific concept), meronymy (a part-whole relationship), and holonymy (a whole-part relationship). Lexical relations include antonymy (opposite meaning), entailment (one concept implies another), and derivation (one word is derived from another).

Perhaps we should model our thinking about synsets on the WordNet approach, i.e. learn how WordNet works and build a Sanskrit WordNet along the same lines. At first glance, the underlying data structures for WordNet are likely to be described either at the WordNet site or in the NLTK documentation.

Bing Chat's response to the question "How to make wordnet for another language using nltk?":

Yes, it is possible to create your own version of WordNet for another language using NLTK.
You can use the NLTK’s wordnet reader object and initialize a wrapper object that provides its own defaults (https://stackoverflow.com/questions/39569307/how-to-change-nltk-default-wordnet-language-to-zsm).
However, you need to have a WordNet-like resource for the language you want to use (https://stackoverflow.com/questions/31478152/how-to-use-the-language-option-in-synsets-nltk-if-you-load-a-wordnet-manually).
You can also use Open Multilingual WordNet (OMW) which links WordNets of different languages to the Princeton WordNet version 3.0 (https://stackoverflow.com/questions/31478152/how-to-use-the-language-option-in-synsets-nltk-if-you-load-a-wordnet-manually).

drdhaval2785 · 2023-04-08T04:03:55Z

Sanskrit Wordnet - https://www.cfilt.iitb.ac.in/wordnet/webswn/wn.php

drdhaval2785 assigned funderburkjim Apr 2, 2023

drdhaval2785 added the question Does anybody hoes a clue? label Apr 2, 2023

This was referenced Apr 9, 2023

Discussion about metaline for anekArthaka dictionaries (अनेकार्थक कोश) #405

Open

Discussion about metaline for samAnArthaka dictionaries (समानार्थक कोश) #406

Open
