Making a case for unique identifier for each synset / word-meaning set #409

Open
drdhaval2785 opened this issue Apr 2, 2023 · 6 comments

@drdhaval2785
Contributor

drdhaval2785 commented Apr 2, 2023

Dictionaries sometimes encode relationships between two synsets, and we need to capture that information.
This is possible only if each synset is uniquely identifiable. We can then link synsets internally and represent the relationships between them.

For example, शार्ङ्ग is विष्णु's चाप (Vishnu's bow).
I currently write it as विष्णुचाप, which is not a very elegant way of doing it.

Data

<syns>विष्णु-पुं,नारायण-पुं,बभ्रु-पुं,चक्रपाणि-पुं,जनार्दन-पुं,दैत्यारि-पुं,पुण्डरीकाक्ष-पुं,त्रिककुद्-पुं,विष्टरश्रवस्-पुं,पीताम्बर-पुं,हृषीकेश-पुं,विष्वक्सेन-पुं,चतुर्भुज-पुं,श्रीवत्स-पुं,श्रीपति-पुं,शार्ङ्गिन्-पुं,श्रीवत्साङ्क-पुं,अच्युत-पुं,हुण-पुं,वासुदेव-पुं,स्वभू-पुं,चक्रिन्-पुं,वैकुण्ठ-पुं,पुरुषोत्तम-पुं,अरिष्टनेमि-पुं,अजित-पुं,श्रीधर-पुं,यज्ञपूरुष-पुं,मुञ्जकेशिन्-पुं,मुररिपु-पुं,गदापाणि-पुं,अधोऽक्षज-पुं,अनन्तशायिन्-पुं,वृन्दाक-पुं,मुकुन्द-पुं,धरणीधर-पुं,शतानन्द-पुं,शतावर्त-पुं,युगावर्त-पुं,सुरोत्तम-पुं,कालकुन्थ-पुं,रन्तिदेव-पुं,केशव-पुं,गरुडध्वज-पुं,पद्मनाभ-पुं,विश्वरूप-पुं,कृष्ण-पुं,हरि-पुं,असंपुष-पुं,कैटभारि-पुं,ब्रह्मनाभ-पुं,गोविन्द-पुं,मधुसूदन-पुं
विष्णुर्नारायणो बभ्रुश्चक्रपाणिर्जनार्दनः ।
;l{0020}
दैत्यारिः पुण्डरीकाक्षस्त्रिककुद्विष्टरश्रवाः ॥ १० ॥
पीतांबरो हृषीकेशो विष्वक्सेनश्चतुर्भुजः ।
;p{0004}
श्रीवत्सश्श्रीपतिश्शार्ङ्गी श्रीवत्सांकोऽच्युतो हुणः ॥ ११ ॥
वासुदेवस्स्वभूश्चक्री वैकुण्ठः पुरुषोत्तमः ।
अरिष्टनेमिरजितश्श्रीधरो यज्ञपूरुषः ॥ १२ ॥
;l{0025}
मुञ्जकेशी मुररिपुर्गदापाणिरधोऽक्षजः ।
अनन्तशायी वृन्दाको मुकुन्दो धरणीधरः ॥ १३ ॥
शतानन्दश्शतावर्तो युगावर्तस्सुरोत्तमः ।
कालकुन्थो रन्तिदेवः केशवो गरुडध्वजः ॥ १४ ॥
पद्मनाभो विश्वरूपः कृष्णो हरिरसंपुषः ।
;l{0030}
कैटभारिर्ब्रह्मनाभो गोविन्दो मधुसूदनः ॥ १५ ॥
<LEND>
.
.
.
<L>5<pc>4
<syns>कौस्तुभ-पुं,विष्णुमणि
<syns>श्रीवत्स-पुं,विष्णुलक्ष्मन्
<syns>नन्दक-पुं,विष्ण्वसि
<syns>शार्ङ्ग-क्ली,विष्णुचाप-पुं
<syns>पाञ्चजन्य-पुं,विष्णुशङ्ख-पुं
<syns>सुदर्शन-क्ली,विष्णुचक्र-क्ली
कौस्तुभोऽस्य मणिर्लक्ष्म श्रीवत्सो नन्दकस्त्वसिः ।
चापश्शार्ङ्गं पाञ्चजन्यश्शंखश्चक्रं सुदर्शनम् ॥ १७ ॥
.
.
.
<syns>चाप,धनुष्,आस,इष्वास,धनुर्,द्रुण,कार्मुक,धन्व,कोदण्ड,आयुधाग्र्य,शरासन
अस्त्रियौ चापधनुषावासेष्वासौ धनुर्द्रुणम् ।
कार्मुकं धन्व कोदण्डमायुधाग्र्यं शरासनम् ॥ १७२ ॥
@drdhaval2785
Contributor Author

If each synset has a unique number, as in the following, I can explicitly encode the relationships between synsets.
If I denote the genitive / possessive relationship by '#', the relationship in शार्ङ्ग can be coded explicitly as 1#140, which stands for all of the following headwords: विष्णुचाप,नारायणचाप,बभ्रुचाप,,,,,,,विष्णुधनुष्,,,,,,विष्णुकार्मुक,नारायणकार्मुक,,,,,,,,,विष्णुशरासन,,,,,मधुसूदनशरासन.

<eid>1<syns>विष्णु-पुं,नारायण-पुं,बभ्रु-पुं,चक्रपाणि-पुं,जनार्दन-पुं,दैत्यारि-पुं,पुण्डरीकाक्ष-पुं,त्रिककुद्-पुं,विष्टरश्रवस्-पुं,पीताम्बर-पुं,हृषीकेश-पुं,विष्वक्सेन-पुं,चतुर्भुज-पुं,श्रीवत्स-पुं,श्रीपति-पुं,शार्ङ्गिन्-पुं,श्रीवत्साङ्क-पुं,अच्युत-पुं,हुण-पुं,वासुदेव-पुं,स्वभू-पुं,चक्रिन्-पुं,वैकुण्ठ-पुं,पुरुषोत्तम-पुं,अरिष्टनेमि-पुं,अजित-पुं,श्रीधर-पुं,यज्ञपूरुष-पुं,मुञ्जकेशिन्-पुं,मुररिपु-पुं,गदापाणि-पुं,अधोऽक्षज-पुं,अनन्तशायिन्-पुं,वृन्दाक-पुं,मुकुन्द-पुं,धरणीधर-पुं,शतानन्द-पुं,शतावर्त-पुं,युगावर्त-पुं,सुरोत्तम-पुं,कालकुन्थ-पुं,रन्तिदेव-पुं,केशव-पुं,गरुडध्वज-पुं,पद्मनाभ-पुं,विश्वरूप-पुं,कृष्ण-पुं,हरि-पुं,असंपुष-पुं,कैटभारि-पुं,ब्रह्मनाभ-पुं,गोविन्द-पुं,मधुसूदन-पुं
<eid>17<syns>शार्ङ्ग-क्ली,1#140
<eid>140<syns>चाप,धनुष्,आस,इष्वास,धनुर्,द्रुण,कार्मुक,धन्व,कोदण्ड,आयुधाग्र्य,शरासन
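
A minimal sketch of how a reference such as 1#140 could be expanded into headwords, assuming the `<eid>N<syns>...` line format shown above. The function names (`parse`, `expand`, `strip_gender`) are hypothetical, the member lists are abridged, and compounds are formed by plain concatenation, so sandhi (as in विष्ण्वसि) is not handled.

```python
import re
from itertools import product

# Assumed line format, taken from the example above: <eid>N<syns>a,b,c
LINE_RE = re.compile(r"<eid>(\d+)<syns>(.*)")
# A relation reference such as "1#140": owner eid, '#', owned eid.
REF_RE = re.compile(r"(\d+)#(\d+)")


def strip_gender(member):
    """Drop a trailing gender tag such as '-पुं' or '-क्ली', if present."""
    return member.split("-", 1)[0]


def parse(lines):
    """Return {eid: [members]} for every line that matches LINE_RE."""
    synsets = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            synsets[int(m.group(1))] = [x for x in m.group(2).split(",") if x]
    return synsets


def expand(ref, synsets):
    """Expand 'A#B' into every owner+owned compound by plain concatenation."""
    m = REF_RE.fullmatch(ref)
    if m is None:
        raise ValueError(f"not a relation reference: {ref!r}")
    owners = [strip_gender(x) for x in synsets[int(m.group(1))]]
    owned = [strip_gender(x) for x in synsets[int(m.group(2))]]
    return [a + b for a, b in product(owners, owned)]


# Abridged versions of the three synsets quoted above.
data = [
    "<eid>1<syns>विष्णु-पुं,नारायण-पुं,मधुसूदन-पुं",
    "<eid>17<syns>शार्ङ्ग-क्ली,1#140",
    "<eid>140<syns>चाप,धनुष्,शरासन",
]
synsets = parse(data)
compounds = expand("1#140", synsets)
print(compounds)
```

With the full member lists, the same call would produce one compound for every pairing of a member of synset 1 with a member of synset 140.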

@drdhaval2785
Contributor Author

This is possible only when a unique id (not an Lnum) is assigned to each synset in a samAnArthaka kosha and to each word-meaning set in an anekArthaka kosha.

The question is: what would be the ideal place to store this information?
In my opinion, the xxx.txt file would be the ideal place.
Hardcoding it there would serve the same purpose that Lnums serve today.
In future, if an error in numbering is found, or a new synset is added on the basis of some edition of the book, a fixed eid ensures that relationships encoded earlier are not altered.
What does @funderburkjim think?
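
The point about fixed eids can be sketched as follows: a new synset inserted in the middle of the file receives the next unused eid, so existing eids, and any references such as 1#140 that use them, stay valid even though line positions shift. The helper names (`next_free_eid`, `insert_synset`) and the allocation rule (largest eid plus one) are assumptions for illustration, not an agreed convention.

```python
import re

# Assumed line format from the example above: <eid>N<syns>...
EID_RE = re.compile(r"<eid>(\d+)")


def next_free_eid(lines):
    """Return one more than the largest eid already present in the lines."""
    used = [int(m.group(1)) for line in lines if (m := EID_RE.match(line))]
    return max(used, default=0) + 1


def insert_synset(lines, position, members):
    """Insert a new synset line at `position`; existing lines are untouched."""
    eid = next_free_eid(lines)
    new_line = f"<eid>{eid}<syns>{','.join(members)}"
    return lines[:position] + [new_line] + lines[position:], eid


# Abridged versions of the synsets quoted earlier in this thread.
lines = [
    "<eid>1<syns>विष्णु-पुं,नारायण-पुं",
    "<eid>17<syns>शार्ङ्ग-क्ली,1#140",
    "<eid>140<syns>चाप,धनुष्",
]
updated, new_eid = insert_synset(lines, 1, ["कौस्तुभ-पुं"])
for line in updated:
    print(line)
```

The reference 1#140 in the शार्ङ्ग line still points to the same two synsets after the insertion, which would not hold if synsets were identified by their position in the file.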

@drdhaval2785
Contributor Author

eid is an arbitrary name, shorthand for 'extra id'.

drdhaval2785 added the "question" label Apr 2, 2023
@gasyoun
Member

gasyoun commented Apr 6, 2023

860

What would the extra id be here?

@funderburkjim
Contributor

AFAIK, WordNet (https://wordnet.princeton.edu/) is the primary example of synsets (for English).
It is possible to use NLTK (the Natural Language Toolkit) with Python to explore WordNet.

In NLTK's WordNet interface, a synset is named by a representative word together with its part of speech and sense number (e.g. 'dog.n.01' is one synset for 'dog').

Bing Chat says:

Synsets are linked with each other to form various kinds of relations. These relations can be semantic or lexical. Semantic relations include hypernymy (a more general concept), hyponymy (a more specific concept), meronymy (a part-whole relationship), and holonymy (a whole-part relationship). Lexical relations include antonymy (opposite meaning), entailment (one concept implies another), and derivation (one word is derived from another).

Perhaps we should model our thinking about synsets after the WordNet approach, i.e., learn how WordNet works and make a Sanskrit WordNet along the same lines. At first glance, the underlying data structures for WordNet are likely to be described either at the WordNet site or in the NLTK documentation.
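
As a rough illustration of what a WordNet-style model could look like for the data in this thread, here is a small sketch with synsets keyed by eid and typed links between them. It is not WordNet's or NLTK's actual data structure. The class names, the relation name "possessor" (standing in for the '#' genitive relation proposed above), and the choice to treat शार्ङ्ग as a hyponym of चाप are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    eid: int
    members: list
    # relation name -> list of target eids
    relations: dict = field(default_factory=dict)


class Lexicon:
    """A tiny in-memory store of synsets with typed links between them."""

    def __init__(self):
        self.synsets = {}

    def add(self, eid, members):
        self.synsets[eid] = Synset(eid, list(members))

    def relate(self, source, relation, target, inverse=None):
        """Link source -> target; optionally record the inverse link too."""
        self.synsets[source].relations.setdefault(relation, []).append(target)
        if inverse is not None:
            self.synsets[target].relations.setdefault(inverse, []).append(source)

    def related(self, eid, relation):
        """Return the synsets reachable from `eid` through `relation`."""
        targets = self.synsets[eid].relations.get(relation, [])
        return [self.synsets[t] for t in targets]


lex = Lexicon()
lex.add(1, ["विष्णु", "नारायण"])   # abridged
lex.add(17, ["शार्ङ्ग"])
lex.add(140, ["चाप", "धनुष्"])      # abridged

# शार्ङ्ग is a kind of bow (hypernymy) that belongs to विष्णु (hypothetical relation).
lex.relate(17, "hypernym", 140, inverse="hyponym")
lex.relate(17, "possessor", 1, inverse="possession")

print([s.members for s in lex.related(17, "hypernym")])
print([s.members for s in lex.related(1, "possession")])
```

Splitting 1#140 into two typed links is only one possible design; the compact 1#140 notation could equally be kept in the text file and converted to links when the data is loaded.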

Bing Chat responds to the question "how to make wordnet for another language using nltk?":

Yes, it is possible to create your own version of WordNet for another language using NLTK.
You can use the NLTK’s wordnet reader object and initialize a wrapper object that provides its own defaults [1].
However, you need to have a WordNet-like resource for the language you want to use [2].
You can also use Open Multilingual WordNet (OMW) which links WordNets of different languages to the Princeton WordNet version 3.0 [2].

1 (https://stackoverflow.com/questions/39569307/how-to-change-nltk-default-wordnet-language-to-zsm)
2 (https://stackoverflow.com/questions/31478152/how-to-use-the-language-option-in-synsets-nltk-if-you-load-a-wordnet-manually)

@drdhaval2785
Contributor Author

Sanskrit Wordnet - https://www.cfilt.iitb.ac.in/wordnet/webswn/wn.php
