Discussion about metaline for anekArthaka dictionaries (अनेकार्थक कोश) #405
Comments
I need to create a small sample of 100 verses or so of this type and work with Jim to modify the make_xml.py to take this modified metaline structure into account and generate the XMLs which are more in sync with CDSL XML types. |
Note that I have given the gender and headword details in Devanagari so that non-technical people who are not comfortable with SLP can fill in the data more easily. |
Also add त्रि (adjective). The list can be expanded as more dictionaries are included. |
Typing |
It is not possible in some cases to decipher the gender from the case ending itself. Something like the following.
|
When the gender information in a meaning is clear, as in “kambuni” (the saptami, i.e. locative, of “kambu” in the neuter gender), I will keep it. That is an unambiguous case ending. |
In any normal context, I would suggest typing a simple |
Fine. I will use - |
Kindly find attached the file with markup of 120 verses, made as per the discussion above. Please go through it and suggest what kind of XML we need to create. |
@drdhaval2785 seems that the call has revived your idea, happy to see it. @Andhrabharati it's good to have you around. |
@gasyoun I am proud enough to say that no one ever can beat me in this; I just wish my experience be used beneficially by others (when I 'talk'). @drdhaval2785 Would it not be a good idea to add them as well, so that it would be further useful to the end-users? |
I agree that it would be a good addition if we are able to capture the meanings in Hindi / English provided by the editor of the works. My only concern is that they are fresh works and may be in copyright. |
Let's make a list of whom @Andhrabharati considers as valuable. |
Why multiple 'L' in an entry?
Example
Possible alternative:
The value of L is numeric, and the sqlite database is typically ordered by L. In this case the 'entry' is the 'document' containing the next 4 lines (all the lines up to, but not including, the line that ends the entry). A basic (or mobile1) display would show the 'document' (these 4 lines, with some HTML prettification) when the user enters any of the 8 words कटक through क्षुद्रवैरिन्. Right?
what is क्ली?
Secondary question: what is क्ली an abbreviation for? I guess some gender information. |
klI is shorthand for klIba i.e. neuter gender |
I agree regarding your suggestion to use L number to refer to the entry. Will update the files accordingly. And your understanding is correct that any of those 8 headwords should lead to this entry. All possible gender information: Currently I run a script which captures all the gender information markup and its frequency of occurrence in the dictionary. I will post such info for each dictionary being added. I will also keep a file which will note all abbreviations being used in all koshas. This will help us to create tooltips if needed. |
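A frequency count of gender markup, as described in the comment above, could look roughly like the following. This is a minimal sketch and not the script actually used for the dictionary: the function name `gender_frequencies`, the `word:gender` token shape and the placeholder words `A` to `D` are assumptions made for illustration.

```python
from collections import Counter

def gender_frequencies(tokens):
    """Count gender tags in 'word:gender' tokens.

    Tokens without a ':gender' part are counted under the empty string.
    """
    return Counter(token.partition(":")[2].strip() for token in tokens)

# Placeholder tokens (hypothetical data); क्ली and त्रि are tags mentioned in this thread.
tokens = ["A:क्ली", "B:त्रि", "C:क्ली", "D"]
for gender, count in gender_frequencies(tokens).most_common():
    print(repr(gender), count)
```

Counting untagged tokens under an empty key makes it easy to see how many items still lack gender information.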
Updated the file according to the requirements specified above. |
The gender information is as follows for the present work |
Based on #409 , the file ankh1.txt is modified to have a unique identifier per anekArthaka word-meanings set.
|
@funderburkjim |
@drdhaval2785 I missed your request to use ANHK. Will make v3 to change from HARSA to ANHK.
Concept of anekArthaka dictionaries
There is a headword (with or without gender information).
For a given headword, one or more meanings are given (with or without gender information).
The meanings are not necessarily synonymous with one another. Mostly they are not.
Explanation in mathematical terms
If anekArthaka relationship is denoted by f(n),
A f(n) {B, C} would mean
A = B
A = C
but B is not necessarily equal to C. In most cases, B != C.
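The relationship can be sketched as a mapping from a headword to a set of meanings. This is only an illustration: the names `anekartha` and `has_meaning` are hypothetical, and A, B, C are the placeholders used above.

```python
# Hypothetical illustration: one headword mapped to its set of meanings.
anekartha = {"A": {"B", "C"}}

def has_meaning(headword, meaning):
    """True if `meaning` is listed under `headword` (A = B in the notation above)."""
    return meaning in anekartha.get(headword, set())

print(has_meaning("A", "B"))  # A = B
print(has_meaning("A", "C"))  # A = C
# B and C are linked only through A; nothing here records B = C.
print(has_meaning("B", "C"))
```

Storing each headword with its own meaning set, rather than merging everything into one synonym group, keeps B and C separate unless another entry relates them.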
Sample data
From Amarakosha nAnArthavarga
मारुते वेधसि ब्रध्ने पुंसि कः कं शिरोऽम्बुनोः ।
स्यात्पुलाकस्तुच्छधान्ये संक्षेपे भक्तसिक्थके ॥ ५ ॥
Problem to be handled
We need to devise a markup standard by which the information is captured without any loss, while encoding.
We can use this information later on, for display or otherwise.
We can later on generate synsets too.
Proposed markup (Edited per #405 (comment))
If the gender information is absent or ambiguous, do not try too hard to interpret it manually; we can leave it blank. E.g. in the following, it is not clear whether अनुष्टुभ् or यशस् is neuter, feminine or masculine, so they are kept blank. (For later uses, this information can be pulled from other dictionaries if required.) It is better not to encode information explicitly when we are not sure about it.
Explanation of metaline
L is the lnum, which would be unique for each headword:meanings pair.
pc is the page-column number detail to identify the page number.
k1 is headword:gender information.
meanings is a comma-separated list of meaning:gender information.
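A metaline with these four fields could be read as in the sketch below. The angle-bracket syntax (`<L>…<pc>…<k1>…<meanings>…`) is an assumption modelled on the general shape of CDSL metalines, not taken from this issue; the function names and the sample line (including its L and pc values) are hypothetical.

```python
import re

# Assumed syntax: <L>lnum<pc>page-col<k1>headword:gender<meanings>m1:g1,m2:g2
FIELD_RE = re.compile(r"<(L|pc|k1|meanings)>([^<]*)")

def split_gender(token):
    """Split 'word:gender' into (word, gender); gender is '' when left blank."""
    word, _, gender = token.strip().partition(":")
    return word, gender

def parse_metaline(line):
    """Return the metaline fields as a dict, with gender separated out."""
    fields = dict(FIELD_RE.findall(line))
    headword, gender = split_gender(fields.get("k1", ""))
    meanings = [
        split_gender(token)
        for token in fields.get("meanings", "").split(",")
        if token.strip()
    ]
    return {
        "L": fields.get("L", ""),
        "pc": fields.get("pc", ""),
        "headword": headword,
        "gender": gender,
        "meanings": meanings,
    }

# Hypothetical sample line with placeholder values.
sample = "<L>7<pc>2-b<k1>A:क्ली<meanings>B:त्रि, C"
print(parse_metaline(sample))
```

A meaning with no `:gender` part comes back with an empty gender, matching the advice to leave the information blank when it is absent or ambiguous.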