There's also the question of how to properly associate roots in the different dictionaries. For instance,
our digitization of WIL has 'gama' for the root, but there is also a m. noun 'gama' in WIL. We should
associate the WIL verb entry 'gama' with the usual 'gam' of other dictionaries, but associate the
m. noun 'gama' of WIL with the usual 'gama' of other dictionaries. How to do this?
As this belongs to the normalization section, it is noted here.
The issue can be split into two parts:
1. How to normalize verbs across different dictionaries?
2. How to match verbs only with verbs, and not with nouns, adjectives, adverbs, etc.?
Regarding point 1, section 5 of the paper presents some ideas.
See:
अनुस्वारs in verbs are handled a little differently than convention 1 in dictionaries, so they are treated here separately.
Option 5.1
Verbs are presented as in धातुपाठः e.g. .
Dictionaries: KRM, PD, SKD, VCP, WIL
Option 5.2
Verbs are presented with removal of अनुबन्ध and with conversion to fifth letter. e.g. .
Dictionaries: AP, BEN, BOP, BUR, CAE, CCS, GRA, GST, MD, MW, MW72, PD, PW, PWG, SCH, SHS, STC, YAT47
Option 5.3
Verbs are presented with removal of अनुबन्ध but without conversion to fifth letter, i.e. with अनुस्वार e.g. भं .
Dictionaries: AP90
Notes regarding options 5.1 to 5.3: (1) ACC, BHS, IEG, INM, MCI, PE, PGN, PUI, SNP, VEI do not have enough headwords to decide this convention decisively. (2) PD tends to give two separate headwords, one following option 5.1 and the other following option 5.2, e.g. अकि, अ. Therefore, it is included in both categories.
Standard convention
5. Option 5.3
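The dictionary lists in the excerpt above could be kept as a small lookup table, so that a normalization script knows which verb convention to expect for each dictionary. The following is only a sketch: the names `VERB_CONVENTION`, `UNDECIDED` and `conventions_for` are hypothetical, and the dictionary codes are copied as printed in the excerpt.

```python
# Sketch only: names below are hypothetical, not part of any existing script.
# Dictionary codes are copied as printed in the excerpt from section 5.
VERB_CONVENTION = {
    "5.1": {"KRM", "PD", "SKD", "VCP", "WIL"},
    "5.2": {"AP", "BEN", "BOP", "BUR", "CAE", "CCS", "GRA", "GST", "MD",
            "MW", "MW72", "PD", "PW", "PWG", "SCH", "SHS", "STC", "YAT47"},
    "5.3": {"AP90"},
}

# Dictionaries with too few verb headwords to decide the convention (note 1).
UNDECIDED = {"ACC", "BHS", "IEG", "INM", "MCI", "PE", "PGN", "PUI", "SNP", "VEI"}


def conventions_for(dict_code):
    """Return the sorted list of options a dictionary follows (may be empty)."""
    return sorted(opt for opt, codes in VERB_CONVENTION.items()
                  if dict_code in codes)


if __name__ == "__main__":
    for code in ("WIL", "PD", "AP90", "ACC"):
        print(code, conventions_for(code), "undecided" if code in UNDECIDED else "")
```

PD appears under both 5.1 and 5.2 because, per note (2), it tends to give two headwords for the same verb.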
Point 2 cannot be handled unless we create sanhw1.txt, sanhw2.txt, or some altered version of them in which a sense/meaning, rather than a headword, is the unique identifier. Such a sense-wise list would be a great addition in any case. It would also let us tag synsets, antonyms, etc. at a later stage.
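To make the idea concrete, here is a minimal sketch of a list keyed by headword plus part of speech, using the WIL 'gama' case from the top of this issue. Everything in it is an assumption for illustration: the `Entry` class, the `NORMALIZED` table, the `sense_key` function and the use of MW as "another dictionary" are hypothetical and do not describe the actual layout of sanhw1.txt or sanhw2.txt.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    dictionary: str  # dictionary code, e.g. "WIL"
    headword: str    # headword as digitized
    pos: str         # coarse part of speech, e.g. "verb" or "noun"


# Hypothetical hand-made table: (dictionary, headword, pos) -> normalized form.
# Only the WIL verb 'gama' needs remapping in this toy example.
NORMALIZED = {
    ("WIL", "gama", "verb"): "gam",
}


def sense_key(entry):
    """Key an entry by (normalized headword, pos) instead of headword alone."""
    norm = NORMALIZED.get((entry.dictionary, entry.headword, entry.pos),
                          entry.headword)
    return (norm, entry.pos)


def group_entries(entries):
    """Group entries that share the same (normalized headword, pos) key."""
    groups = defaultdict(list)
    for e in entries:
        groups[sense_key(e)].append(e)
    return dict(groups)


if __name__ == "__main__":
    entries = [
        Entry("WIL", "gama", "verb"),
        Entry("WIL", "gama", "noun"),
        Entry("MW", "gam", "verb"),   # assumed entry in another dictionary
        Entry("MW", "gama", "noun"),  # assumed entry in another dictionary
    ]
    for key, members in group_entries(entries).items():
        print(key, [(m.dictionary, m.headword) for m in members])
```

With a key of this kind, the WIL verb 'gama' lands with 'gam' of other dictionaries, while the WIL noun 'gama' stays with 'gama'. A real sense-wise list would need a finer key than part of speech alone.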
Regarding point 1: some missing clues about the method used are mentioned in the prefaces. I could write out those I'm aware of (some I documented in my PhD years ago). @drdhaval2785, if I add those notes, are you willing to help continue and try to add new categories for comparison?
You have not mentioned that non-Indian dictionaries take the guna form as the basis, which is not always equal to the non-धातुपाठः forms.
The passage quoted at the top of this issue was noted elsewhere by @funderburkjim.