Verb normalization #8

drdhaval2785 · 2017-02-02T05:14:13Z

@funderburkjim noted the following elsewhere:

There's also the question of how to properly associate roots in the different dictionaries. For instance,
our digitization of WIL has 'gama' for the root, but there is also a m. noun 'gama' in WIL. We should
associate the WIL verb entry 'gama' with the usual 'gam' of other dictionaries, but associate the
m. noun 'gama' of WIL with the usual 'gama' of other dictionaries. How to do this?

As this belongs to the normalization section, I am noting it here.

The issue can be split into two parts:

1. How to normalize verbs across different dictionaries?
2. How to normalize only verb-verb, and not verb-noun/adj/adv etc.?

Regarding point 1, section 5 of the paper presents some ideas.
See:

अनुस्वारs in verbs are handled a little differently than convention 1 in dictionaries, so they are treated here separately.
Option 5.1
Verbs are presented as in धातुपाठः, e.g. […].
Dictionaries: KRM, PD, SKD, VCP, WIL
Option 5.2
Verbs are presented with removal of अनुबन्ध and with conversion to fifth letter, e.g. […].
Dictionaries: AP, BEN, BOP, BUR, CAE, CCS, GRA, GST, MD, MW, MW72, PD, PW, PWG, SCH, SHS, STC, YAT
Option 5.3
Verbs are presented with removal of अनुबन्ध but without conversion to fifth letter, i.e. with अनुस्वार, e.g. भं[…].
Dictionaries: AP90
Notes regarding options 5.1 to 5.3: (1) ACC, BHS, IEG, INM, MCI, PE, PGN, PUI, SNP, VEI do not have enough headwords to decide this convention decisively. (2) PD tends to give two separate headwords, one following option 5.1 and the other following option 5.2, e.g. अकि, अ[…]. Therefore, it is included in both categories.
Standard convention 5. Option 5.3
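
The difference between options 5.2 and 5.3 can be illustrated with a small sketch. It assumes headwords are stored in SLP1 transliteration, where `M` is the anusvāra and `N`, `Y`, `R`, `n`, `m` are the fifth letters (class nasals) of the five stop classes. The function name `anusvara_to_class_nasal` is hypothetical and not taken from any existing script; the sketch only covers an anusvāra directly followed by a stop, and does not attempt the अनुबन्ध removal that separates option 5.1 from the other two.

```python
# Hypothetical helper: rewrite an Option 5.3 spelling (anusvara kept)
# as an Option 5.2 spelling (anusvara converted to the fifth letter).
# Assumption: headwords are in SLP1, where 'M' is the anusvara.

# Map every stop consonant to the nasal (fifth letter) of its own class.
CLASS_NASAL = {}
for stops, nasal in (
    ("kKgG", "N"),  # velars
    ("cCjJ", "Y"),  # palatals
    ("wWqQ", "R"),  # retroflexes
    ("tTdD", "n"),  # dentals
    ("pPbB", "m"),  # labials
):
    for stop in stops:
        CLASS_NASAL[stop] = nasal


def anusvara_to_class_nasal(root: str) -> str:
    """Replace 'M' by the class nasal of the stop that follows it."""
    out = []
    for i, ch in enumerate(root):
        nxt = root[i + 1] if i + 1 < len(root) else ""
        if ch == "M" and nxt in CLASS_NASAL:
            out.append(CLASS_NASAL[nxt])
        else:
            # Anything else, including 'M' before a non-stop, is kept.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for form in ("aMk", "BaMj", "gam"):
        print(form, "->", anusvara_to_class_nasal(form))
```

Running the same function over the headwords of every option 5.3 dictionary would give spellings directly comparable with the option 5.2 dictionaries; the reverse direction (5.2 to 5.3) would need the inverse table.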

Point 2 cannot be handled unless we create sanhw1.txt, sanhw2.txt, or some altered version of them in which a sense / meaning, and not the headword, is the unique identifier. Such a sense-wise list would be a great addition in any case. It would also let us tag synsets, antonyms, etc. at a later stage.
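
As a rough illustration of why the headword alone is not enough, here is a minimal sketch that keys each entry on dictionary, printed headword, and category rather than on the headword only. The table `NORMALIZED`, the function `normalize`, and the category labels are all hypothetical; this is not the format of sanhw1.txt or sanhw2.txt. The two entries come from the WIL 'gama' example quoted above.

```python
from typing import Dict, Tuple

# Hypothetical lookup table:
# (dictionary code, headword as printed, category) -> normalized headword.
# The two rows are the WIL 'gama' case quoted in this thread.
NORMALIZED: Dict[Tuple[str, str, str], str] = {
    ("WIL", "gama", "verb"): "gam",   # verb entry joins the usual root 'gam'
    ("WIL", "gama", "noun"): "gama",  # m. noun keeps the usual 'gama'
}


def normalize(dictionary: str, headword: str, category: str) -> str:
    """Return the normalized headword, or the input if no rule is recorded."""
    return NORMALIZED.get((dictionary, headword, category), headword)


if __name__ == "__main__":
    print(normalize("WIL", "gama", "verb"))
    print(normalize("WIL", "gama", "noun"))
```

A sense-level identifier would go one step further than the category used here, but the same idea applies: the lookup key has to carry more than the spelling.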


gasyoun · 2017-02-02T11:08:33Z

Point 2 is not possible

No need to add more.

As for point 1: some missing clues about the method used are mentioned in the prefaces. I could write out those I'm aware of (some I documented in my PhD years ago). @drdhaval2785, if I add those notes, would you be willing to help continue and try to add new categories for comparison?
You have not mentioned that non-Indian dictionaries take guna as the basis, which is not always equal to the non-धातुपाठः forms.

drdhaval2785 · 2020-12-12T16:32:38Z

@funderburkjim
A long time has passed and you have done a lot of verb markups recently.
Any update on this thread?

gasyoun · 2021-03-29T20:21:58Z

A long time has passed and you have done a lot of verb markups recently.

Same question, @funderburkjim
