Verb normalization #8

Open
drdhaval2785 opened this issue Feb 2, 2017 · 3 comments

Comments

@drdhaval2785
Contributor

@funderburkjim elsewhere noted the following

There's also the question of how to properly associate roots in the different dictionaries. For instance,
our digitization of WIL has 'gama' for the root, but there is also a m. noun 'gama' in WIL. We should
associate the WIL verb entry 'gama' with the usual 'gam' of other dictionaries, but associate the
m. noun 'gama' of WIL with the usual 'gama' of other dictionaries. How to do this?

As this belongs to the normalization section, it is noted here.

The issue can be split into two parts:

  1. How to normalize verbs across different dictionaries?
  2. How to normalize only verb-verb and not verb-noun/adj/adv etc?

Regarding point 1, section 5 of the paper presents some ideas.
See:

138 अनार ु s in verbs are handled a little differently than convention 1 in dictionaries, so they
139 are treated here separately.
140 Option 5.1
141 Verbs are presented as in धातपाठः ु e.g. .
142 Dictionaries: KRM, PD, SKD, VCP, WIL
143 Option 5.2
144 Verbs are presented with removal of अनबु and with conversion to fifth letter. e.g. .्
145 Dictionaries: AP, BEN, BOP, BUR, CAE, CCS, GRA, GST, MD, MW, MW72, PD, PW,
146 PWG, SCH, SHS, STC, YAT47 
147 Option 5.3
148 Verbs are presented with removal of अनबु but without conversion to fifth letter i.e. with
149 अनार ु e.g. भं .्
150 Dictionaries: AP90
151 Notes regarding options 5.1 to 5.3– (1) ACC, BHS, IEG, INM, MCI, PE, PGN, PUI,
152 SNP, VEI do not have enough headwords to decide this convention decisively. (2) PD tends
153 to give two separate headwords, one following options 5.1 and the other following option 5.2
154 e.g. अिक, अ. Therefore, it is included in both categories. ्
Standard convention 5. Option 5.3
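
To make the difference between options 5.2 and 5.3 concrete, here is a minimal sketch of the "conversion to fifth letter" step: an anusvāra standing before a stop is replaced by the nasal of that stop's class. It assumes headwords are in SLP1 transliteration (anusvāra written `M`); the function name `anusvara_to_class_nasal` and the sample roots are illustrative only and are not taken from the paper or from any existing script. Removal of अनुबन्ध (option 5.1 to 5.2) is not attempted here.

```python
# Hypothetical sketch: turn an Option 5.3 style headword (anusvara kept)
# into an Option 5.2 style headword (anusvara -> class nasal).
# Assumption: input is SLP1, where "M" is the anusvara.

# Map every stop to the nasal (fifth letter) of its class.
CLASS_NASAL = {}
for stops, nasal in (
    ("kKgG", "N"),  # velars   -> N (ng)
    ("cCjJ", "Y"),  # palatals -> Y (ny)
    ("wWqQ", "R"),  # cerebrals -> R (retroflex n)
    ("tTdD", "n"),  # dentals  -> n
    ("pPbB", "m"),  # labials  -> m
):
    for stop in stops:
        CLASS_NASAL[stop] = nasal


def anusvara_to_class_nasal(headword):
    """Replace "M" by the class nasal when a stop follows; otherwise keep it."""
    out = []
    for i, ch in enumerate(headword):
        nxt = headword[i + 1] if i + 1 < len(headword) else ""
        if ch == "M" and nxt in CLASS_NASAL:
            out.append(CLASS_NASAL[nxt])
        else:
            # Anusvara before sibilants, semivowels, h, or at word end is untouched.
            out.append(ch)
    return "".join(out)


print(anusvara_to_class_nasal("aMk"))    # velar follows
print(anusvara_to_class_nasal("kaMp"))   # labial follows
print(anusvara_to_class_nasal("BraMS"))  # sibilant follows, so nothing changes
```

Applying such a step to AP90 headwords would be one way to bring them in line with the option 5.2 dictionaries before comparing roots; going in the other direction is not safely reversible, since a class nasal in a headword need not come from an anusvāra.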

Point 2 cannot be handled unless we create a version of sanhw1.txt or sanhw2.txt (or some altered form of them) in which a sense/meaning, not a headword, is the unique identifier. Such a sense-wise list would be a great addition in any case: at a later stage we would also be able to tag synsets, antonyms, etc.
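
A rough sketch of what keying on more than the headword could look like, using the WIL 'gama' case quoted at the top. Everything here is an assumption for illustration: the `Entry` record, the part-of-speech tags, and the hand-made `VERB_NORMALIZATION` table are hypothetical and do not reflect the actual format of sanhw1.txt or sanhw2.txt.

```python
from collections import namedtuple

# Hypothetical record: one dictionary entry with a part-of-speech tag.
Entry = namedtuple("Entry", ["dictionary", "headword", "pos"])

# Hypothetical hand-made table: (dictionary, verb headword) -> normalized root.
# Only the example from this thread is filled in.
VERB_NORMALIZATION = {
    ("WIL", "gama"): "gam",
}


def normalized_key(entry):
    """Return a (headword, pos) key; only verb entries are normalized."""
    if entry.pos == "verb":
        root = VERB_NORMALIZATION.get(
            (entry.dictionary, entry.headword), entry.headword
        )
        return (root, "verb")
    # Nouns, adjectives, adverbs etc. keep their headword unchanged.
    return (entry.headword, entry.pos)


entries = [
    Entry("WIL", "gama", "verb"),
    Entry("WIL", "gama", "m"),
    Entry("MW", "gam", "verb"),
]
for e in entries:
    print(e, "->", normalized_key(e))
```

With a key of this kind the WIL verb 'gama' lands together with 'gam' of the other dictionaries, while the WIL m. noun 'gama' stays with 'gama'. A true sense-level identifier would need a finer key than part of speech, but the lookup would have the same shape.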

@gasyoun
Member

gasyoun commented Feb 2, 2017

Point 2 is not possible

No need to add more.

As for point 1: some missing clues about the method used are mentioned in the prefaces. I could write out those I'm aware of (some I documented in my PhD years ago). @drdhaval2785, if I add those notes, are you willing to help continue and try to add new categories for comparison?
You have not mentioned that non-Indian dictionaries take guna as their basis, which is not always equal to non-धातुपाठः forms.

@drdhaval2785
Contributor Author

@funderburkjim
A long time has passed and you have done a lot of verb markup recently.
Any update on this thread?

@gasyoun
Member

gasyoun commented Mar 29, 2021

A long time has passed and you have done a lot of verb markups recently.

Same question, @funderburkjim
