ऋकारान्त words #7

Open
drdhaval2785 opened this issue Feb 2, 2017 · 5 comments
@drdhaval2785
Contributor

See lines 157-171 of the paper normalization.pdf in this repository.
This issue follows up on @funderburkjim's request elsewhere to review the logic of hwnorm1c.
ऋकारान्त words (stems ending in ऋ) seemed difficult to handle.

LOGIC

  1. Find the words ending in 'ar' in the dictionaries of option 6.1. Let us say each one is stored in the variable word.
  2. Check whether re.sub('ar$', 'f', word) is in the dictionaries of option 6.2.
  3. If yes, it is safe to normalize. If not, add it to a suspect list for manual examination.
  4. Find the words ending in 'f' in the dictionaries of option 6.2. Let us name each of them word1.
  5. Check whether re.sub('f$', 'A', word1) is in the dictionaries of option 6.3.
  6. If yes, normalize the corresponding word of the option 6.3 dictionary from A to f. Otherwise flag it for manual examination.
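
The six steps above can be sketched roughly as follows. This is only an illustration, not the code in hwnorm1c.py: the function names and the idea of passing each option's headwords in as a plain Python set are assumptions, and the headwords are taken to be in SLP1 (where f stands for ऋ and A for आ). Flagging word1 in the second function is one reading of step 6.

```python
import re


def check_ar_words(words_61, words_62):
    """Steps 1-3: map 'ar'-final headwords (option 6.1) to 'f'-final forms.

    Returns (safe, suspect): safe maps each 6.1 headword to its normalized
    form, suspect lists the headwords that need manual examination.
    """
    safe, suspect = {}, []
    for word in sorted(words_61):
        if not word.endswith('ar'):
            continue
        candidate = re.sub('ar$', 'f', word)
        if candidate in words_62:
            safe[word] = candidate
        else:
            suspect.append(word)
    return safe, suspect


def check_A_words(words_62, words_63):
    """Steps 4-6: map 'A'-final headwords (option 6.3) back to 'f'-final forms.

    Returns (safe, suspect): safe maps each matching 6.3 headword to the
    6.2 headword it normalizes to, suspect lists the 6.2 headwords with
    no 'A'-final counterpart in the option 6.3 dictionaries.
    """
    safe, suspect = {}, []
    for word1 in sorted(words_62):
        if not word1.endswith('f'):
            continue
        candidate = re.sub('f$', 'A', word1)
        if candidate in words_63:
            safe[candidate] = word1
        else:
            suspect.append(word1)
    return safe, suspect


# Toy headword sets, made up for illustration only.
words_61 = {'kartar', 'antar'}
words_62 = {'kartf', 'pitf'}
words_63 = {'kartA', 'mAlA'}

print(check_ar_words(words_61, words_62))
print(check_A_words(words_62, words_63))
```

In the toy data, antar ends in 'ar' but has no 'antf' counterpart in the option 6.2 set, so it lands in the suspect list instead of being rewritten. That is the case the manual examination in step 3 is meant to catch.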

The implementation should be easy for Jim. I am out for four days. Operating from my mobile.

The relevant parts of the paper are presented below.

157 Convention 6 – Treatment of ऋकारान्त words
158 Option 6.1
159 Uses अर् instead of ऋ at the end e.g. कर्तर्.
160 Dictionaries: BHS, CCS, PW, PWG, SCH
161 Option 6.2
162 Uses ऋ at the end e.g. कर्तृ.
163 Dictionaries: ACC, AP, AP90, BEN, BOP, BUR, CAE, GRA, GST, IEG, INM, MD, MW,
164 MW72, PD, SHS, STC, VCP, VEI, WIL, YAT
165 Option 6.3
166 Uses inflected form with आ at end e.g. कर्ता.
167 Dictionaries: PUI, SKD
168 Note– (1) KRM, MCI, PE, PGN, SNP do not have enough data to decide the convention
169 conclusively.
170 Standard convention
171 6. Use ऋ at the end.
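
As a rough illustration of how the convention table in the extract could be encoded, the sketch below maps each dictionary code to its option and proposes a candidate standard form. The names OPTION_BY_DICT, ENDING_BY_OPTION and candidate_standard_form are hypothetical, and SLP1 headwords are assumed (ar, f, A for अर्, ऋ, आ). The result is only a candidate: it would still need the lookup check described under LOGIC.

```python
# Dictionary codes per option, copied from lines 160, 163-164 and 167 of the paper.
CODES_BY_OPTION = {
    '6.1': 'BHS CCS PW PWG SCH',
    '6.2': ('ACC AP AP90 BEN BOP BUR CAE GRA GST IEG INM MD MW '
            'MW72 PD SHS STC VCP VEI WIL YAT'),
    '6.3': 'PUI SKD',
}

# Invert the table: dictionary code -> option.
OPTION_BY_DICT = {
    code: option
    for option, codes in CODES_BY_OPTION.items()
    for code in codes.split()
}

# SLP1 ending that each option uses for these words.
ENDING_BY_OPTION = {'6.1': 'ar', '6.2': 'f', '6.3': 'A'}


def candidate_standard_form(headword, dict_code):
    """Return the 'f'-final candidate for a headword, or the headword unchanged.

    Dictionaries with no decided convention (KRM, MCI, PE, PGN, SNP) and
    headwords without the relevant ending are returned as-is.
    """
    option = OPTION_BY_DICT.get(dict_code)
    if option is None or option == '6.2':
        return headword
    ending = ENDING_BY_OPTION[option]
    if not headword.endswith(ending):
        return headword
    return headword[:-len(ending)] + 'f'


print(candidate_standard_form('kartar', 'PW'))   # kartf
print(candidate_standard_form('kartA', 'SKD'))   # kartf
print(candidate_standard_form('mAlA', 'SKD'))    # mAlf (spurious, needs the lookup check)
```

The last call shows why the ending alone is not enough: an ordinary आ-final word such as mAlA would be turned into a non-existent form, so every candidate has to be confirmed against the option 6.2 headwords before normalizing.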
@gasyoun
Member

gasyoun commented Feb 2, 2017

Logic makes sense, as usual.

@drdhaval2785
Contributor Author

@funderburkjim
Is this handled in the current implementation of the normalization program?

@drdhaval2785
Contributor Author

We can close it then.

@funderburkjim
Contributor

Leave open for now. The current normalization logic (used heavily in simple-search) is in the hwnorm1 repository:

https://github.com/sanskrit-lexicon/hwnorm1/tree/master/sanhw1

I don't think the 'ar-ending' suggestions of this issue are taken into account by hwnorm1c.py.

But agree that they could/should be taken into account.
@drdhaval2785 If you're back in the programming saddle, why don't you do this modification of hwnorm1c.py ?

@gasyoun
Member

gasyoun commented Dec 13, 2020

@drdhaval2785 If you're back in the programming saddle, why don't you do this modification of hwnorm1c.py ?

I stand on my knees, Dhaval, do not let us down. We can't make it without you any longer.
