Proper Names in SLP1 ({r}Ama for Rāma) #24

Open
gasyoun opened this issue May 18, 2015 · 14 comments

@gasyoun
Member

gasyoun commented May 18, 2015

At drdhaval2785/SanskritSorting#27 Jim said it would be possible to adapt the transcoder files to work with {} for proper names. As I want to have the proper names in my Reverse dictionary, I humbly ask that the proper-name data be extracted, from MW at least. Could you include this in your plans, Jim, please?

@funderburkjim
Contributor

Is what you are looking for a list of MW headwords which are proper names?

@gasyoun
Member Author

gasyoun commented May 18, 2015

Indeed, and not only that. I'm thinking about how to extract all of them from all the dictionaries. I'm even ready to do additional markup, but first I would like to hear your bright ideas.

@funderburkjim
Contributor

For MW, many will be caught by searching for

<ab>N.</ab>

So, I propose generating a list of such headwords.

What kind of output are you looking for?

@funderburkjim
Contributor

The N. search will match some 'L-number' records that are not PROPER names, such as

<H2B><h><hc3>110</hc3><key1>kAlaka</key1><hc1>2</hc1><key2>kAlaka</key2><hom>1</hom></h>
<body> <lex type="inh">n.</lex> <c>N._of_a_pot-herb</c> <ls>Bhpr.</ls> </body><tail>
<MW>033095</MW> <mat/> <pc>277,3</pc> <L>49284</L></tail></H2B>
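
A minimal sketch of how such false positives might be screened out, assuming the older XML layout shown in the record above. The stoplist `NOT_PROPER` and the function name `classify` are hypothetical and only illustrate the idea; a real list of non-name glosses would have to be built from the data.

```python
import re

# One record in the older MW XML layout, copied from the comment above.
RECORD = (
    '<H2B><h><hc3>110</hc3><key1>kAlaka</key1><hc1>2</hc1>'
    '<key2>kAlaka</key2><hom>1</hom></h>'
    '<body> <lex type="inh">n.</lex> <c>N._of_a_pot-herb</c> '
    '<ls>Bhpr.</ls> </body><tail>'
    '<MW>033095</MW> <mat/> <pc>277,3</pc> <L>49284</L></tail></H2B>'
)

KEY1 = re.compile(r'<key1>(.*?)</key1>')
# "N." either tagged as <ab>N.</ab> or untagged at the start of a <c> gloss,
# followed by "of", an optional article, and the first gloss noun.
NAME_GLOSS = re.compile(
    r'(?:<ab>N\.</ab>|<c>N\.)[_ ]of[_ ](?:an?[_ ])?([A-Za-z-]+)'
)

# Hypothetical stoplist: gloss nouns that point to a common name, not a proper name.
NOT_PROPER = {'pot-herb', 'plant', 'tree'}


def classify(record):
    """Return (headword, gloss_noun, is_candidate), or None if no N. gloss is found."""
    key = KEY1.search(record)
    gloss = NAME_GLOSS.search(record)
    if not key or not gloss:
        return None
    noun = gloss.group(1)
    return key.group(1), noun, noun not in NOT_PROPER


print(classify(RECORD))  # ('kAlaka', 'pot-herb', False)
```

The record is still matched by the N. search, but the gloss noun lets it be set aside for review instead of being counted as a proper name.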

@gasyoun
Member Author

gasyoun commented May 18, 2015

#12 continued. Sure, there are many false positives like the pot-herb, but a lot of valuable data as well. Can I add the tags additionally to ? I'm ready to mark human/deva proper names.
Oh, OK. I wonder whether some names are left out of <ab>N.</ab> and whether there is a way other than manual checking to find them. 823 matches seems too small to be true; 47027 (excluding the 823 marked) seems closer to the truth.

<c>N._of_a_man</c>
<c>N._of_an_<as0>A1ditya</as0><as1><s>Aditya</s></as1></c>
<c>N._of_a_woman_;_</c>

How about adding the data to https://github.com/funderburkjim/MWlexnorm/ ? And how about comparing the list with the Mahabharata Index and the Puranic Encyclopedia?

@funderburkjim
Contributor

Re '47027 ... seems closer to the truth': This sounds right. I did not realize that there were so many 'naked' 'N.' abbreviations. (Here 'naked' means not clothed by an 'ab' tag.)

Since you can generate the list, what help are you looking to me to provide?
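
To make the tagged versus 'naked' distinction concrete, here is a rough sketch that counts both kinds per line. It assumes a naked 'N.' is always followed by 'of' with either a space or an underscore as separator, as in the samples quoted in this thread; the sample lines and the name `count_n` are illustrative only.

```python
import re

# Lines quoted in this thread: one tagged N. and three naked ones.
SAMPLE_LINES = [
    '<ab>N.</ab> of <s1 slp1="vizRu">Viṣṇu</s1>',
    '<c>N._of_a_man</c>',
    '<c>N._of_an_<as0>A1ditya</as0><as1><s>Aditya</s></as1></c>',
    '<c>N._of_a_woman_;_</c>',
]

TAGGED = re.compile(r'<ab>N\.</ab>')
# A naked N.: not directly preceded by <ab>, not the tail of a longer word,
# and followed by "of" (space or underscore as separator).
NAKED = re.compile(r'(?<!<ab>)(?<![A-Za-z])N\.(?=[_ ]of[_ ])')


def count_n(lines):
    """Return (lines with a tagged N., lines with only a naked N.)."""
    tagged = naked = 0
    for line in lines:
        if TAGGED.search(line):
            tagged += 1
        elif NAKED.search(line):
            naked += 1
    return tagged, naked


print(count_n(SAMPLE_LINES))  # (1, 3)
```

Run over the full dictionary text, the two counts would correspond roughly to the 823 and 47027 figures mentioned above, though the exact numbers depend on the search pattern used.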

@gasyoun
Member Author

gasyoun commented May 19, 2015

How should I spread the ab tags? How can the usage be widened? Looking through my 30 000 additions manually does not sound like a good idea.

@funderburkjim
Contributor

I need to see a sample of the file(s) you are using.
Not sure what 'spread the ab tags' means.
Not sure what 'widen the usage' means.

@gasyoun
Member Author

gasyoun commented May 19, 2015

Now only 700 words have it. I propose 30 000 should have them. There are no sample files. I need to understand how I can contribute in this markup expansion project, if you agree. I'm looking for names of living creatures.

@gasyoun
Member Author

gasyoun commented Sep 4, 2021

@Andhrabharati do you understand the issue?

@Andhrabharati
Contributor

I do, but I have no intention of working on SLP1 stuff!!

And I guess you should first consult Peter Scharf before playing around with SLP1 like this.

@funderburkjim
Contributor

@gasyoun Here's something that might be relevant for getting some proper names.
Example under L=6, headword 'a':

<ab>N.</ab> of <s1 slp1="vizRu">Viṣṇu</s1>

Similarly, under
Thus, we see that 'aH' is a name of Viṣṇu. Thus 'a' can be a proper name, although of course 'a' has other uses.

The search: 4709 matches in 4707 lines for "<ab>N[.]</ab> of <s1 slp1=". By searching for the headwords in which these matches occur, I think you would get a fairly large list of words that could be proper names.

A slight variant would give some more headwords that can be used as proper nouns:
4233 matches in 4229 lines for "<ab>N[.]</ab> of a <s1 slp1=", such as

<L>54<pc>1,2<k1>aMSu<k2>aMSu<e>1A
¦ <ab>N.</ab> of a <s1 slp1="fzi">Ṛṣi</s1>, <ls>RV. viii, 5, 26</ls><info lex="inh"/>

Another variation: 672 matches for "<ab>N[.]</ab> of an <s1 slp1=", e.g.,

<L>19<pc>1,1<k1>aMSa<k2>a/MSa<e>1A
¦ <ab>N.</ab> of an <s1 slp1="Aditya">Āditya</s1>.

Is this approach in the direction you are going, or maybe you've already exhausted such an approach ?
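
The three searches above can be folded into one pass that also recovers the headword. The sketch below assumes the layout visible in the excerpts: a metaline beginning with `<L>` that carries `<k1>`, followed by one or more body lines. The function name `name_headwords` is made up for illustration.

```python
import re

# Two entries copied from the examples above.
SAMPLE = '''<L>54<pc>1,2<k1>aMSu<k2>aMSu<e>1A
¦ <ab>N.</ab> of a <s1 slp1="fzi">Ṛṣi</s1>, <ls>RV. viii, 5, 26</ls><info lex="inh"/>
<L>19<pc>1,1<k1>aMSa<k2>a/MSa<e>1A
¦ <ab>N.</ab> of an <s1 slp1="Aditya">Āditya</s1>.
'''

# Metaline: capture the L-number and the k1 headword.
META = re.compile(r'^<L>([^<]+)<pc>[^<]*<k1>([^<]+)<k2>')
# Covers "N. of <s1", "N. of a <s1" and "N. of an <s1" in one pattern.
NAME = re.compile(r'<ab>N[.]</ab> of (?:an? )?<s1 slp1="([^"]+)"')


def name_headwords(lines):
    """Yield (L-number, k1 headword, slp1 of the named entity) for each match."""
    current = None
    for line in lines:
        meta = META.match(line)
        if meta:
            current = (meta.group(1), meta.group(2))
            continue
        hit = NAME.search(line)
        if hit and current:
            yield current[0], current[1], hit.group(1)


for row in name_headwords(SAMPLE.splitlines()):
    print(row)
```

A headword with several matching body lines would be yielded more than once, so the output may need de-duplicating before it is used as a headword list.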

@funderburkjim
Contributor

funderburkjim commented Sep 18, 2021

An entirely different approach might be to use the headwords in INM (Index to names in the Mahabharata).

Presumably nearly every headword there is a proper name.

Similarly, there are many proper names among the headwords of ACC.

And back to the previous comment, another search that would lead to many proper-name headwords is (in MW):
16202 matches in 16190 lines for "<ab>N[.]</ab> of a [a-zA-Z]", such as

<L>59<pc>1,2<k1>aMSuDAna<k2>aMSu—DAna<e>3
<s>aMSu—DAna</s> ¦ <lex>n.</lex> <ab>N.</ab> of a village    <<< PLACE NAME

<L>268<pc>2,2<k1>akAsAra<k2>a-kAsAra<e>1
<s>a-kAsAra</s> ¦ <lex>m.</lex> <ab>N.</ab> of a teacher    <<< PERSON NAME

And still another variant with many matches:
11764 matches for "<ab>N[.]</ab> of <ab>wk". Here the headword is the name of a work.
Example:

<L>922<pc>5,1<k1>agnigranTa<k2>agni/—granTa<e>3
<s>agni/—granTa</s> ¦ <lex>m.</lex> <ab>N.</ab> of <ab>wk.</ab>

Seems like there are lots of searches that will get a high density of matches to headwords that may appear as proper names of one kind or another.
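
As a rough illustration of sorting such matches by kind of name, the sketch below looks at what follows 'N. of'. The `CATEGORY` mapping is hypothetical and only covers the nouns seen in the examples above; the body lines are copied from those examples without the trailing '<<<' annotations.

```python
import re
from collections import Counter

# Body lines from the three examples above.
BODIES = [
    '<s>aMSu—DAna</s> ¦ <lex>n.</lex> <ab>N.</ab> of a village',
    '<s>a-kAsAra</s> ¦ <lex>m.</lex> <ab>N.</ab> of a teacher',
    '<s>agni/—granTa</s> ¦ <lex>m.</lex> <ab>N.</ab> of <ab>wk.</ab>',
]

# Group 1: the "wk." abbreviation; group 2: the noun after "a" or "an".
KIND = re.compile(r'<ab>N[.]</ab> of (?:(<ab>wk[.]</ab>)|an? ([a-zA-Z]+))')

# Hypothetical mapping from gloss noun to a coarse category.
CATEGORY = {'village': 'place', 'teacher': 'person'}


def kind_of(body):
    """Return 'work', a mapped category, 'other', or None if there is no match."""
    m = KIND.search(body)
    if not m:
        return None
    if m.group(1):
        return 'work'
    return CATEGORY.get(m.group(2), 'other')


print(Counter(kind_of(b) for b in BODIES))
```

Nouns missing from the mapping fall into 'other', which gives a natural worklist for extending the categories by hand.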

@Andhrabharati
Contributor

As the issue is 'continued' in another repo, this issue could be closed now.

4 participants
@gasyoun @funderburkjim @Andhrabharati and others