All dhatu entries #9

Open
Shalu411 opened this issue Jan 16, 2020 · 30 comments

Comments

@Shalu411
Collaborator

Hariom, greetings to all. I am back after a very long time! Let me be happy only after I continue to be here.. not right now. :)
I have been working with dhAtus lately. Here is a request:

  1. I need the whole list of all existing dhAtus, not just the headwords but also the explanatory text of each entry, either as a separate text file per dhAtu or all in one file, whichever is easier.
  2. The text must be in Devanagari.
  3. If a separate file is kept for each dhAtu, then for convenience the file names should be in IAST, i.e. Roman diacritics.

Please do let me know if this is feasible and workable.
-Regards
Shalu
@Shalu411
Collaborator Author

Shalu411 commented Jan 16, 2020

The second request is-

  1. If possible, with the upasargas marked exclusively- as a separate list, but within the dhAtu explanation content. In VCP the entry text content has plus + symbols for each.
    For example- dA-dhAtu text is added. + appears 28 times. So 28 upasarga-combinations are taken by dA-dhAtu. The first upasarga is अति + and last is- प्रति +

  2. I will need the upasarga and plus sign at least in bold, or with some kind of separate marker, so that I can identify them amid the long explanation. ONLY if possible.
    For example, I tried to do the first six.
    (here is the dhatu with its tense and other forms)
    Entry starts thus-- (I have put points to indicate the text running later)
    दा दाने जुहो० उभ० सक० सेट् । ददाति दत्ते प्रणिददाति ।
    दद्यात् ददीत ।..................................................................................
    (Here start the upasargas)
    अति + अतिक्रम्य दाने अत्यन्तदाने च । “न जीवन्तमति-
    ददाति” कात्या० ४ । १ । २७ । “अतिद बलिर्बद्धः” चाण०
    अनु + पञ्चाद्दाने तुल्यरूपदाने प्रतिनिधित्वेन च । “न
    दूढ्ये अनुददासि वामम्” ऋ० १ । १९० । ५ । “सूराश्चदस्मा अनु
    दादपस्याम्” ७ । ४५ । २ “यः शर्धते नानददाति शृध्याम् २ ।
    १२ । २०
    अभि + आभिमुख्येन दाने “तथैव चैनमुक्त्वा वामपार्ष्णिमभ्य-
    दात्” भा० व० १९७ अ० गद्यम् ।
    अव + अधोदाने अवत्तम् आदिकर्मणि तु वा तादेशः
    यथाह सि० कौ० “अवदत्तं विदत्तञ्च प्रदत्तञ्चादिकर्मणि
    [Page3511-b+ 38]

सुदत्तमनुदत्तञ्च निदत्तमिति वेष्यते” चशब्दाद्यथाप्राप्तम् ।
आ + ग्रहणे “दद्याच्चैवाददीत वा” । “शुभां विद्या
माददीतावरादपि” मनुः । “स्वं चादास्यामि भूयोऽह
पाप्मानं जरया सह” भा० आ० ८४ अ० । “शरीरमात्तं
मृत्युना” छा० उ० “आददानः परक्षेत्रात्” मनुः ।
अप + आ + अपेक्ष्य ग्रहणे । “मृत्पिण्डमपादाय महा-
वीरं करोति” शत० ब्रा० १४ । १ । २ । १७
उद् + आ + उदस्य ग्रहणे “उदादाय पृथिवीं जीवदानुम्”
यजु० १ । २८ ।

So sometimes there will be two upasargas; in that case the bold should run through the second one.
Please let me know how workable this is, and what contribution is needed from my side. :) Gladly awaiting a reply.
-Shalu
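
A minimal sketch of the marking requested above, assuming that an upasarga line begins with one or more "word + " groups, as in the दा sample ("अति + ...", "अप + आ + ..."). The function names and the `**` marker are illustrative choices, not something taken from the VCP digitization; any line that happens to start with "word + " would also be marked, so the output would still need review.

```python
import re

# Assumption: an upasarga line starts (after optional indentation) with a chain
# of "<word> + " groups, e.g. "अति + ..." or "अप + आ + ...".
UPASARGA_PREFIX = re.compile(r"^(\s*)((?:\S+ \+ )*\S+ \+)(?= )")


def mark_upasargas(line: str) -> str:
    """Wrap the leading upasarga chain (through the last '+') in ** markers."""
    return UPASARGA_PREFIX.sub(r"\1**\2**", line, count=1)


def count_upasarga_lines(lines) -> int:
    """Count the lines that begin with an upasarga chain."""
    return sum(1 for line in lines if UPASARGA_PREFIX.match(line))


sample = [
    "दद्यात् ददीत ।",
    "    अति + अतिक्रम्य दाने अत्यन्तदाने च ।",
    "अप + आ + अपेक्ष्य ग्रहणे ।",
]
marked = [mark_upasargas(line) for line in sample]
```

When two upasargas are chained, the marker runs through the second plus sign, which is the behaviour asked for in the request.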

@funderburkjim
Contributor

@Shalu411 Hi - long time no see!

The first step is to somehow find the entries of VCP that are verbs.

One approach is to use the first line (from the digitization vcp.txt), and look for one or more
patterns that indicate verbs.

In many cases, the first word in the entry for a verb is a word giving the sense (artha); and this
often (always?) seems to be a word in locative singular, such as (in slp1 spelling of vcp.txt):

  • aMSa¦ viBAjane ada0 cu0 uBa0 . aMSayati te AMSiSat ta .
  • akza¦ vyAptO saMhatO ca BvA0 pa0 vew . akzati akzRoti
  • arca¦ pUjAyAM curA0 uBa0 saka0 sew . arcayati te Arci-
  • arca¦ pUjAyAm uBa0 BvAdi0 saka0 sew . arcati te ArcIt

Doing a search of vcp.txt on these 4 patterns yields 2361 matches: https://gist.github.com/funderburkjim/be63c2a034770cf42f01b8e2955d9f9f

Take a look at this list.
My quick glance makes me think most are verbs. But you should carefully review the list and
look for 'false positives' (i.e., cases where a line is NOT a verb); these false positives (if any)
need to be removed.

Once the false positives are removed, we'll have a good starting point for a list of verbs in vcp.

The next question would be: are there any MISSING verbs (i.e., entries in vcp for verbs
that are not in the list). If you find any missing verbs, maybe there is another pattern (in addition
to those above) that we can use.
Eventually, we'll be able to have a list of entries that we are confident represent most, if not all,
of the verbs in vcp.

Once we've got this list of verbs, we can think more about how to present, for the verbs in this list, the information you are looking for.
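
The first-line filter described in this comment could be sketched roughly as follows. The thread does not spell out the four patterns, so the endings used here (`-e`, `-O`, `-AyAM`, `-AyAm`) are an assumption inferred from the four sample lines, and `verb_candidates` is a hypothetical helper name rather than the script that produced the gist.

```python
import re

# Assumed first-line shape, taken from the samples: "<headword>¦ <sense word> ..."
# Assumed locative-singular endings of the sense word (SLP1): -e, -O, -AyAM, -AyAm.
VERB_HINT = re.compile(r"^(?P<k1>[^¦]+)¦\s+(?P<sense>\S*(?:e|O|AyAM|AyAm))\s")


def verb_candidates(lines):
    """Yield (headword, sense word) for first lines that look like verb entries."""
    for line in lines:
        m = VERB_HINT.match(line)
        if m:
            yield m.group("k1"), m.group("sense")


first_lines = [
    "aMSa¦ viBAjane ada0 cu0 uBa0 . aMSayati te AMSiSat ta .",
    "akza¦ vyAptO saMhatO ca BvA0 pa0 vew . akzati akzRoti",
    "arca¦ pUjAyAM curA0 uBa0 saka0 sew . arcayati te Arci-",
    "arca¦ pUjAyAm uBa0 BvAdi0 saka0 sew . arcati te ArcIt",
    "xyz¦ pu0 abc .",  # made-up non-verb line, for illustration only
]
found = list(verb_candidates(first_lines))
```

A filter this loose is expected to let through false positives (any entry whose first word happens to end the same way), which is why the manual review requested above is still needed.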

@gasyoun
Member

gasyoun commented Jan 17, 2020

2361 matches

The number looks promising; anything around 2200 would be about right.

carefully review the list and look for 'false positives'

This is what Usha is good at.

@Shalu411
Collaborator Author

Namaste. Wow! So glad to hear such a quick reply, Jim!
Sure, will check and be back.. Had taken an initial look however!

Once the false positives are removed, we'll have a good starting point for a list of verbs in vcp.
Sure! Waiting for that day.
Once VCP works out, we could plan Apte! :)

@gasyoun
Member

gasyoun commented Jan 18, 2020

Waiting for that day.

No need to wait. Now it's all up to you. We can have the lists of dhatus of Indian Sanskrit dictionaries thanks to @Shalu411 and @funderburkjim combined. I'm still dreaming of PWG and PWK.

@Shalu411
Collaborator Author

Hariom, I have looked at the text file. It is in SLP format which is difficult for Devanagari people. Is there a converter which could directly convert the text into Dev?
One more thing is - we have one more hand for help.. He will join us in this. :)

@drdhaval2785
Contributor

Some false positives can be removed by checking for the absence of 0 in the line.
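
A small sketch of that check, assuming the candidate lines are in SLP1 as in vcp.txt, where abbreviations such as `ada0` or `cu0` carry a `0`. The helper name `split_by_zero` is hypothetical, and the last sample line is made up for illustration.

```python
def split_by_zero(lines):
    """Split candidate lines into those containing '0' and those without it."""
    with_zero, no_zero = [], []
    for line in lines:
        (with_zero if "0" in line else no_zero).append(line)
    return with_zero, no_zero


candidates = [
    "aMSa¦ viBAjane ada0 cu0 uBa0 . aMSayati te AMSiSat ta .",
    "akza¦ vyAptO saMhatO ca BvA0 pa0 vew . akzati akzRoti",
    "xyz¦ abce def .",  # made-up line without any abbreviation mark
]
with_zero, no_zero = split_by_zero(candidates)
```

The lines in `no_zero` are only suspects: they are worth a manual look before being dropped, since a genuine verb entry could lack the mark.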

@gasyoun
Member

gasyoun commented Jan 20, 2020

Is there a converter which could directly convert the text into Dev?

Yeah, @drdhaval2785 is from the Nagari tribe ) Please help didi get out of SLP hell.
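
For readers who want to see what such a conversion involves, here is a minimal SLP1 to Devanagari sketch. It is not the converter used in this project. The mapping of digits to Devanagari digits and of `.` to the danda is an assumption based on the VCP samples in this thread (`ada0 cu0 uBa0 .`), and accent marks and other special characters are passed through unchanged.

```python
SLP1_VOWELS = "aAiIuUfFxXeEoO"
VOWELS = dict(zip(SLP1_VOWELS, "अ आ इ ई उ ऊ ऋ ॠ ऌ ॡ ए ऐ ओ औ".split()))
# Dependent vowel signs, written as escapes because they are combining marks.
MATRAS = dict(zip(SLP1_VOWELS, [
    "", "\u093e", "\u093f", "\u0940", "\u0941", "\u0942", "\u0943",
    "\u0944", "\u0962", "\u0963", "\u0947", "\u0948", "\u094b", "\u094c",
]))
CONSONANTS = dict(zip(
    "kKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzshL",
    "क ख ग घ ङ च छ ज झ ञ ट ठ ड ढ ण त थ द ध न प फ ब भ म य र ल व श ष स ह ळ".split(),
))
# anusvara, visarga, candrabindu, avagraha; '.' -> danda is an assumption.
OTHER = {"M": "\u0902", "H": "\u0903", "~": "\u0901", "'": "\u093d", ".": "\u0964"}
OTHER.update(zip("0123456789", "०१२३४५६७८९"))
VIRAMA = "\u094d"


def slp1_to_deva(text: str) -> str:
    """Convert SLP1 text to Devanagari, one character at a time."""
    out = []
    after_consonant = False  # True while the last consonant has no vowel yet
    for ch in text:
        if ch in CONSONANTS:
            if after_consonant:
                out.append(VIRAMA)  # consonant cluster: close the previous one
            out.append(CONSONANTS[ch])
            after_consonant = True
        elif ch in VOWELS:
            out.append(MATRAS[ch] if after_consonant else VOWELS[ch])
            after_consonant = False
        else:
            if after_consonant:
                out.append(VIRAMA)  # bare consonant before a non-letter
            out.append(OTHER.get(ch, ch))
            after_consonant = False
    if after_consonant:
        out.append(VIRAMA)
    return "".join(out)


converted = slp1_to_deva("aMSa¦ viBAjane ada0 cu0 uBa0 .")
```

Because SLP1 uses exactly one character per sound, a single pass with one flag is enough; schemes with multi-character sequences such as IAST or Harvard-Kyoto would need a longest-match step first.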

@funderburkjim
Contributor

4 files now in gist

vcp verb filter #1 gist

@gasyoun
Member

gasyoun commented Jan 21, 2020

@funderburkjim
Contributor

Also, Usha should examine the 'nozero -deva' one. As Dhaval noticed, these 'nozero' cases are probably NOT verbs, but Usha should check in case any verbs are hidden in this nozero list.

@Shalu411
Collaborator Author

Hariom
Jim, Thanks a lot for this help. You do a wonderful job always.
I am very very happy to present the cleaned list of verbs. It was checked twice.. hope no issues remain.. still if one more eye is needed, please let me ask Dhaval.
It is in Devanagari- that Dhaval had shared.
I have put all the non-dhAtu entries in a separate text file as per Marcis' advice.
But one thing: I haven't looked for any dhAtus that may have been left out of the VCP list, as I did not know what method to employ. If that check is needed and I am guided, I can do it!
Looking forward to the next steps.
Thanks :)

VCP-Dhatus.docx
VCP-Non-Dhatu.txt

@gasyoun
Member

gasyoun commented Feb 15, 2020

@Shalu411 79 non-dhatus, nice catch. @drdhaval2785 should we revert it to SLP1 or @funderburkjim can work with it as well?

@Shalu411
Collaborator Author

Hariom.
Knock knock.. Is Jim in? Jim, are you ok?? Could you respond? I am all full of solving the second entries file.. Please come back on this..

@funderburkjim
Contributor

funderburkjim commented Feb 18, 2020

@Shalu411 I'm fine. Just been involved with other things. Will take a look at what you've done soon!

I'll probably revert back to slp1. No big deal to do this.

After a first look at your two files, I note:

  • Your VCP-Dhatus.docx file has 2280 lines (after removing some blank lines)
    • Note: txt files are easier to work with. I converted your .docx file to .txt using
      Google docs. Request that in future you stick to .txt files.
  • Your VCP-Non-Dhatu.txt file has 79 lines
  • So that makes 2280 + 79 = 2359 total lines

This agrees with the 2361 lines (minus 2 comment lines) of the original list of verbs.

So I think that

  • the VCP-Dhatus file contains the 2280 verbs
  • the VCP-Non-Dhatu file contains the 79 from original list that you confirm to be non-verbs

Am I understanding the two files properly?
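
The count check above (2280 + 79 = 2359 = 2361 minus 2 comment lines) could be turned into a slightly stronger test that also looks for overlaps and lost lines. This is only a sketch: it assumes comment lines start with `;`, and that all three lists are in the same script, so a Devanagari list would have to be converted back to SLP1 before comparison. `check_partition` is a hypothetical helper name and the sample data is made up.

```python
def data_lines(lines):
    """Drop blank lines and comment lines (assumed to start with ';')."""
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line and not line.startswith(";")]


def check_partition(original, verbs, non_verbs):
    """Check that verbs + non_verbs account for every original line exactly once."""
    orig, v, n = (data_lines(x) for x in (original, verbs, non_verbs))
    return {
        "original": len(orig),
        "verbs": len(v),
        "non_verbs": len(n),
        "counts_agree": len(orig) == len(v) + len(n),
        "overlap": sorted(set(v) & set(n)),
        "unaccounted": sorted(set(orig) - set(v) - set(n)),
    }


# Made-up miniature data, for illustration only.
original = ["; comment", "aaa", "bbb", "ccc", ""]
report = check_partition(original, ["aaa", "ccc"], ["bbb"])
```

Matching counts alone would not catch a line that was edited or duplicated during review, which is what the `overlap` and `unaccounted` fields are meant to surface.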

@gasyoun
Member

gasyoun commented Feb 19, 2020

Am I understanding the two files properly?

Exactly.

VCP-Non-Dhatu file contains the 79 from original list that you confirm to be non-verbs

Yes.

@Shalu411
Collaborator Author

Namaste Jim,
Wow.. Great to hear back!
Request that in future you stick to .txt files.
Sure! It was Marcis who asked me to work with MS-Word, and I thought I could highlight errors. Next time I will work with text only.
Thanks a lot. Looking forward to more work :)

@funderburkjim
Contributor

two more non-dhatus

@Shalu411 I think these two also should be in non-verbs. Agree?

फेणक [p= 4556]

फेण(न)क स्वार्थे क संज्ञायां कन् वा । १ फेनशब्दार्थे
२ पिष्टकभेदे त्रिका० । [ID=35190]
रोची [p= 4814]

रोची स्त्रौ रोचयति रुच्—णिच्—अच् गारा० ङीष् । हिल-
मोचिक्वायाम् शब्दर० । [ID=39830]

@funderburkjim
Contributor

funderburkjim commented Feb 22, 2020

spelling changes

I made a few spelling changes in VCP-Dhatus.txt .
Details are here: sanskrit-lexicon/csl-orig#166.

@Shalu411 Do you agree with these changes?

@funderburkjim
Contributor

all text for vcp-dhatus

A new report is generated, relating to the request in the initial comment of this issue.

There are two forms:

The report has, for each of the 2278 roots of VCP-Dhatus.txt,

  • a status line for the root
  • The full text for the root.

Here's the first entry:

;; Case 0001: L=4, k1=aMSa, k2=aMSa, #upasargas=0, mw=aMS (diff)
अंश¦ विभाजने अद० चु० उभ० । अंशयति ते आंशिशत् त ।
<>अङ्कापयतीतिवत् आपुकि अंशापयतीत्येके । अच् अंशः
<>उ--अंशुः णिनि अंशी क्त अंशितः ।
;----------------------------------------------------------------------
;

The status line has:

  • a sequence number (Case 0001)
  • The Cologne ID of the record (L=4)
  • the VCP headword spelling, in SLP1 (k1=aMSa)
  • The VCP 'full headword' spelling (key2), in SLP1. (k2=aMSa).
    • k2 is same as k1 except for 63 cases, such as
      ;; Case 0009: L=462, k1=aNka, k2=aNka(nka), #upasargas=0, mw=aNk (diff)
  • The number of upasargas found for the root (#upasargas=0). The pattern
    <HI>xx + was used to identify upasargas.
    • in the file, these are further identified by the pattern *<HI>...
    • The first one is in Case 68, k1=Apa
      *<HI>सम् + संपूर्ण्णतायां समाप्तः समाप्तिः समापनम् ।
    • There are 700 such upasarga lines, in 96 roots.
  • The MW spelling of the root, in slp1 spelling (mw=aMS (diff))
    • the '(diff)' is a note indicating that the MW spelling and the VCP spelling of the root are
      different. This occurs in 1971 cases.
    • a '(same)' note appears when the VCP and MW spellings are the same. This occurs in 259 cases.
    • mw=? indicates that no match to an MW verb has yet been found. This occurs in 48 cases.
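
For anyone post-processing the report, the status line described above could be parsed along these lines. This is a sketch written against the two status lines quoted in this comment, not the code that generated the report; the exact form of an unmatched line (`mw=?` with no note) is an assumption, and the third sample line is made up.

```python
import re
from collections import Counter

STATUS = re.compile(
    r"^;; Case (?P<case>\d+): L=(?P<L>[^,]+), k1=(?P<k1>[^,]+), k2=(?P<k2>[^,]+), "
    r"#upasargas=(?P<n_upasargas>\d+), mw=(?P<mw>\S+)(?: \((?P<note>diff|same)\))?\s*$"
)


def parse_status(line):
    """Return the fields of a status line as a dict, or None for other lines."""
    m = STATUS.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["case"] = int(rec["case"])
    rec["n_upasargas"] = int(rec["n_upasargas"])
    return rec


def tally_notes(lines):
    """Count status lines by note: 'diff', 'same', or 'unmatched' (no note)."""
    counts = Counter()
    for line in lines:
        rec = parse_status(line)
        if rec is not None:
            counts[rec["note"] or "unmatched"] += 1
    return counts


report_lines = [
    ";; Case 0001: L=4, k1=aMSa, k2=aMSa, #upasargas=0, mw=aMS (diff)",
    "अंश¦ विभाजने अद० चु० उभ० । अंशयति ते आंशिशत् त ।",
    ";; Case 0009: L=462, k1=aNka, k2=aNka(nka), #upasargas=0, mw=aNk (diff)",
    ";; Case 9999: L=0, k1=xyz, k2=xyz, #upasargas=0, mw=?",  # made-up line
]
tally = tally_notes(report_lines)
```

Run over the full report, the three tallies should come out as 1971, 259 and 48, which add up to the 2278 roots.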

@funderburkjim
Contributor

funderburkjim commented Feb 22, 2020

Note on VCP-MW correspondences

The correspondences were derived by applying various rules to get an MW root spelling from the VCP root spelling.

In 1564 cases, the MW spelling is obtained by dropping the final 'a' from the VCP spelling.

I hope others will examine these VCP-MW verb spelling correspondences, and will

  • fill in some of the 48 cases where I could find no correspondence
  • look for errors in the correspondences
  • Think of ways, programmatic or manual, to gain more confidence in the correspondences.
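
The most common rule mentioned above, dropping the final 'a', can be sketched as follows. Only that one rule is shown; the other rules used for the report are not described in this thread and are not reproduced here. `mw_candidate` and `match_to_mw` are hypothetical names, and the MW root set in the example is a tiny stand-in built from the two correspondences quoted earlier.

```python
def mw_candidate(vcp_root: str) -> str:
    """Apply the 'drop final a' rule to a VCP root spelling (SLP1)."""
    if len(vcp_root) > 1 and vcp_root.endswith("a"):
        return vcp_root[:-1]
    return vcp_root


def match_to_mw(vcp_roots, mw_roots):
    """Map each VCP root to (MW spelling, note), mirroring the report's notes."""
    mw_roots = set(mw_roots)
    result = {}
    for root in vcp_roots:
        if root in mw_roots:
            result[root] = (root, "same")
        else:
            cand = mw_candidate(root)
            result[root] = (cand, "diff") if cand in mw_roots else (None, "?")
    return result


# Tiny stand-in for the MW root list; "zzz" is a made-up unmatched root.
matches = match_to_mw(["aMSa", "aNka", "zzz"], {"aMS", "aNk"})
```

Checking each candidate against an actual list of MW roots, rather than trusting the rule blindly, is one programmatic way to gain confidence in the correspondences.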

@funderburkjim
Contributor

@Shalu411 I think the preverb1 report gets at what you requested. Is there more you need from me regarding VCP roots?

@funderburkjim
Contributor

@Shalu411

I got this message from you, but can't find it in this issue thread:

Namaste
Thanks Jim. Ah, I need the whole detail, I mean not just the headword, but the explanation-text associated with that. I thought once verbs are finalized, we could pull the whole item-data, I mean, the explanation-text out easily. Hope it's do-able.

Did you look at vcp_preverb1_deva.txt ??

It has the whole item-data for each verb.

@gasyoun
Member

gasyoun commented Feb 23, 2020

Did you look at vcp_preverb1_deva.txt

She's looking at it today.

A new report is generated, relating to the request in the initial comment of this issue.

What can I say - that's a level of depth at working and reporting I'll never achieve.

@Shalu411
Collaborator Author

Ah, Mistake Jim. I had not looked at the whole matter and responded.. So I deleted the message I wrote. Today now I am looking at things. I think everything is in place now from what I could make by an initial look. Will come back on this very soon. :)

@Shalu411
Collaborator Author

Oh, it's a great great work Jim. Thanks a lot -first. :)
Let me go through them one by one:
I think these two also should be in non-verbs. Agree? फेणक
Yes.. Of course. Left out by my eyes.. Caught by yours. Nice. :)

I made a few spelling changes in VCP-Dhatus.txt. Do you agree with these changes?
Yes. I do. Dhaval's response makes the work complete. Thanks Dhaval.

The report has, for each of the 2278 roots of VCP-Dhatus.txt,
I was surprised by the detail of the work when I gave it an initial look, so I needed time to respond. I now understand all that is given. Of course, it's a wonderful approach, more than what I had wanted. It's excellent, I should say.

Is there more you need from me regarding VCP roots?
Of course no.. You made it all complete. :)

Mark,
What can I say - that's a level of depth at working and reporting I'll never achieve.
Add me in. :)

My side, it's awesome. 👍
@dhaval, Have you anything to say?

Lots of thanks again-
--Shalu

@funderburkjim
Contributor

revision of vcp_preverb1 reports

There are minor improvements in the vcp_preverb1.txt and vcp_preverb1_deva.txt files (see above for links). The differences are that a small number of additional mw correspondences are in the
new files. The text parts are the same, except for a correction in first line for GiRRa root.

@gasyoun
Member

gasyoun commented Feb 25, 2020

There are minor improvements in the vcp_preverb1.txt and vcp_preverb1_deva.txt files

But the roots are not marked still in the main dictionary XML file, right?

@funderburkjim
Contributor

roots are not marked still in the main dictionary XML

Right. Currently, there is markup in MW of the roots.

Maybe after this current flurry of root identification in various other dictionaries is over, we can
consider adding verb markup in the digitization of other dictionaries.

@gasyoun
Member

gasyoun commented Feb 26, 2020

after this current flurry of root identification in various other dictionaries is over

I guess it will take a few months. Would love to know the steps for any one dictionary (SKD, for example) in case the work gets stopped now, so it can be replicated at any later point.

adding verb markup in the digitization of other dictionaries

Yeah. If there is a top-5 wishlist regarding Cologne, this surely tops it all.

4 participants