All dhatu entries #9

Shalu411 · 2020-01-16T19:43:44Z

Hariom, Greetings to all. Am back after very long time! Let me be happy only after I continue to be here.. not right now. :)
Have been working with dhAtus lately. Here is a request-

I need the complete list of all dhAtus: not just the headwords, but also the explanatory text of each entry, either as a separate text file per dhAtu or as one combined file, whichever is easier.
The text must be in Devanagari.
If a separate file is kept for each dhAtu, then for convenience the file names should be in IAST, i.e. Roman with diacritics.
Please do let me know if this is feasible and workable.
-Regards
Shalu

Shalu411 · 2020-01-16T19:54:09Z

The second request is-

If possible, with the upasargas marked exclusively- as a separate list, but within the dhAtu explanation content. In VCP the entry text content has plus + symbols for each.
For example- dA-dhAtu text is added. + appears 28 times. So 28 upasarga-combinations are taken by dA-dhAtu. The first upasarga is अति + and last is- प्रति +
I will need 'the upasarga and plus sign', at least, in bold.. or some kind of separate marker.. so that I could identify it amidst the long explanation stuff. ONLY if possible.
For eg. I tried to do first six.
(here is the dhatu with its tense and other forms)
Entry starts thus-- (I have put points to indicate the text running later)
दा दाने जुहो० उभ० सक० सेट् । ददाति दत्ते प्रणिददाति ।
दद्यात् ददीत ।..................................................................................
(Here start the upasargas)
अति + अतिक्रम्य दाने अत्यन्तदाने च । “न जीवन्तमति-
ददाति” कात्या० ४ । १ । २७ । “अतिद बलिर्बद्धः” चाण०
अनु + पञ्चाद्दाने तुल्यरूपदाने प्रतिनिधित्वेन च । “न
दूढ्ये अनुददासि वामम्” ऋ० १ । १९० । ५ । “सूराश्चदस्मा अनु
दादपस्याम्” ७ । ४५ । २ “यः शर्धते नानददाति शृध्याम् २ ।
१२ । २०
अभि + आभिमुख्येन दाने “तथैव चैनमुक्त्वा वामपार्ष्णिमभ्य-
दात्” भा० व० १९७ अ० गद्यम् ।
अव + अधोदाने अवत्तम् आदिकर्मणि तु वा तादेशः
यथाह सि० कौ० “अवदत्तं विदत्तञ्च प्रदत्तञ्चादिकर्मणि
[Page3511-b+ 38]

सुदत्तमनुदत्तञ्च निदत्तमिति वेष्यते” चशब्दाद्यथाप्राप्तम् ।
आ + ग्रहणे “दद्याच्चैवाददीत वा” । “शुभां विद्या
माददीतावरादपि” मनुः । “स्वं चादास्यामि भूयोऽह
पाप्मानं जरया सह” भा० आ० ८४ अ० । “शरीरमात्तं
मृत्युना” छा० उ० “आददानः परक्षेत्रात्” मनुः ।
अप + आ + अपेक्ष्य ग्रहणे । “मृत्पिण्डमपादाय महा-
वीरं करोति” शत० ब्रा० १४ । १ । २ । १७
उद् + आ + उदस्य ग्रहणे “उदादाय पृथिवीं जीवदानुम्”
यजु० १ । २८ ।

So sometimes there will be two upasargas; in that case the bold should run through the second one.
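The marking requested here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: that an upasarga line begins with one or more "word + " groups (optionally after an `<HI>` tag), and that `**...**` is an acceptable marker. The function names are hypothetical, and a line that merely happens to start with "word + " would also be marked.

```python
import re

# Assumption: an upasarga line starts, optionally after an <HI> tag, with one
# or more "word + " groups, as in the dA sample ("अति + ...", "अप + आ + ...").
UPASARGA_RE = re.compile(r"^(<HI>)?((?:\S+ \+ )+)")


def mark_upasargas(line: str) -> str:
    """Wrap the leading 'upasarga +' run in ** markers; other lines are returned unchanged."""
    m = UPASARGA_RE.match(line)
    if not m:
        return line
    tag = m.group(1) or ""
    run = m.group(2).rstrip()      # e.g. "अप + आ +"
    rest = line[m.end():]          # explanation text after the run
    return f"{tag}**{run}** {rest}"


def count_upasarga_lines(lines) -> int:
    """Count the lines that open with an upasarga run."""
    return sum(1 for ln in lines if UPASARGA_RE.match(ln))


if __name__ == "__main__":
    sample = [
        "अति + अतिक्रम्य दाने अत्यन्तदाने च ।",
        "अप + आ + अपेक्ष्य ग्रहणे ।",
        "दद्यात् ददीत ।",
    ]
    for ln in sample:
        print(mark_upasargas(ln))
    print(count_upasarga_lines(sample), "upasarga lines")
```

Because the whole leading run is captured as one group, a double upasarga such as "अप + आ +" is marked through the second plus sign.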
Please let me know how workable this is, and what contribution is needed from my side. :) Gladly awaiting a reply.
-Shalu

funderburkjim · 2020-01-17T02:05:25Z

@Shalu411 Hi - long time no see!

The first step is to somehow find the entries of VCP that are verbs.

One approach is to use the first line (from the digitization vcp.txt), and look for one or more
patterns that indicate verbs.

In many cases, the first word in the entry for a verb is a word giving the sense (artha); and this
often (always?) seems to be a word in locative singular, such as (in slp1 spelling of vcp.txt):

aMSa¦ viBAjane ada0 cu0 uBa0 . aMSayati te AMSiSat ta .
akza¦ vyAptO saMhatO ca BvA0 pa0 vew . akzati akzRoti
arca¦ pUjAyAM curA0 uBa0 saka0 sew . arcayati te Arci-
arca¦ pUjAyAm uBa0 BvAdi0 saka0 sew . arcati te ArcIt

Doing a search of vcp.txt on these 4 patterns yields 2361 matches: https://gist.github.com/funderburkjim/be63c2a034770cf42f01b8e2955d9f9f
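A rough sketch of such a first-line filter, in Python. The thread does not spell out the four search patterns, so the endings used here (`-e`, `-O`, `-AyAM`, `-AyAm` on the word right after the headword marker `¦`) are an assumption inferred from the four sample lines, and `looks_like_verb` is a hypothetical name.

```python
import re

# Assumption: the sense word directly follows "headword¦ " and ends in one of
# four locative-singular endings seen in the samples: e, O, AyAM, AyAm.
SENSE_RE = re.compile(r"^(\S+)¦ (\S*(?:e|O|AyAM|AyAm))(?= |$)")


def looks_like_verb(first_line: str) -> bool:
    """True if the first line of an entry matches the locative-sense heuristic."""
    return SENSE_RE.match(first_line) is not None


def filter_verb_candidates(first_lines):
    """Keep only the first lines that pass the heuristic."""
    return [ln for ln in first_lines if looks_like_verb(ln)]


if __name__ == "__main__":
    samples = [
        "aMSa¦ viBAjane ada0 cu0 uBa0 . aMSayati te AMSiSat ta .",
        "akza¦ vyAptO saMhatO ca BvA0 pa0 vew . akzati akzRoti",
        "arca¦ pUjAyAM curA0 uBa0 saka0 sew . arcayati te Arci-",
        "arca¦ pUjAyAm uBa0 BvAdi0 saka0 sew . arcati te ArcIt",
    ]
    for ln in filter_verb_candidates(samples):
        print(ln)
```

A heuristic like this will also accept non-verb entries whose second word happens to end in a locative-looking ending, which is why the resulting list needs manual review.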

Take a look at this list.
My quick glance makes me think most are verbs. But you should carefully review the list and
look for 'false positives' (i.e., cases where a line is NOT a verb); these false positives (if any)
need to be removed.

Once the false positives are removed, we'll have a good starting point for a list of verbs in vcp.

The next question would be: are there any MISSING verbs (i.e., entries in vcp for verbs
that are not in the list). If you find any missing verbs, maybe there is another pattern (in addition
to those above) that we can use.
Eventually, we'll be able to have a list of entries that we are confident represent most, if not all,
of the verbs in vcp.

Once we've got this list of verbs, we can think more about how to present, for the verbs in this list, the information you are looking for.

gasyoun · 2020-01-17T09:59:51Z

2361 matches

Number looks promising, anything around 2200 should be.

carefully review the list and look for 'false positives'

This is what Usha is good at.

Shalu411 · 2020-01-18T16:28:14Z

Namaste. Wow! So glad to hear such a quick reply, Jim!
Sure, will check and be back.. Had taken an initial look however!

Once the false positives are removed, we'll have a good starting point for a list of verbs in vcp.
Sure! Waiting for that day.
Once VCP works out, we could plan Apte! :)

gasyoun · 2020-01-18T20:41:23Z

Waiting for that day.

No need to wait. Now it's all up to you. We can have the lists of dhatus of Indian Sanskrit dictionaries thanks to @Shalu411 and @funderburkjim combined. I'm still dreaming of PWG and PWK.

Shalu411 · 2020-01-20T05:29:03Z

Hariom, I have looked at the text file. It is in SLP format which is difficult for Devanagari people. Is there a converter which could directly convert the text into Dev?
One more thing is - we have one more hand for help.. He will join us in this. :)

drdhaval2785 · 2020-01-20T09:49:50Z

Some false positives can be removed by finding absence of 0 in the line.
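In the SLP1 text, `0` marks an abbreviation (as in `ada0 cu0 uBa0` in the sample lines), so a verb line that gives gana or pada abbreviations will normally contain it. A minimal sketch of the split, with a hypothetical function name:

```python
def split_by_zero(lines):
    """Split candidate lines into (with_zero, no_zero).

    Lines without any '0' carry no abbreviation mark and are therefore
    likely, though not certain, to be false positives.
    """
    with_zero, no_zero = [], []
    for line in lines:
        (with_zero if "0" in line else no_zero).append(line)
    return with_zero, no_zero


if __name__ == "__main__":
    candidates = [
        "aMSa¦ viBAjane ada0 cu0 uBa0 . aMSayati te AMSiSat ta .",
        "xyz¦ abce def .",  # made-up line with no abbreviation mark
    ]
    keep, review = split_by_zero(candidates)
    print(len(keep), "kept;", len(review), "to review by hand")
```

The no-zero lines are set aside for review rather than discarded, since a real verb could still be among them.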

gasyoun · 2020-01-20T15:14:28Z

Is there a converter which could directly convert the text into Dev?

Yeah, @drdhaval2785 is from the Nagari tribe ) Please help didi get out of SLP hell.
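For reference, a small SLP1-to-Devanagari converter can be sketched as below. It is not the converter used by the project; it is a minimal illustration. The letter table follows the standard SLP1 scheme, while mapping `.` to the danda and `0` to the abbreviation sign `०` is an assumption based on comparing the SLP1 and Devanagari sample lines in this thread. Characters not in the tables (such as `¦`, digits other than 0, or markup) pass through unchanged.

```python
# (independent vowel, dependent vowel sign) for each SLP1 vowel
VOWELS = {
    "a": ("अ", ""), "A": ("आ", "ा"), "i": ("इ", "ि"), "I": ("ई", "ी"),
    "u": ("उ", "ु"), "U": ("ऊ", "ू"), "f": ("ऋ", "ृ"), "F": ("ॠ", "ॄ"),
    "x": ("ऌ", "ॢ"), "X": ("ॡ", "ॣ"), "e": ("ए", "े"), "E": ("ऐ", "ै"),
    "o": ("ओ", "ो"), "O": ("औ", "ौ"),
}
CONSONANTS = {
    "k": "क", "K": "ख", "g": "ग", "G": "घ", "N": "ङ",
    "c": "च", "C": "छ", "j": "ज", "J": "झ", "Y": "ञ",
    "w": "ट", "W": "ठ", "q": "ड", "Q": "ढ", "R": "ण",
    "t": "त", "T": "थ", "d": "द", "D": "ध", "n": "न",
    "p": "प", "P": "फ", "b": "ब", "B": "भ", "m": "म",
    "y": "य", "r": "र", "l": "ल", "v": "व", "L": "ळ",
    "S": "श", "z": "ष", "s": "स", "h": "ह",
}
# Assumption: "." is the danda and "0" the abbreviation sign in vcp.txt.
OTHERS = {"M": "ं", "H": "ः", "~": "ँ", "'": "ऽ", ".": "।", "0": "०"}
VIRAMA = "्"


def slp1_to_deva(text: str) -> str:
    """Convert SLP1 text to Devanagari, one character at a time."""
    out = []
    pending = False  # True right after a consonant that has no vowel yet
    for ch in text:
        if ch in CONSONANTS:
            if pending:                 # consonant cluster: close the previous one
                out.append(VIRAMA)
            out.append(CONSONANTS[ch])
            pending = True
        elif ch in VOWELS:
            independent, sign = VOWELS[ch]
            out.append(sign if pending else independent)
            pending = False
        else:
            if pending:                 # word-final consonant
                out.append(VIRAMA)
            pending = False
            out.append(OTHERS.get(ch, ch))
    if pending:
        out.append(VIRAMA)
    return "".join(out)


if __name__ == "__main__":
    print(slp1_to_deva("aMSa¦ viBAjane ada0 cu0 uBa0 ."))
```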

funderburkjim · 2020-01-21T19:30:00Z

4 files now in gist

vcp verb filter #1 gist

- verb filter #1: same as before, but with the lines that contain no '0' (likely false positives) removed; slp1 transliteration
- verb filter #1 deva: same as previous, text in Devanagari
- verb filter #1 nozero: the 66 cases with no '0' in the line; slp1 transliteration
- verb filter #1 nozero deva: same as previous, text in Devanagari

gasyoun · 2020-01-21T21:13:57Z

So I guess https://gist.github.com/funderburkjim/be63c2a034770cf42f01b8e2955d9f9f#file-vcp-verb-filter-1-deva is the one Usha was asking about.

funderburkjim · 2020-01-21T21:25:39Z

Also Usha should examine the 'nozero deva' one. As Dhaval noticed, these 'nozero' cases are probably NOT verbs, but Usha should examine them in case any verbs are hidden in this nozero list.

Shalu411 · 2020-02-15T20:10:45Z

Hariom
Jim, Thanks a lot for this help. You do a wonderful job always.
I am very very happy to present the cleaned list of verbs. It was checked twice.. hope no issues remain.. still if one more eye is needed, please let me ask Dhaval.
It is in Devanagari- that Dhaval had shared.
I have put all the non-dhAtu entries in a separate text file as per Marcis' advice.
One thing, though: I haven't checked whether any dhAtus in VCP were left out of the list, as I did not know what method to use. If that check is needed and I am guided, I can do it!
Looking forward to the next steps.
Thanks :)

VCP-Dhatus.docx
VCP-Non-Dhatu.txt

gasyoun · 2020-02-15T20:32:08Z

@Shalu411 79 non-dhatus, nice catch. @drdhaval2785 should we revert it to SLP1 or @funderburkjim can work with it as well?

Shalu411 · 2020-02-18T18:52:12Z

Hariom.
Knock knock.. Is Jim in? Jim, are you ok?? Could you respond? I am all full of solving the second entries file.. Please come back on this..

funderburkjim · 2020-02-18T19:54:12Z

@Shalu411 I'm fine. Just been involved with other things. Will take a look at what you've done soon!

I'll probably revert back to slp1. No big deal to do this.

After a first look at your two files, I note:

Your VCP-Dhatus.docx file has 2280 lines (after removing some blank lines)
- Note: txt files are easier to work with. I converted your .docx file to .txt using
  Google docs. Request that in future you stick to .txt files.
Your VCP-Non-Dhatu.txt file has 79 lines
So that makes 2280 + 79 = 2359 total lines

This agrees with the 2361 lines (minus 2 comment lines) of the original list of verbs.
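A check like this can be automated. The sketch below compares the original list with the two reviewed files as multisets of lines; the function name and the assumption that comment lines start with `;` are hypothetical, and lines whose spelling was edited during review would show up as differences.

```python
from collections import Counter


def check_partition(original, verbs, non_verbs, comment_prefix=";"):
    """Compare the original list with the two reviewed lists.

    Returns (n_original, n_reviewed, missing, unexpected), where `missing`
    holds lines in the original but in neither reviewed list, and
    `unexpected` holds reviewed lines that are absent from the original.
    """
    entries = Counter(
        ln.strip() for ln in original
        if ln.strip() and not ln.startswith(comment_prefix)
    )
    reviewed = Counter(ln.strip() for ln in list(verbs) + list(non_verbs) if ln.strip())
    missing = entries - reviewed
    unexpected = reviewed - entries
    return sum(entries.values()), sum(reviewed.values()), missing, unexpected


if __name__ == "__main__":
    # The counts reported in this comment: 2280 + 79 lines vs 2361 minus 2 comments.
    assert 2280 + 79 == 2361 - 2
    original = ["; comment", "a¦ be", "c¦ de", "e¦ fe"]
    n_orig, n_rev, missing, unexpected = check_partition(
        original, verbs=["a¦ be", "c¦ de"], non_verbs=["e¦ fe"]
    )
    print(n_orig, n_rev, dict(missing), dict(unexpected))
```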

So I think that

the VCP-Dhatus file contains the 2280 verbs
the VCP-Non-Dhatu file contains the 79 from original list that you confirm to be non-verbs

Am I understanding the two files properly?

gasyoun · 2020-02-19T05:35:47Z

Am I understanding the two files properly?

Exactly.

VCP-Non-Dhatu file contains the 79 from original list that you confirm to be non-verbs

Yes.

Shalu411 · 2020-02-19T14:23:16Z

Namaste Jim,
Wow.. Great to hear back!
Request that in future you stick to .txt files.
Sure! It was Marcis who asked me to work with MS-Word. And I thought I can highlight errors. Well, will next time work with only text.
Thanks a lot. Looking forward to more work :)

funderburkjim · 2020-02-22T21:06:10Z

two more non-dhatus

@Shalu411 I think these two also should be in non-verbs. Agree?

फेणक [p= 4556]

फेण(न)क स्वार्थे क संज्ञायां कन् वा । १ फेनशब्दार्थे
२ पिष्टकभेदे त्रिका० । [ID=35190]

रोची [p= 4814]

रोची स्त्रौ रोचयति रुच्—णिच्—अच् गारा० ङीष् । हिल-
मोचिक्वायाम् शब्दर० । [ID=39830]

funderburkjim · 2020-02-22T21:08:17Z

spelling changes

I made a few spelling changes in VCP-Dhatus.txt .
Details are here: sanskrit-lexicon/csl-orig#166.

@Shalu411 Do you agree with these changes?

funderburkjim · 2020-02-22T22:35:28Z

all text for vcp-dhatus

A new report is generated, relating to the request in the initial comment of this issue.

There are two forms:

vcp_preverb1.txt has the Sanskrit in slp1 spellings
vcp_preverb1_deva.txt has the spelling in Devanagari.

The report has, for each of the 2278 roots of VCP-Dhatus.txt,

a status line for the root
The full text for the root.

Here's the first entry:

;; Case 0001: L=4, k1=aMSa, k2=aMSa, #upasargas=0, mw=aMS (diff)
अंश¦ विभाजने अद० चु० उभ० । अंशयति ते आंशिशत् त ।
<>अङ्कापयतीतिवत् आपुकि अंशापयतीत्येके । अच् अंशः
<>उ--अंशुः णिनि अंशी क्त अंशितः ।
;----------------------------------------------------------------------
;

The status line has:

a sequence number (Case 0001)
The Cologne ID of the record (L=4)
the VCP headword spelling, in SLP1 (k1=aMSa)
The VCP 'full headword' spelling (key2), in SLP1. (k2=aMSa).
- k2 is same as k1 except for 63 cases, such as
  ;; Case 0009: L=462, k1=aNka, k2=aNka(nka), #upasargas=0, mw=aNk (diff)
The number of upasargas found for the root (#upasargas=0). The pattern
<HI>xx + was used to identify upasargas.
- in the file, these are further identified by the pattern *<HI>...
- The first one is in Case 68, k1=Apa
  *<HI>सम् + संपूर्ण्णतायां समाप्तः समाप्तिः समापनम् ।
- There are 700 such upasarga lines, in 96 roots.
The MW spelling of the root, in slp1 spelling (mw=aMS (diff))
- the '(diff)' is a note indicating that the MW spelling and the VCP spelling of the root are
  different. This occurs in 1971 cases.
- a '(same)' note appears when the VCP and MW spellings are the same. This occurs in 259 cases.
- mw=? indicates that no match to an MW verb has yet been found. This occurs in 48 cases.
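For anyone who wants to process the report, the status line can be read with a regular expression. This is a sketch based only on the two status lines quoted above; the exact form of an `mw=?` line (assumed here to carry no parenthesized note) and the function names are assumptions.

```python
import re
from collections import Counter

STATUS_RE = re.compile(
    r"^;; Case (?P<case>\d+): L=(?P<L>[^,]+), k1=(?P<k1>[^,]+), "
    r"k2=(?P<k2>[^,]+), #upasargas=(?P<n>\d+), "
    r"mw=(?P<mw>\S+)(?: \((?P<note>\w+)\))?\s*$"
)


def parse_status(line: str):
    """Return the status-line fields as a dict, or None for any other line."""
    m = STATUS_RE.match(line)
    if not m:
        return None
    return {
        "case": int(m.group("case")),
        "L": m.group("L"),
        "k1": m.group("k1"),
        "k2": m.group("k2"),
        "upasargas": int(m.group("n")),
        "mw": m.group("mw"),
        "note": m.group("note"),  # "diff", "same", or None when absent
    }


def tally_notes(lines) -> Counter:
    """Count status lines per note value ('diff', 'same', or '?' when absent)."""
    parsed = (parse_status(ln) for ln in lines)
    return Counter((p["note"] or "?") for p in parsed if p)


if __name__ == "__main__":
    line = ";; Case 0009: L=462, k1=aNka, k2=aNka(nka), #upasargas=0, mw=aNk (diff)"
    print(parse_status(line))
```

Run over the whole report, `tally_notes` should reproduce the counts listed above if these format assumptions hold.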

funderburkjim · 2020-02-22T22:44:43Z

Note on VCP-MW correspondences

The correspondences were derived by applying various rules to get an MW root spelling from the VCP root spelling.

In 1564 cases, the MW spelling is obtained by dropping the final 'a' from the VCP spelling.
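The most common rule can be illustrated as follows. This is a sketch, not the actual matching code: only the "drop the final a" rule is stated in the thread, the order in which rules are tried is an assumption, and `mw_roots` is a hypothetical set of MW root spellings in SLP1.

```python
def guess_mw_root(vcp_root: str, mw_roots: set):
    """Guess the MW spelling for a VCP root.

    Returns (mw_spelling, note) where note is "same", "diff", or "?"
    (no correspondence found), mirroring the notes in the status line.
    """
    if vcp_root in mw_roots:
        return vcp_root, "same"
    # Rule from the thread: drop the final 'a' of the VCP spelling.
    if vcp_root.endswith("a") and vcp_root[:-1] in mw_roots:
        return vcp_root[:-1], "diff"
    return None, "?"


if __name__ == "__main__":
    mw_roots = {"aMS", "aNk"}  # tiny illustrative subset
    for root in ["aMSa", "aNka", "zzza"]:
        print(root, "->", guess_mw_root(root, mw_roots))
```

Roots left with "?" by rules of this kind are the ones that need either further rules or manual matching.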

I hope others will examine these VCP-MW verb spelling correspondences, and will

fill in some of the 48 cases where I could find no correspondence
look for errors in the correspondences
Think of ways, programmatic or manual, to gain more confidence in the correspondences.

funderburkjim · 2020-02-22T22:45:54Z

@Shalu411 I think the preverb1 report gets at what you requested. Is there more you need from me regarding VCP roots?

funderburkjim · 2020-02-23T21:47:33Z

@Shalu411

I got this message from you, but can't find it in this issue thread:

Namaste
Thanks Jim. Ah, I need the whole detail, I mean not just the headword, but the explanation-text associated with that. I thought once verbs are finalized, we could pull the whole item-data, I mean, the explanation-text out easily. Hope it's do-able.

Did you look at vcp_preverb1_deva.txt ??

It has the whole item-data for each verb.

gasyoun · 2020-02-23T23:27:38Z

Did you look at vcp_preverb1_deva.txt

She's looking at it today.

A new report is generated, relating to the request in the initial comment of this issue.

What can I say - that's a level of depth at working and reporting I'll never achieve.

Shalu411 · 2020-02-24T00:05:12Z

Ah, Mistake Jim. I had not looked at the whole matter and responded.. So I deleted the message I wrote. Today now I am looking at things. I think everything is in place now from what I could make by an initial look. Will come back on this very soon. :)

Shalu411 · 2020-02-24T00:34:37Z

Oh, it's a great great work Jim. Thanks a lot -first. :)
Let me come one by one-
I think these two also should be in non-verbs. Agree? फेणक
Yes.. Of course. Left out by my eyes.. Caught by yours. Nice. :)

I made a few spelling changes in VCP-Dhatus.txt. Do you agree with these changes?
Yes. I do. Dhaval's response makes the work complete. Thanks Dhaval.

The report has, for each of the 2278 roots of VCP-Dhatus.txt,
I was surprised by the detail of the work as I gave an initial look.. So needed time to respond. I now understand all that is given. Of course, it's wonderful approach.. More than what I had wanted. It's excellent, I should say.

Is there more you need from me regarding VCP roots?
Of course no.. You made it all complete. :)

Mark,
What can I say - that's a level of depth at working and reporting I'll never achieve.
Add me in. :)

My side, it's awesome. 👍
@dhaval, Have you anything to say?

Lots of thanks again-
--Shalu

funderburkjim · 2020-02-25T02:04:15Z

revision of vcp_preverb1 reports

There are minor improvements in the vcp_preverb1.txt and vcp_preverb1_deva.txt files (see above for links). The differences are that a small number of additional mw correspondences are in the
new files. The text parts are the same, except for a correction in first line for GiRRa root.

gasyoun · 2020-02-25T08:06:24Z

There are minor improvements in the vcp_preverb1.txt and vcp_preverb1_deva.txt files

But the roots are not marked still in the main dictionary XML file, right?

funderburkjim · 2020-02-26T00:28:50Z

roots are not marked still in the main dictionary XML

Right. Currently, there is markup in MW of the roots.

Maybe after this current flurry of root identification in various other dictionaries is over, we can
consider adding verb markup in the digitization of other dictionaries.

gasyoun · 2020-02-26T07:39:57Z

after this current flurry of root identification in various other dictionaries is over

I guess it will take a few months. Would love to know the steps for any one dictionary (SKD, for example) in case the work gets stopped now, so it can be replicated at any later point.

adding verb markup in the digitization of other dictionaries

Yeah. If there is a top-5 wishlist for Cologne, this surely tops it.

gasyoun added the enhancement label Jan 17, 2020

funderburkjim mentioned this issue Mar 1, 2020

All Dhatu Entries: verbs01 #10

Open

drdhaval2785 mentioned this issue Dec 20, 2020

todo list in 2021 (in descending order of importance) sanskrit-lexicon/COLOGNE#325

Open
