pUrbb vs. pUrvv #20

funderburkjim · 2021-03-18T21:50:58Z

This is one of those 'b/v' problems that @drdhaval2785 loves.

In vcp2, there are 1303 matches for 'pUrvv' and 1474 matches for 'pUrbb'.

In vac2, there are 2051 matches for 'pUrvv' and 236 matches for 'pUrbb'.
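
Counts like these could be reproduced with a short script. The following is only a sketch: it assumes that "matches" means lines containing the spelling (not total occurrences), and the sample lines are invented for illustration rather than taken from vcp2 or vac2.

```python
from collections import Counter

def count_spellings(lines, variants=("pUrvv", "pUrbb")):
    """Count how many lines contain each spelling variant (one hit per line)."""
    counts = Counter()
    for line in lines:
        for variant in variants:
            if variant in line:
                counts[variant] += 1
    return counts

# Invented sample lines, not real dictionary data.
sample = ["pUrvvaM tat", "apUrbbaH", "pUrbbapakzaH pUrbbaM", "anyat"]
counts = count_spellings(sample)
print(counts["pUrvv"], counts["pUrbb"])
```

To run it on a real file, pass an open file object (which iterates line by line) instead of `sample`.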

I suggest we change all the 'pUrbb' to 'pUrvv' in both vcp and vac.

This would remove a nice chunk of needless differences.

What do others think?

funderburkjim · 2021-03-18T22:30:43Z

TE lines beginning with anusvara

There are 852 lines of vac2 that begin with 'M', the slp1 version of anusvAra.
None for vcp2.

Suggest we remove those initial 'M' in vac2.
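
A minimal sketch of how such lines could be listed for review, together with the preceding line for context. The function name and the sample lines are hypothetical; nothing here is the script actually used for vac2.

```python
def initial_anusvara_lines(lines):
    """Yield (line_number, previous_line, line) for every line starting with 'M'."""
    previous = ""
    for number, line in enumerate(lines, start=1):
        if line.startswith("M"):
            yield number, previous, line
        previous = line

# Invented sample: a word broken across two lines, the second starting with 'M'.
sample = ["cInA-", "MSukam", "anyat"]
for number, previous, line in initial_anusvara_lines(sample):
    print(number, repr(previous), repr(line))
```

Showing the preceding line matters because an initial 'M' may simply continue a word broken at the end of the previous line, in which case removing it would be wrong.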

funderburkjim · 2021-03-18T22:34:13Z

Another interesting stat:
There are about 100,000 lines whose adjusted text differs by only 1 edit in the two versions.
The first such example at present:

000007:1: vcp: <>patnI NIp I (lakzmIH) .
000007:1:  te: patnI +NIp+I (lakzmI). </vkr><page>vp1_039.pdf</page><column>1</column><br/>
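
The "differs by only 1 edit" test can be sketched as below. This is an illustration, not the comparison program used for the statistic: it assumes both lines have already been reduced to their "adjusted text" (the adjustment rules are not spelled out in this thread), and it treats one insertion, deletion, or substitution of a single character as one edit.

```python
def within_one_edit(a, b):
    """Return True if a and b differ by exactly one single-character edit."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    # Skip the common prefix.
    i = 0
    while i < len(a) and i < len(b) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        # One substitution: the rest after the differing character must match.
        return a[i + 1:] == b[i + 1:]
    if len(a) < len(b):
        a, b = b, a  # make a the longer string
    # One insertion/deletion: drop one character from the longer string.
    return a[i + 1:] == b[i:]

print(within_one_edit("patnI NIp I (lakzmIH)", "patnI NIp I (lakzmI)"))
```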

gasyoun · 2021-03-18T22:58:33Z

> I suggest we change all the 'pUrbb' to 'pUrvv' in both vcp and vac.

This will get us in trouble with @drdhaval2785

> There are about 100,000 lines whose adjusted text differs by only 1 edit in the two versions.

So dirt is read as a visarga; now that is a huge number. Maybe forget about the idea of correcting it?

drdhaval2785 · 2021-03-18T23:32:37Z

I agree on changing all pUrbb to pUrvv.
Can you provide a sample where there is M initially, with preceding line for reference?
In the example lakzmIH, the visarga is mandatory according to grammar. See SKD, which gives headwords in their inflected forms. You will not find lakzmI, but only lakzmIH. So these single-difference lines are actual differences. Let them be examined thoroughly. We cannot brush them under the carpet with regex.

funderburkjim · 2021-03-19T01:01:18Z

> a sample with M initially ...

Here's the whole list of 852: filter01.txt

funderburkjim · 2021-03-19T01:07:38Z

> Maybe forget about the idea of correcting it?

The point about that 100,000 is that maybe there is a way to 'automate' some significant
fraction of those differences (where two texts differ in only 1 position).

The lakzmI/lakzmIH example may be idiosyncratic, but perhaps some of the differences
are systematic in the sense that some rule or rules could be used to identify that one of
the spellings is right and the other one is wrong.
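
One way to look for systematic differences is to tally what the differing fragments are across all line pairs. The sketch below uses Python's standard `difflib.SequenceMatcher`; the function names and the sample pairs are made up for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher

def diff_signature(a, b):
    """Return the (old, new) fragments for every non-equal region between a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return tuple(
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )

def tally_differences(pairs):
    """Count how often each difference signature occurs in the given pairs."""
    return Counter(diff_signature(a, b) for a, b in pairs)

# Invented sample pairs, not real vcp/vac lines.
pairs = [("pUrbba", "pUrvva"), ("apUrbba", "apUrvva"), ("lakzmIH", "lakzmI")]
for signature, count in tally_differences(pairs).most_common():
    print(count, signature)
```

Signatures that occur very often (such as 'bb' versus 'vv') are candidates for a rule; signatures that occur once or twice are more likely genuine differences needing individual review.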

funderburkjim · 2021-03-19T01:28:55Z

candrabindu

In the slp1 coding of vac.txt, candrabindu is represented by the character '~' ; I believe this
is the usual slp1 convention.

However, ~ is not used in vac2 (Tirupati); instead, the candrabindu is represented by 'z'; this
is clearly different from the SLP1 convention that 'z' represents cerebral sibilant; and
vac2 does use 'z' also for the cerebral sibilant.

Thus we should correct vac2 in such cases (changing such 'z' to '~').

There are 842 matches for ~ in vcp.txt.

drdhaval2785 · 2021-03-19T02:19:22Z

> a sample with M initially

A bird's eye view shows that VAC is correct in the majority of places. cInA-MSuka is correct. cInA-Suka is wrong in VCP.

So, we cannot mechanically change VAC. On the contrary, VCP would require addition of those missing anusvAras.

drdhaval2785 · 2021-03-19T02:21:11Z

> candrabindu

I agree. There is no possibility of ष being confused with candrabindu by any typist. So, we can mechanically convert z to ~ where vcp.txt has ~.
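
A sketch of that mechanical conversion, under a simplifying assumption: the two lines are already aligned and of equal length, so that a '~' in the vcp line and a 'z' in the vac line can be compared position by position. Lines of unequal length are left untouched for manual review. The function name and the sample strings are hypothetical.

```python
def fix_candrabindu(vcp_line, vac_line):
    """Return vac_line with 'z' changed to '~' wherever vcp_line has '~' at the same position.

    Lines of unequal length are returned unchanged, since positions cannot
    be compared safely without a proper alignment.
    """
    if len(vcp_line) != len(vac_line):
        return vac_line
    return "".join(
        "~" if v == "~" and t == "z" else t
        for v, t in zip(vcp_line, vac_line)
    )

# Invented sample: the first 'z' stands for candrabindu, the second is the sibilant.
print(fix_candrabindu("tAl~lokAn zaw", "tAlzlokAn zaw"))
```

Because the replacement is keyed to the position of '~' in vcp, a 'z' that is a real cerebral sibilant is not touched.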

drdhaval2785 · 2021-03-19T02:21:53Z

I love the way Jim keeps on identifying low-hanging fruit, to reduce labour.

drdhaval2785 · 2021-03-19T02:24:16Z

NYRnm v/s M

I am not sure about the conventions used by VAC and VCP. But I saw some entries in Meld which differed only in these letters, e.g. saMKyA and saNKyA.

We can derive some stats to check the tendency of the dictionary, and correct the remaining entries to match them.
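
A rough sketch of such a statistic: count how often a stop is preceded by the anusvAra 'M' versus by its homorganic class nasal. The grouping of SLP1 stops below is the standard one, but the function and the sample text are illustrative only.

```python
import re
from collections import Counter

# SLP1 stops of each class, keyed by the homorganic class nasal.
CLASS_STOPS = {"N": "kKgG", "Y": "cCjJ", "R": "wWqQ", "n": "tTdD", "m": "pPbB"}

def nasal_tendency(text):
    """Count class-nasal + stop versus anusvara + stop sequences in text."""
    counts = Counter()
    for nasal, stops in CLASS_STOPS.items():
        counts["class nasal"] += len(re.findall(f"{nasal}[{stops}]", text))
        counts["anusvara"] += len(re.findall(f"M[{stops}]", text))
    return counts

# Invented sample text.
print(nasal_tendency("saMKyA saNKyA aNka"))
```

Run separately on vac.txt and vcp.txt, the two counts would show which convention each text leans towards before any entries are changed.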

drdhaval2785 · 2021-03-19T02:26:38Z

duplicated / deduplicated

Check for stats in VCP of rxx versus rx (r followed by a doubled or a single consonant), e.g. karmma and karma. We can align both vac.txt and vcp.txt to the more prevalent convention. That would reduce unnecessary Meld diffs.

The relevant portion from paper normalization.pdf is as below.

Convention 2 - Duplication of consonants after ’r’.

Option 2.1
Duplication is done in all cases e.g. पूर्व्व.
Dictionaries: SKD, WIL

Option 2.2
Duplication is not done e.g. पूर्व .
Dictionaries: ACC, AP, AP90, BEN, BHS, BOP, BUR, CAE, CCS, GRA, GST, IEG, INM, KRM, MCI, MD, MW, MW72, PD, PE, PGN, PUI, PW, PWG, SCH, SNP, STC, VCP, VEI

Note: (1) SHS and YAT are inconsistent in this convention. See निर्विघ्न / निर्व्विकल्प in SHS and दुर्वच / दुर्व्वचस् in YAT. (2) VCP highly leans towards option 2.2, but there are a few inconsistent entries as well e.g. पर्वत and अग्निपर्व्वत.

Therefore, we should remove all duplications after r in VCP and VAC.
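
Before removing anything, the prevalence of each convention could be measured with something like the sketch below. It only detects a literally repeated SLP1 consonant after 'r' (karmma, pUrvva); doubled aspirates written as unaspirated plus aspirated (for example ardDa) are not detected and would need separate handling. Names and sample text are illustrative.

```python
import re

CONSONANTS = "kKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzsh"  # SLP1 consonants

# 'r' followed by the same consonant twice, e.g. 'rmm' in karmma.
DOUBLED = re.compile(rf"r([{CONSONANTS}])\1")
# 'r' followed by a consonant that is not repeated, e.g. 'rm' in karma.
SINGLE = re.compile(rf"r([{CONSONANTS}])(?!\1)")

def doubling_stats(text):
    """Count doubled and single consonants after 'r' in SLP1 text."""
    return {
        "doubled": len(DOUBLED.findall(text)),
        "single": len(SINGLE.findall(text)),
    }

# Invented sample text.
print(doubling_stats("karmma karma pUrvva parvata"))
```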

gasyoun · 2021-03-19T06:32:16Z

> I love the way Jim keeps on identifying low-hanging fruit, to reduce labour.

I guess one could call it lexicographical hell otherwise.

> We can derive some stats to check the tendency of the dictionary, and correct the remaining entries to match them.

Right, the nasals.

> The relevant portion from paper normalization.pdf is as below.

What other issues of normalization.pdf should be applied inside the dictionaries?

funderburkjim · 2021-03-20T20:55:20Z

षार्वत्यां -> पार्वत्यां was noticed by a user correction (sanskrit-lexicon/csl-orig#495).

There are several other zArvat and zArvvat possible errors in VCP to be investigated.

gasyoun · 2021-03-21T00:32:53Z

> several other zArvat and zArvvat possible errors

I would propose that the issue is even wider: षा vs. पा

funderburkjim · 2021-03-22T01:07:16Z

Before we make a blanket change of 'pUrbba' to 'pUrvva' , I would like to know that the scanned images actually have 'bb' -- Can anyone find 5 instances where 'pUrbba' is clearly 'b' ?
The few examples that I've seen are not clearly 'b'. So I'm not sure whether the digitizations 'pUrbba' are actually a feature of the Author's spelling, or whether they are a feature of the digitization of unclear print images.

Andhrabharati · 2021-03-22T05:07:11Z

I guess the following (from the very first pages) are enough for the purpose:

<L>44 <pc>0037,b अकडम

<L>76 <pc>0039,b अकाल

<L>151 <pc>0044,b अक्षरन्यास

<L>174 <pc>0045,b अक्षि

<L>181 <pc>0046,a अक्षिभ्रुव

Andhrabharati · 2021-03-22T05:24:28Z

My remark elsewhere is not limited to this पूर्ब्ब, but applies to

> Therefore, we should remove all duplications after r in VCP and VAC.

as well.

The Eastern (Indian) school of grammar and usage has these doubled forms throughout the literature in and from that region.
Probably it all is due to the Mugdhabodha influence.

One may look at the <L>10577 <pc>1458,b ॡ
where Taranatha specifically talks about the लकारद्वय as per मुग्धबोध.
The HW itself is shown as ल्लृ (instead of ॡ as is the practice elsewhere).
[Probably we could find the वर्णद्वययुत रेफ also somewhere mentioned inside the मुग्धबोध.]

Andhrabharati · 2021-03-22T05:32:35Z

> What other issues of normalization.pdf should be applied inside the dictionaries?

My opinion is that any kind of normalisation should be done in another layer (for searching and displaying etc.), but not in the actual "content" of the printed matter.

gasyoun · 2021-03-22T09:19:05Z

> Probably it all is due to the Mugdhabodha influence.

Interesting thought.

> My opinion is that any kind of normalisation should be done in another layer (for searching and displaying etc.), but not in the actual "content" of the printed matter.

We have some data added in tags; that's all for now, I guess.

Andhrabharati · 2021-08-08T16:02:31Z

I have some additional information now, and thought I should share the same here.

(a) The consonant doubling is not prescribed by Mugdhabodha (Vopadeva), but has been identified by Pāṇini himself.
He has framed a sūtra (अचो रहाभ्यां द्वे P. 8.4.46) saying the consonants after r and h can be optionally doubled. It is his method of covering all the regional practices known at his time.

So the replacement of the double consonant after r and h with a single consonant can be taken as grammatically all right.
But I would still suggest retaining the regional variant forms as seen in the books, while recording the non-doubled form as the alternate form in all such cases. This lets searches catch the words without fail and match them with other dictionaries as well.
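
A sketch of how such an alternate form could be derived without touching the printed text: the original spelling is kept, and a de-doubled form is computed alongside it for searching. The function name is hypothetical, the text is assumed to be in SLP1, and only doubling after 'r' with a literally repeated consonant is handled (doubling after 'h' and doubled aspirates are left out).

```python
import re

CONSONANTS = "kKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzsh"  # SLP1 consonants
DOUBLED_AFTER_R = re.compile(rf"r([{CONSONANTS}])\1")

def alternate_form(word):
    """Return the spelling with a doubled consonant after 'r' reduced to a single one."""
    return DOUBLED_AFTER_R.sub(r"r\1", word)

# The printed form stays as the primary spelling; the alternate is only a search key.
for printed in ["pUrvva", "karmma", "parvata"]:
    print(printed, alternate_form(printed))
```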

(b) Now coming to the perpetual ba/va issue.
I have seen that the Bengali script has no separate characters for ba and va (both are represented by the single character ব, U+09AC); but Rev. Yates in his Bengali Grammar says thus-

With this information, we can safely replace the conjunct b with v [for handling the doubling cases, refer to the point above] in all the Bengal-based works (WIL, YAT, SKD, VCP etc.), when such v forms are seen in other regional texts (like AP or the European ones).

Andhrabharati · 2021-08-09T02:00:39Z

The issue is still lingering in my mind.

Probably we can do the va/ba replacement (and the reverse case, ba/va, as well) in non-conjunct places too; say, klIva to klIba, if such are the forms used in texts from other regions.

On the whole, it appears to be not a deliberately different form in Bengali works but just a limitation of their orthography. And then outsiders took the letters as they are, without understanding or knowing the Bengali limitation.

This thus treats the va-ba issue in toto once for all, I believe.

What do you say, @drdhaval2785?

drdhaval2785 · 2021-08-09T02:20:09Z

This was discussed elsewhere.
I was against changing b/v then.
Thereafter a strong argument was put forth.

For Bengal, there is no difference of b/v. So, they would not notice the change.
For the rest of the world, changing klIva to klIba would make the text more congruent with their expectation.

So there is nothing to be lost, but everything to gain by this b/v change.

I am now convinced that we should make changes, and am making such changes in my VAC VCP comparison work.

Andhrabharati · 2021-08-09T02:30:35Z

Good, and now I also have to take back what I said a few months back, that I cannot be a part of the team's exercise given the change suggested by Jim or you (in one of the issues on Meld usage).

After you finish your comparison work, I would be glad to proofread the VCP text, for the benefit of everyone.

Andhrabharati · 2021-08-09T13:09:31Z

As I am looking into the SKD front pages now, I found this piece of information under the section ग्रन्थपरिपाटी (Methodology adopted)-

वर्णमालायां च वर्ग्य-जकारान्तःस्थयकारौ मूर्द्धन्य-णकार-दन्त्यनकारौ वर्ग्यवकारान्तःस्थवकारौ तालव्यशकार-मूर्द्धन्यषकारदन्त्यसकाराः सन्ति ।

एतदखिल-वर्णादि-शब्दानां धातूनाञ्च प्रभेदं कृत्वा सूचीपूर्व्वकं यथास्थानं संस्थापनं कृतवान् । वङ्गदेशे उक्तवर्णानामुच्चारण-भेदाभावः । विशेषतो वकारद्वयस्याकारोच्चारणयोर्भेदो नास्ति पश्चिमादिदेशे वर्त्तते । किन्तु मुग्धबोधटीकायां दुर्गादासविद्यावागीशधृता वकार-भेदिकैकप्राचीनकारिकास्ति । सा यथा, --
“उदूटौ यत्र विद्येते यो वः प्रत्ययसन्धिजः । अन्तःस्थं तं विजानीयात्तदन्यो वर्ग्य उच्यते” ॥
एतत्कारिकया सकलवकार-प्रभेदो न भवति । इति हेतोरहं धातूनां शब्दाकरत्वादोष्ठ्यदन्त्य-वकारादिधातुद्वारा पदसाधनं कृत्वा बहु-प्रयत्नैर्वकारद्वय-भेदं प्रकाशितवान् रेफयुक्तवर्णन्तु रवर्णात् परं शब्द-सूचीमध्ये विन्यस्तवान् ॥

Here, we are cautioned not to change every va/ba-kAra blindly (एतत्कारिकया सकलवकार-प्रभेदो न भवति).

BTW, contextually the वर्ग्यवकारान्तःस्थवकारौ in the above text should not be changed to वर्ग्यबकारान्तःस्थवकारौ (as this has been referred to a few lines later as वकारद्वय-भेदं), though there is no "vargya-va" in the rest of India.

Andhrabharati · 2021-08-09T13:27:27Z

Here SKD is giving the prevalent practice in Bengal that (j,y), (N,n), (b,v) and (S,z,s) groups [वर्ग्य-जकारान्तःस्थयकारौ मूर्द्धन्य-णकार-दन्त्यनकारौ वर्ग्यवकारान्तःस्थवकारौ तालव्यशकार-मूर्द्धन्यषकारदन्त्यसकाराः] to be without a difference in pronunciation.

Wilson in his dictionary (1st ed., 1819) preface quotes thus-

रलयोर्डलयोस्तद्वज्जययोर्बवयोरपि ।
शसयोर्मनयोश्चान्ते सविसर्गाविसर्गयोः ।
सविन्दुकाविन्दुकयोः स्यादभेदे न कल्पनम् ॥
(ralayor ḍalayos tadvajjayayor bavayorapi |
śasayor manayoś cānte savisargāvisargayoḥ |
savindukāvindukayoḥ syādabhede na kalpanam ||)

“The letters R and L, D and L, J and Y, B and V, Ś and S, M and N, a final visarga or its omission, and a final nasal mark or its omission, are always optional, there being no difference between them.”

Thus Wilson covers a wider range of regional variation in India than SKD does.

Thought @funderburkjim might catch a piece or two (with his interest in "knowing" Skt.) through my posts, which could be of some help in cleaning the CSL texts.
[BTW is it CSL or CDSL that is to be used, to refer to this lexicon project of Cologne? I see both acronyms, somewhere or other on this forum and at the site.]

Andhrabharati · 2021-08-10T17:43:34Z

@drdhaval2785,

With the above information before us, what is your opinion about changing कोष to कोश?

This has been a long pending issue in my mind.
[Now looks like the reason is found.]

drdhaval2785 · 2021-08-10T23:51:38Z

As both are valid words, I would not change कोष or कोश.

funderburkjim · 2021-09-18T20:23:39Z

May this issue be closed?

gasyoun added the bug label Mar 21, 2021

drdhaval2785 added a commit to sanskrit-lexicon/csl-orig that referenced this issue Aug 11, 2021

https://github.com/sanskrit-lexicon/VCP/issues/20

d9a3807

drdhaval2785 added a commit that referenced this issue Aug 11, 2021

#20 and some merger script

d52dc66
