abbreviation preparation #11

funderburkjim · 2020-10-28T21:20:18Z

Via email, a user, James, expressed an interest in identifying the abbreviations in the Vacaspatyam dictionary.

In part, he said:

One thing we could try, and would probably be fairly fruitful.  If from your side you could create a 
list of the abbreviations, then I could see if I can crowdsource the names of the referenced texts  
(the expansions you mention) through our Indology list. 
There's an amazing amount of knowhow on the list, one is continually surprised 
by the depth and breadth of responses to really arcane questions.

This issue is devoted to getting started on that.

The basic problem is that there is no known list of abbreviation expansions for VCP.


funderburkjim · 2020-10-28T21:32:04Z

Identification of abbreviations in VCP

Abbreviations in VCP may be identified as words ending in the digit '0'.

There are about 4000 distinct such abbreviations in our digitization vcp.txt.
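The "ends in the digit '0'" rule can be sketched in a few lines of Python. This is only an illustration, not the script actually used for this issue: the regular expression assumes that an abbreviation is a run of ASCII (SLP1) letters followed directly by '0', and the sample lines are invented stand-ins for lines of vcp.txt.

```python
import re
from collections import Counter

# Assumption: an abbreviation is a run of ASCII letters (SLP1 text)
# immediately followed by '0'. Plain numbers such as '10' are excluded
# because at least one letter must precede the '0'.
ABBREV_RE = re.compile(r"(?<![A-Za-z0-9])[A-Za-z]+0(?![0-9])")

def count_abbreviations(lines):
    """Count every token ending in '0' across the given lines."""
    counts = Counter()
    for line in lines:
        counts.update(ABBREV_RE.findall(line))
    return counts

# Invented sample lines, loosely modelled on an SLP1 rendering of an entry.
sample = [
    "gama gatO BvA0 para0 aniw . gacCati",
    "BvA0 Atma0 sew . page 10",
]
counts = count_abbreviations(sample)
print(len(counts), "distinct abbreviations")
```

Running the same function over every line of the digitization would give the distinct-abbreviation count quoted above, provided the real file follows the assumed token shape.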

Grammar and Literary Source

Examination of a few examples leads me to think that there are basically 2 types of abbreviations:

1. Grammatical. For example, 'BvA0' for 'BvAdi' (the 'class 1' verbs).

Mostly, I think, these occur on the first couple of lines of an entry.

For example, with root 'gama', the first two lines are

गम गतौ भ्वा० पर० अनिट् । गच्छति ऌदित् अगमत् ज-
गाम जग्मतुः । गन्ता गम्यात् गमिष्यति । गन्ता गमी

2. Literary sources. These usually occur later in an entry (after the first two lines).

The 'first-two-lines' rule is just a rough preliminary indication of whether a given abbreviation should
be thought of as grammatical or as a literary source.

funderburkjim · 2020-10-28T21:41:07Z

a first display

Please see abbrev0_roman_100.md.

The file name indicates certain details of the display:

- The Sanskrit words are represented in Roman Unicode (IAST).
  - Other forms could use Devanagari or SLP1.
- The file and its links are written in markdown.
  - Another option would be html.
- Only abbreviations with at least 100 observed instances are included. There are 154 such abbreviations.
  - There are 3900+ distinct abbreviations (at least 1 observed instance).
  - There are 500+ distinct abbreviations with 10 or more instances.

154 abbreviations is a workable number. If we make progress on these most common abbreviations, then
we can later tackle the less common ones.
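Selecting the "at least N instances" subsets from a frequency table might look like the sketch below. The function name `at_least` and the toy counts are hypothetical and chosen only for illustration; they are not the real VCP frequencies.

```python
from collections import Counter

def at_least(counts, threshold):
    """Return (abbreviation, count) pairs with count >= threshold,
    most frequent first, ties broken alphabetically."""
    return sorted(
        ((abbrev, n) for abbrev, n in counts.items() if n >= threshold),
        key=lambda pair: (-pair[1], pair[0]),
    )

# Toy frequencies (invented numbers, not taken from vcp.txt).
toy = Counter({"BvA0": 250, "para0": 120, "manu0": 12, "xyz0": 1})

print(at_least(toy, 100))  # the "100 or more" subset
print(len(at_least(toy, 10)), "abbreviations with 10 or more instances")
```

Lowering the threshold step by step (100, then 10, then 1) matches the plan of starting with the most common abbreviations and tackling the rarer ones later.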

funderburkjim · 2020-10-28T21:45:18Z

structure of first display

The display is a table whose columns are:

1. a sequence number
2. the abbreviation
3. the number of occurrences of the abbreviation in the first 2 lines of some entry
4. the number of occurrences of the abbreviation not in the first 2 lines of some entry
5. up to 5 headword links where the abbreviation occurs in the first 2 lines
6. up to 5 headword links where the abbreviation occurs after the first 2 lines

The links show one of the Cologne displays for the word in VCP dictionary.
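A rough sketch of how rows with these six columns could be assembled is given below. It is not the program that produced abbrev0_roman_100.md. It assumes that each entry is already available as a (headword, list of body lines) pair, it reuses the assumed "letters followed by '0'" pattern, and the function name `build_rows` and the sample entries are invented.

```python
import re
from collections import defaultdict

# Assumed token shape: ASCII (SLP1) letters followed by '0'.
ABBREV_RE = re.compile(r"(?<![A-Za-z0-9])[A-Za-z]+0(?![0-9])")

def build_rows(entries, head_lines=2, max_links=5):
    """entries: iterable of (headword, [body lines]).
    Returns tuples (seq, abbrev, n_early, n_late, early_hws, late_hws)."""
    stats = defaultdict(
        lambda: {"early": 0, "late": 0, "early_hw": [], "late_hw": []}
    )
    for headword, lines in entries:
        for i, line in enumerate(lines):
            # 'early' means within the first `head_lines` lines of the entry.
            where = "early" if i < head_lines else "late"
            for abbrev in ABBREV_RE.findall(line):
                s = stats[abbrev]
                s[where] += 1
                hws = s[where + "_hw"]
                # Keep at most `max_links` distinct sample headwords.
                if headword not in hws and len(hws) < max_links:
                    hws.append(headword)
    # Most frequent abbreviations first, ties broken alphabetically.
    ordered = sorted(
        stats.items(),
        key=lambda kv: (-(kv[1]["early"] + kv[1]["late"]), kv[0]),
    )
    return [
        (seq, a, s["early"], s["late"], s["early_hw"], s["late_hw"])
        for seq, (a, s) in enumerate(ordered, 1)
    ]

# Invented sample entries in SLP1.
entries = [
    ("gama", ["gama gatO BvA0 para0 aniw", "gAma jagmatuH", "iti manu0"]),
    ("BU", ["BU sattAyAm BvA0 para0", "Bavati", "manu0 BAga0"]),
]
rows = build_rows(entries)
for row in rows:
    print(row)
```

In the real display, the headword lists would be rendered as links to a Cologne display for each word; that step is omitted here because the link format is not stated in this thread.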

funderburkjim · 2020-10-28T21:47:04Z

I've communicated back by email to James. We'll have to see what else might be needed to
facilitate crowd-sourcing.

gasyoun · 2020-10-30T18:56:55Z

The basic problem is that there is no known list of abbreviation expansions for VCP.

There are some hints. Remember the Tirupati edition files? They had some literary sources. abbrev0_roman_100.md is very well done.

154 instances is a workable number. If we make progress on these most common abbreviations, then
we can later tackle other less common abbreviations.

Agree.

funderburkjim · 2020-10-30T19:27:06Z

As a reminder, vac.txt is our copy of the Tirupati edition.

First glance at vac.txt shows markup with tags such as

vkr : vikrama = grammatical information

It would be useful to

- get a list of all the tags used in the Tirupati markup,
- estimate what the tags stand for (like 'vkr' stands for 'vikrama'), and
- perhaps work out how the tags could be made use of.
  - Need help from a Sanskrit grammarian here.
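A first inventory of the tags could be drawn up with something like the sketch below. It assumes the tags are written in angle brackets, as in `<vkr>`, with closing tags of the form `</vkr>`; that syntax has not been checked against vac.txt here. The sample lines and the tag `abc` are invented.

```python
import re
from collections import Counter

# Assumption: opening tags look like <name>; closing tags like </name>
# are skipped because '/' cannot start the captured name.
TAG_RE = re.compile(r"<([A-Za-z][A-Za-z0-9]*)>")

def tag_inventory(lines):
    """Return (tag, count) pairs for opening tags, most frequent first."""
    counts = Counter()
    for line in lines:
        counts.update(TAG_RE.findall(line))
    return counts.most_common()

# Invented sample lines; 'abc' is a made-up tag.
sample = [
    "<vkr>BvA0 para0</vkr> <abc>text</abc>",
    "<vkr>aniw</vkr>",
]
inventory = tag_inventory(sample)
print(inventory)
```

The resulting list only says which tags exist and how often they occur; working out what each tag stands for still needs a reader who knows the grammar.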

I don't see definite markup of the abbreviations in vac.txt.

Andhrabharati · 2021-03-15T15:39:14Z

First glance at vac.txt shows markup with tags such as
* vkr  :  vikrama  = grammatical information

<vkr> does not stand for विक्रम (vikrama); it stands for व्याकरण (vyākaraṇa, 'grammar').

Andhrabharati · 2021-03-15T15:48:34Z

I've communicated back by email to James. We'll have to see what else might be needed to
facilitate crowd-sourcing.

Any result from this crowd-sourcing?

More often than not, our observation is that nothing gets started once the work is "allotted".
----------------

Only abbreviations with at least 100 observed instances are included. There are 154 such abbreviations
* There are 3900+ distinct abbreviations  (at least 1 observed instance)

* There are 500+ distinct abbreviations with 10 or more instances.

May I ask @funderburkjim to post the complete list here (preferably in Devanagari)?

I tried making one such list myself.
VCP abbreviations extracted.txt

1. This shows that many entries have spelling and spacing errors (I removed some of them, but then stopped).
2. Also, quite a few of these could be clubbed together as comp. abbr.s, instead of being kept as separate ones.
3. Many are variant forms of the same "source".
4. And finally, quite a few others lack the trailing '0', either in the text or in the print itself.

Andhrabharati · 2021-03-15T15:51:39Z

@gasyoun
If you can trace your Tirupati CD, can you post a link to download it?
I also purchased the CD, but need to locate it.

funderburkjim · 2021-03-15T20:12:27Z

post the complete list here (preferably in Devanagari)?

A complete devanagari list (as github markdown table) is in two parts:

part1
part2

The one-part form is also prepared, but is too big for github to display properly.

There is also a simpler list, with each abbreviation and its frequency,
at
abbrev0_deva_all.txt.

This should be comparable to VCP.abbreviations.extracted.txt from @Andhrabharati (see a previous comment for link).

gasyoun · 2021-03-15T20:47:42Z

I don't see definite markup of the abbreviations in vac.txt

The Tirupati people are known for poor documentation; it is the same with the Tirupati edition of the digital Ramayana.

If you can trace your Tirupati CD, can you post a link to download it?

The file we have at Cologne is based on the CD Usha sent me. That is all I know about it.

Analysis done in 2014.

f_WX.txt
Vacaspatyam_15_01_2014_b1.xlsx
Vachaspatyam.xlsx
Vachaspatyam_b3_with_dev.xlsx
Vachaspatyam_b4_without_dev.xlsx
Vachaspatyam_b5_proof_1673.xlsx
Vachaspatyam_b6_proof_1673-06-01-14.xlsx

funderburkjim · 2021-03-16T02:42:54Z

The file we have at Cologne is based on the CD Usha sent me

The Tirupati vacaspatyam I started with in this repository is vac_input.txt. According to my notes in the readme.org of vcpte-vac,

By some unknown process, Scharf and colleagues reformatted and modified
presumably the same original Tirupati edition of Vacaspatyam.

gasyoun · 2021-03-16T06:12:21Z

Scharf and colleagues reformatted and modified presumably the same original Tirupati edition of Vacaspatyam.

Oh, so you believe the two versions have the same source initially.

funderburkjim · 2021-03-16T21:12:40Z

The Tirupati version I got from Scharf had already been put into SLP1. I don't know what source Peter started with, but since you mentioned the existence of a CD made by Tirupati, it may be that Peter started from that CD.

gasyoun · 2021-03-16T21:43:29Z

Tirupati version I got from Scharf had already been put into SLP1

Oh, OK, because when I saw the CD it was in that funny WX encoding. And it contained not only the dictionary file but several additional files, including the Preface.

Andhrabharati · 2021-04-21T15:14:32Z

Finally done with the first phase of Vacaspatyam corrections, starting mainly with the abbr. markers, in a focused two-week effort. The summary is in the file below:

Phase-1 of work on Vacaspatyam.txt

Almost all the abbr.s are resolved now!!

It appears that the present Cologne data has missed the dual/variant forms of the HWs (marked in parentheses in the print), and I also noticed many errors.

Hence it is desirable to correct the HWs portion (before touching the body portion), which I would like to take up in the next few days.

gasyoun · 2021-04-21T18:54:06Z

HWs portion (before touching the body portion)

Yeah, headwords is what comes first. Thanks for the hard work @Andhrabharati

Almost all the abbr.s are resolved now!!

But where to look for them?

funderburkjim · 2021-04-22T02:15:13Z

@Andhrabharati There are a lot of 'extra' headwords at
https://github.com/sanskrit-lexicon/csl-orig/blob/master/v02/vcp/vcp_hwextra.txt

These probably include many of the 'dual/variant forms'.

Andhrabharati · 2021-04-22T09:13:45Z

Yes @funderburkjim, I've seen this file as well as the Vachaspatyam-Doubles-16.3.15.xlsx file from @gasyoun.

As far as I saw, there are some errors in both files.

So I decided to do it again myself, while looking for HW errors throughout.

funderburkjim added a commit that referenced this issue Oct 28, 2020

VCP abbreviation preparation #11

62e6d51

drdhaval2785 added the enhancement label Dec 13, 2020

drdhaval2785 mentioned this issue Dec 20, 2020

todo list in 2021 (in descending order of importance) sanskrit-lexicon/COLOGNE#325

Open
