abbreviation preparation #11
Identification of abbreviations in VCP
Abbreviations in VCP may be identified as words ending in the digit '0'. There are about 4000 distinct such abbreviations in our digitization vcp.txt.
Grammar and Literary Source
Examination of a few examples leads me to think that there are basically 2 types of abbreviations:
The 'first-two-lines' rule is just a rough preliminary indication of whether a given abbreviation should |
A first display
Please see abbrev0_roman_100.md. The file name indicates certain details of the display:
154 instances is a workable number. If we make progress on these most common abbreviations, then |
Structure of first display
The display is a table whose columns are:
The links show one of the Cologne displays for the word in VCP dictionary. |
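The identification rule described above (a word ending in the digit '0') can be sketched in a few lines of Python. This is only an illustration, not the script used to build abbrev0_roman_100.md. It assumes that the text uses plain ASCII letters (as in a roman transliteration). The function names and the `min_count` threshold are hypothetical.

```python
import re
from collections import Counter

# Assumption: an abbreviation is a run of ASCII letters immediately followed
# by the digit '0', and not embedded in a longer alphanumeric token.
ABBREV_RE = re.compile(r"(?<![A-Za-z0-9])[A-Za-z]+0(?![A-Za-z0-9])")


def count_abbreviations(text):
    """Return a Counter mapping each abbreviation (with its '0') to its frequency."""
    return Counter(ABBREV_RE.findall(text))


def frequent_abbreviations(counts, min_count):
    """Return (abbreviation, count) pairs with count >= min_count,
    most frequent first, ties broken alphabetically."""
    rows = [(abbr, n) for abbr, n in counts.items() if n >= min_count]
    return sorted(rows, key=lambda row: (-row[1], row[0]))


# Tiny made-up sample; a real run would read the contents of vcp.txt instead.
sample = "pu0 strI0 pu0 gfha 100 na0 pu0"
counts = count_abbreviations(sample)
for abbr, n in frequent_abbreviations(counts, min_count=1):
    print(abbr, n)
```

Pure numbers such as `100` are skipped because the pattern requires at least one letter before the final '0'. Raising `min_count` restricts the output to the most common abbreviations, which is the kind of subset a first display would concentrate on.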
I've communicated back by email to James. We'll have to see what else might be needed to |
There are some hints. Remember the Tirupati edition files, which had some literary sources? abbrev0_roman_100.md is very well done.
Agree. |
As a reminder, vac.txt is our copy of the Tirupati edition. First glance at vac.txt shows markup with tags such as
It would be useful to
I don't see definite markup of the abbreviations in vac.txt. |
Any result from this crowd-sourcing? More often than not, our observation is that nothing gets started once the work is "allotted".
May I ask @funderburkjim to post the complete list here (preferably in Devanagari)? I tried making one such list myself. This shows that many entries have spelling and spacing errors (I removed some of them, but then stopped). |
@gasyoun |
A complete Devanagari list (as a GitHub markdown table) is in two parts. The one-part form is also prepared, but is too big for GitHub to display properly. There is also a simpler list, with each abbreviation and its frequency. This should be comparable to VCP.abbreviations.extracted.txt from @Andhrabharati (see a previous comment for the link). |
The Tirupati people are known for poor documentation, and it is the same with the Tirupati edition of the digital Ramayana.
The file we have at Cologne is based on the CD Usha sent me; I know nothing else about it. The analysis was done in 2014. f_WX.txt |
The Tirupati vacaspatyam I started with in this repository is vac_input.txt. According to my notes in the readme.org of vcpte-vac,
Oh, so you believe the two versions have the same source initially. |
The Tirupati version I got from Scharf had already been put into SLP1. I don't know what source Peter started with; but since you mentioned the existence of a CD made by Tirupati, it may be that Peter started from that CD. |
Oh, ok, because when I saw the CD it was in that funny WX encoding. It contained not only the dictionary file, but several additional files, including the Preface. |
Finally done with the first phase of Vacaspatyam corrections, starting mainly with the abbreviation markers, in a focused effort over two weeks. The summary is in the file below: Phase-1 of work on Vacaspatyam.txt. Almost all the abbreviations are resolved now! It appears that the present Cologne data has missed the dual/variant forms of the HWs (marked in parentheses in the print), and many errors have also been noticed. Hence it is desirable to correct the HWs portion (before touching the body portion), which I would like to take up in the next few days. |
Yeah, headwords are what come first. Thanks for the hard work @Andhrabharati
But where to look for them? |
@Andhrabharati There are a lot of 'extra' headwords at These probably include many of the 'dual/variant forms'. |
Yes @funderburkjim, I've seen this file as well as the Vachaspatyam-Doubles-16.3.15.xlsx file from @gasyoun. From what I saw, there are some errors in both files, so I decided to do it again myself, while looking for HW errors throughout. |
Via email, a user, James, expressed an interest in identifying the abbreviations in the Vacaspatyam dictionary.
In part, he said:
This issue is devoted to getting started with this.
The basic problem is that there is no known list of abbreviation expansions for VCP.