vac2a - picture data in TE #19

Open
funderburkjim opened this issue Mar 18, 2021 · 23 comments

@funderburkjim
Contributor

In the process of preparing hiatus-corrections (#18),
I discovered that there are some (about 200) awkward lines in the vac2.txt (Tirupati data).

These lines were selected from vac2 based on one of two criteria:

  1. the corresponding line of vcp.txt (Cologne data) is <Picture> or
  2. the vac2 line is 300 or more characters long.
    • this is twice as long as any vcp.txt line

About half the 200 lines thus selected actually satisfy both criteria.
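
For concreteness, a minimal sketch of how such a selection could be scripted, assuming vac2.txt and vcp.txt are line-aligned UTF-8 files and that the Cologne placeholder is literally the string <Picture>; the function name and output are hypothetical illustrations, not the actual workflow used.

```python
# Hypothetical sketch of the two selection criteria described above.
# Assumes vac2.txt and vcp.txt are line-aligned, UTF-8 text files.
def select_awkward_lines(vac2_path="vac2.txt", vcp_path="vcp.txt", max_len=300):
    with open(vac2_path, encoding="utf-8") as vac2, \
         open(vcp_path, encoding="utf-8") as vcp:
        for lnum, (vac2_line, vcp_line) in enumerate(zip(vac2, vcp), start=1):
            is_picture = vcp_line.strip() == "<Picture>"        # criterion 1
            is_long = len(vac2_line.rstrip("\n")) >= max_len    # criterion 2
            if is_picture or is_long:
                yield lnum, is_picture, is_long

if __name__ == "__main__":
    hits = list(select_awkward_lines())
    both = sum(1 for _, pic, lng in hits if pic and lng)
    print(f"{len(hits)} lines selected; {both} satisfy both criteria")
```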

funderburkjim added a commit that referenced this issue Mar 18, 2021
@funderburkjim
Contributor Author

For the lines so identified, two actions were taken (a sketch of both follows below):

  1. The lines of both vac2 and vcp2 were put into the file vac2a_picturedata.txt. We may later want to try to understand why these lines are present in the vac2 version of Tirupati data; but at the current level of study, these lines are just in the way.
  2. A new version of vac2.txt was made called vac2a.txt. In this version, those 200 lines are just represented by a '?' character.
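
A minimal sketch of how these two steps might be scripted, again assuming the files are line-aligned and that the set of selected line numbers is already known; the function name and the record layout written to vac2a_picturedata.txt are guesses for illustration, not the repository's actual format.

```python
# Hypothetical sketch of the two actions described above.
# Assumes vac2.txt and vcp2.txt are line-aligned and that `selected`
# holds the (1-based) numbers of the ~200 lines identified earlier.
def split_picture_data(selected, vac2="vac2.txt", vcp2="vcp2.txt",
                       out_main="vac2a.txt", out_side="vac2a_picturedata.txt"):
    selected = set(selected)
    with open(vac2, encoding="utf-8") as f_vac2, \
         open(vcp2, encoding="utf-8") as f_vcp2, \
         open(out_main, "w", encoding="utf-8") as main_out, \
         open(out_side, "w", encoding="utf-8") as side_out:
        for lnum, (vac2_line, vcp2_line) in enumerate(zip(f_vac2, f_vcp2), start=1):
            if lnum in selected:
                # park both versions of the line for later study
                side_out.write(f"{lnum} vac2: {vac2_line}")
                side_out.write(f"{lnum} vcp2: {vcp2_line}")
                main_out.write("?\n")   # placeholder line in vac2a.txt
            else:
                main_out.write(vac2_line)
```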

@gasyoun
Member

gasyoun commented Mar 18, 2021

@Andhrabharati do I understand right that those pages with images should be rescanned, or would just cutting them out as they are be enough?

@gasyoun gasyoun added the bug label Mar 18, 2021
@Andhrabharati

just cutting is enough.

@Andhrabharati

Probably the issues I talked about in the meta2 file could be done first, as they are kind of "identified":

  1. ai & au cases
  2. Picture cases
  3. Column/Tabular data places

And now the abbr. place corrections.

@Andhrabharati

Andhrabharati commented Mar 18, 2021

I guess the tabular data could simply be rendered as / marked, instead of just with a space (or with varying Cn tagging).

And I was thinking of taking it up now.

@Andhrabharati

Here is the list of tables and pictures in the Vacaspatya, made with reference to the print pages (for whatever it is worth).

List of tables (tabular data) and Pictures.txt

@gasyoun
Member

gasyoun commented Mar 18, 2021

I guess the tabular data could simply be rendered as / marked, instead of just with a space, or with varying Cn tagging.

We can use the GitHub markup language there, right?

funderburkjim added a commit that referenced this issue Mar 18, 2021
@funderburkjim
Contributor Author

I've reorganized the file names a bit in the visible part of the repository.

  • vcp.txt contains the latest version of the Cologne digitization
    • this is in metaline format
    • at some point, the version of vcp.txt in this repository will be moved to the
      production version at csl-orig/v02/vcp/vcp.txt. But meanwhile the version in
      this 'vcp' repository will be ahead of the production version.
  • vac2.txt contains the latest version of the Tirupati digitization
  • vcp2.txt is based on vcp.txt, but in a format comparable to that of vac2.
    Also, its 'difference number' field is based on both vac2.txt and vcp.txt.
  • readme_changes.txt will hold a log so we can keep track of the changes.

Thus far, I've implemented changes pertaining to:

And am looking for other low-hanging fruit

@gasyoun
Member

gasyoun commented Mar 18, 2021

And am looking for other low-hanging fruit

Are you sure we are ready to go as deep as headword issues for now?
@drdhaval2785 are the API issues closed? It seems to be endless, this
journey inside the Sanskrit-Sanskrit dictionaries that nobody even uses.

@drdhaval2785
Contributor

I think the API issue is closed.
Any further API development would be needed as and when, during frontend development, we need some information in a specific format. We need a frontend developer for the same. The API is closable from my side.

Regarding Sanskrit-Sanskrit dictionaries, they are actually the ones many people like me use exclusively. If I get no hits in SKD and VCP, only then do I turn to MW.
So, if dictionaries were to be corrected for texts, I would put these two at a much higher priority. Priorities may differ in Europe or America, but on the Indian subcontinent, Sanskrit-Sanskrit dictionaries are widely used.

@Andhrabharati

One might compare how MW and VCP grew up from the same roots: WIL, SKD, and the German one (PWG). I guess VCP would include many of the corrections to PWG noted in PWK, as Taranatha had the full set of manuscripts belonging to the Vedic branch in his possession (or accessible to him).

@Andhrabharati

Monier-Williams had to "wait" for PWK and other works to be published. MW99 also has a couple of entries taken from Apte90 (by Cappeller, as mentioned in its front pages).

@Andhrabharati

Are you sure we are ready to go as deep as headword issues for now?

This reminds me of the work I started back in 2016; I had finished the HW correction for the vowels part. It had treated double (multiple) HWs much better than the exercise at Cologne (Usha and Jim).

@Andhrabharati

Andhrabharati commented Mar 19, 2021

The Meld exercise is to look (mainly) at the differing lines in vcp2 and vac2 (and refer to scans to decide the corrections).

But I strongly feel the other lines also need to be read once against the scans, as both the digitisations (TPT and Koeln) have erred at many places.

@gasyoun
Member

gasyoun commented Mar 19, 2021

We need a frontend developer for the same.

I might have found one. I need your understanding of the tasks in more detail.

The API is closable from my side.

Got it. What was the next priority in your Dec 2020 list?

Sanskrit-Sanskrit dictionaries are widely used

Widely used, and yet only 2 people from India are interested in cleaning this vast ocean. All of our time and energy could go here and every other task would stop, because there is no end if we go that deep inside these two oceans.

VCP would include many of the corrections to PWG noted in PWK

Guess not; it sounds just like some fantasy.

Taranatha had the full set of manuscripts belonging to the Vedic branch in his possession (or accessible to him)

That indeed sets him apart, but has he used his advantage in full?

Monier-Williams had to "wait" for PWK and other works to be published

Exactly, around 10 years.

It had treated double (multiple) HWs much better than the exercise at Cologne

Not sure I understand what you mean. Can you give a sample, please?

The Meld exercise is to look (mainly) at the differing lines in vcp2 and vac2 (and refer to scans to decide the corrections).

I mean it was not in the 2021 priority list, and it could stop every other task in the list for just this one. There are some minor tasks that only Jim can give an answer to, but solving and integrating VCP corrections would swallow everything. Even MW is huge, but there is no way back. @drdhaval2785 are you personally eager to put your koshas aside and work on VCP as intensively as proposed above? @Andhrabharati is still in the MW pond; the VCP ocean might not be what he is interested in, I do not know. He works like a bull, but is that something you both can concentrate on without distracting Jim from the priorities set 3 months ago?

@Andhrabharati

I am not at MW99; there has been no full-time work on my side for the past few weeks.

@Andhrabharati

Andhrabharati commented Mar 19, 2021

Widely used, and yet only 2 people from India are interested in cleaning

How many people did you get for MW and PWG, to clean/correct them worldwide? (Forget about occasional feedback.)

@funderburkjim
Contributor Author

as both the digitizations (TPT and Koeln) have erred at many places

Agree. In a recent comparison, I have also found differences between TE and the scan; sometimes Cologne is right (agrees with the scan) and TE is wrong, sometimes TE is right and Cologne is wrong; and (probably) sometimes both Cologne and TE disagree with the scan.

@Andhrabharati

Good to see someone getting my intent correctly.

I was just thinking of starting full proofing of the Vacaspatyam on my own.

For me, neither of the digitisations is satisfactory enough, and I see no point in spending time just comparing them and correcting both.

And the way things are moving here to remove the Bengal flavor (dialect) is quite against our (AB) principles of handling the texts.

One might recall how the great Panini never took to normalising words by taking any one school as a standard, but just had them stay side by side.

One can compare different schools, but never let one school override another.

So I would better stay out of this exercise.

@gasyoun
Member

gasyoun commented Mar 21, 2021

I was just thinking of starting full proofing of the Vacaspatyam on my own.

Are you still?

How many people did you get for MW and PWG, to clean/correct them worldwide? (Forget about occasional feedback.)

@Andhrabharati

So isn't 2 far better than 0?

Yes, I might start the work sometime soon; probably after finishing the MW99 annexure portion.

@drdhaval2785
Contributor

@Andhrabharati and @funderburkjim ,

I know that both digitizations are bad. But speaking strictly in mathematical terms, and assuming the two digitizations to be completely independent:

Let us assume 1/100 letters are wrong in each digitization. Then the probability of both digitizations being wrong at the same letter is 1/10000. Do we want to spend precious time on this minuscule fraction? Maybe once we have corrected the differing errors, it may be taken up. Not before.
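
Restating that back-of-the-envelope calculation as a small sketch (the 1/100 per-letter error rate is the assumption stated above, not a measured figure):

```python
# Independence assumption: the two digitizations err independently,
# so the chance of both being wrong at the same letter is the product.
p_tpt = 1 / 100      # assumed per-letter error rate, Tirupati digitization
p_koeln = 1 / 100    # assumed per-letter error rate, Cologne digitization
p_both = p_tpt * p_koeln
print(p_both)        # 0.0001, i.e. 1 letter in 10000 wrong in both at the same spot
```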

@gasyoun
Member

gasyoun commented Mar 22, 2021

Do we want to spend precious time on this minuscule fraction? Maybe once we have corrected the differing errors, it may be taken up. Not before.

Agree.
