SKD digitization (Devanagari version) #11

Open
funderburkjim opened this issue Jan 9, 2021 · 18 comments

@funderburkjim
Contributor

@Shalu411 Hi!

I've made a version of the digitization of skd that you requested.
[Did you think I had forgotten?]

Currently, the version is a sample of the first 10,000 lines, and it is
skd_deva_sample.txt

Take a look, and see if this sample is what you were requesting.
Or, tell me of any problems.

When you give the go-ahead, I'll generate the whole dictionary in a similar way.

@Shalu411
Collaborator

Shalu411 commented Jan 10, 2021

Namaste Jim

[Did you think I had forgotten?]

You and forget!!? Not even in a dream! I am glad we have it at the right time. Thank you so much.
I have seen the sample. It should do!

Now how to note down the error?
For Eg- I see one right here (starred)
<L>2<pc>1-001<k1>अ<k2>अ
अ¦, व्य, अभाबः । अल्पः ।
There should be अभावः
What is the format to make it? Please tell with one example.
Thanks
--Shalu

@drdhaval2785

One piece of friendly advice, @Shalu411.
Don't try to change b / v errors. Otherwise you will end up writing SKD and VCP afresh.

@gasyoun
Member

gasyoun commented Jan 10, 2021

Don't try to change b / v errors.

Ignoring them is not a good idea either. But there must be thousands of them.

end up writing SKD and VCP afresh.

For our digital purposes it might not be such a bad idea at all, at least at an alternate-headword level. Maybe just generate all words with v as b and vice versa, @drdhaval2785?
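
A minimal sketch of that alternate-headword idea, assuming headwords are plain Devanagari strings. The function name `bv_variants` is hypothetical and not part of any existing Cologne tooling. It returns every spelling obtainable by swapping ब and व at one or more positions.

```python
from itertools import product

BA, VA = "\u092c", "\u0935"  # ब and व


def bv_variants(headword: str) -> set:
    """Return all spellings of headword with each ब/व optionally swapped.

    The original spelling itself is excluded from the result.
    """
    # Each ब or व may appear as either letter; every other character is fixed.
    choices = [(BA, VA) if ch in (BA, VA) else (ch,) for ch in headword]
    variants = {"".join(combo) for combo in product(*choices)}
    variants.discard(headword)
    return variants
```

A word with n occurrences of ब or व yields 2**n - 1 variants, so the list of alternates stays small for typical headwords. Whether such variants should be stored as alternate headwords or only used for lookup is a separate decision.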

@Shalu411
Collaborator

Shalu411 commented Jan 10, 2021

Hariom.
OK. Assuming it is a mistake that I have to note down:
Can I note the errors this way?
Method 1) Give the whole technical detail of the word-
<L>2<pc>1-001<k1>अ<k2>
अभाबः >> अभावः

OR this - Method 2) just with the LCode?
L=2 अभाबः >> अभावः
Please guide me..

How is Sampada doing it?

@drdhaval2785 Can you please provide me the list of suspicious head-words / words in SKD?
Thanks

@funderburkjim
Contributor Author

Please tell with one example.

An error file: skd_error.txt

In preparation, make an 'skd_error.txt' file where the changes are detailed.
Within skd_error.txt, make a line for each change. The format of such a line would be, for example,
2:अ:अभाबः अभावः

There are 4 fields separated by colons, almost like your example. The 4 fields are:

  1. L-code (the cologne record number)
  2. k1 value (the headword)
  3. old : The word that needs to be corrected
  4. new : The correction

If you want to make a comment in skd_error.txt file, insert one or more lines after the above 4-field
correction line, and start each of the comment lines with a semicolon.
You can add extra blank lines if you want.

These formatting details are consistent with the xxx_error1.txt files that Sampada and Anna have
been using.
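
A minimal sketch of a reader for such a file, under two assumptions that should be checked against the real xxx_error1.txt files: that all four fields are separated by the same separator character, and that a comment line belongs to the correction line immediately before it. The names `Correction` and `parse_error_lines` are hypothetical. The separator is a parameter so that a different character can be substituted.

```python
from dataclasses import dataclass, field


@dataclass
class Correction:
    lcode: str   # Cologne record number
    k1: str      # headword
    old: str     # word as currently digitized
    new: str     # corrected word
    comments: list = field(default_factory=list)


def parse_error_lines(lines, sep=":"):
    """Parse correction lines, attaching ';' comments to the preceding record."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines are allowed anywhere
        if line.startswith(";"):
            # A comment before any correction line has nothing to attach to
            # and is dropped in this sketch.
            if records:
                records[-1].comments.append(line[1:].strip())
            continue
        parts = line.split(sep)
        if len(parts) != 4:
            raise ValueError(f"expected 4 fields, got {len(parts)}: {line!r}")
        records.append(Correction(*parts))
    return records
```

Note that the visarga ः (U+0903) is a different code point from the ASCII colon, so splitting on ":" does not break Devanagari words that end in a visarga, even though the two look alike on screen.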

change the digitization

You should also change the digitization directly (currently, for this preliminary trial, this digitization file
is named skd_deva_sample.txt).

So incorporate the changes directly.

@funderburkjim
Contributor Author

अभाबः -> अभावः

I am very much in favor of this change. I think @Shalu411 is experienced enough in Sanskrit to make a
reliable judgment in such cases.

There are 3 situations that might have led to 'aBAbaH' in the skd digitization:

  1. The scanned image clearly shows 'b' and the typist who did the digitization accurately entered 'b'
  2. The scanned image clearly shows 'v' and the typist erroneously entered 'b'
  3. The scanned image is unclear, and the typist entered 'b'.

In the present case, I would say case 2 applies:

image

@Shalu411 If you go to the trouble of examining the scanned image, and happen to notice a
case of type '1' (i.e. a case where your change definitely disagrees with the scan), then you should
make a comment in skd_error.txt of the form '; scan error'.

However, I am not saying that you should examine the scanned image in every change,
as this extra scanned image examination may be more time-consuming than it is worth.

@funderburkjim
Contributor Author

@Shalu411

Did you clone the SKD repository? Are you using git or Github desktop?

@gasyoun
Member

gasyoun commented Jan 10, 2021

2:अ:अभाबः अभावः

Oh, these visargas that look like :.

Did you clone the SKD repository?

Not yet, she will need my help.

Github desktop

She will. Let the whole converted file come?

In the present case, I would say case 2 applies

Agree.

@funderburkjim
Contributor Author

Oh, these visargas that look like :

Good point. Maybe use '#' instead?

Let the whole converted file come?

Let's work a while with the sample file that is there. Once the procedural steps are ironed out,
we can go to a full skd_deva.txt.

she will need my help.

Thanks!

@gasyoun
Member

gasyoun commented Jan 11, 2021

Good point. Maybe use '#' instead?

Let us give '#' a try?

Once the procedural steps are ironed out, we can go to a full skd_deva.txt.

Sure, so be it.

@gasyoun
Member

gasyoun commented Jan 13, 2021

Usha pulled an update from Github Desktop. Is it as it should be @drdhaval2785 @funderburkjim ?

e8686f3

@Shalu411
Collaborator

Hariom
Hearty Thanks Mark, for the support and guidance.
Jim, once you confirm, I am ready for carrying on with the corrections.

@gasyoun changed the title from "Devanagari version skd digitization" to "SKD digitization (Devanagari version)" on Jan 13, 2021
@funderburkjim
Contributor Author

funderburkjim commented Jan 14, 2021

Usha: I confirm that you pushed properly; I can see the 1 change you made.

BUT please wait for making further changes.

I am having problems converting the Devanagari back to slp1, and need to get that problem
ironed out. I will aim to solve it tomorrow. The problem relates to the candrabindu
when it follows an 'o' but is not part of the Om character (ॐ). In slp1, o~ is supposed to represent ॐ.

But there are several instances in the skd digitizations like under ठोँट under headword aDaraH that
also have 'o~' in slp1. These are what are causing the problems at the moment.
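
To see how many places are affected, one could scan the Devanagari file for a candrabindu that directly follows an 'o' vowel. This is only a sketch: the function name `find_o_candrabindu` is hypothetical, and it assumes the file stores the vowel first and the candrabindu second, which is the usual Unicode order but has not been verified against skd_deva.txt.

```python
import re

# ओ (U+0913) or the vowel sign ो (U+094B), followed by candrabindu ँ (U+0901).
# The Om sign ॐ is a single code point (U+0950) and never matches this pattern.
O_CANDRABINDU = re.compile("[\u0913\u094b]\u0901")


def find_o_candrabindu(lines):
    """Yield (line_number, line) for lines where candrabindu follows an 'o' vowel."""
    for number, line in enumerate(lines, start=1):
        if O_CANDRABINDU.search(line):
            yield number, line
```

Each line reported this way is a place where a naive 'o~' to ॐ rule would produce the wrong Devanagari, so the list doubles as a set of test cases for the transcoder.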

@Shalu411
Copy link
Collaborator

Namaste
BUT please wait for making further changes.
Sure Jim!
@drdhaval2785 Can you help with the o~ issue?

@gasyoun
Member

gasyoun commented Jan 14, 2021

But there are several instances in the skd digitizations like under ठोँट under headword aDaraH that
also have 'o~' in slp1. These are what are causing the problems at the moment.

As rare as it can get. Jim, you are our fortress.

@funderburkjim
Contributor Author

As rare as it can get

This was discovered by applying a principle of invertibility. Here,

  • our base digitization is in SLP1 spelling : skd.txt
  • A conversion was made to use Devanagari spelling: skd_deva.txt
  • To incorporate the changes Shalu makes to skd_deva.txt, we need to convert skd_deva.txt
    back to skd_slp1.txt.
  • And if NO changes were made to skd_deva.txt, then the round trip
    skd.txt -> skd_deva.txt -> skd_slp1.txt should produce an skd_slp1.txt identical to skd.txt.
    The problem was noticed while investigating WHY, in an earlier version of the transcoding,
    skd.txt was NOT the same as skd_slp1.txt.
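
The invertibility check described in the list above can be sketched as follows. The function name `first_round_trip_mismatch` is hypothetical, and `to_deva` and `to_slp1` are placeholders for whatever transcoding functions are actually in use; nothing here reflects the real conversion scripts.

```python
def first_round_trip_mismatch(lines, to_deva, to_slp1):
    """Return (line_number, original, round_tripped) for the first line where
    to_slp1(to_deva(line)) differs from line, or None if every line survives."""
    for number, line in enumerate(lines, start=1):
        back = to_slp1(to_deva(line))
        if back != line:
            return number, line, back
    return None


if __name__ == "__main__":
    # Toy stand-ins, NOT real transcoders: upper-casing loses the case
    # distinction, so any line containing a capital fails the round trip.
    sample = ["abc", "aBc"]
    print(first_round_trip_mismatch(sample, str.upper, str.lower))
```

With real transcoders in place of the stand-ins, a result of `None` over every line of skd.txt is the invertibility condition. Returning the first mismatch rather than a plain True/False points directly at the line that needs investigation.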

This problem now has a satisfactory solution.

You can see that skd_deva_sample was changed in two lines from what Usha had,
by looking at this commit difference.

@funderburkjim
Contributor Author

@Shalu411

Ready for you to pull this repository and continue with changes to skd_deva_sample.txt.

ALSO, I made a file 'skd_error.txt' where you should document simple changes, such as the
first 'aBAbaH' one you made.

By 'simple change', I mean spelling errors like 'aBAbaH'.

More complex errors (like missing a headword which you mentioned elsewhere) will need to have
special handling -- meaning that probably I need to do the actual change to skd_deva_sample.txt
rather than you for the complex errors.
You can describe such complex cases as comments in the skd_error.txt file.

@gasyoun
Member

gasyoun commented Jan 14, 2021

Ready for you to pull this repository and continue with changes to skd_deva_sample.txt.

Good news for India.
