
Incorporate changes from BOP-Main-L2 file #6

Closed
drdhaval2785 opened this issue Jan 16, 2024 · 8 comments
Assignees
Labels
enhancement New feature or request

Comments

@drdhaval2785
Contributor

Andhrabharati made some changes to the BOP file taken from the csl-devanagari repository and posted the file at sanskrit-lexicon/csl-devanagari#40 (comment).

This issue is devoted to studying whether and how the changes can be carried over to the csl-orig repository.

@drdhaval2785 drdhaval2785 added the enhancement New feature or request label Jan 16, 2024
@drdhaval2785 drdhaval2785 self-assigned this Jan 16, 2024
@Andhrabharati

Good initiative, @drdhaval2785 !!

Please keep this in mind as well while working on the exercise you mentioned.

@Andhrabharati

Andhrabharati commented Jan 16, 2024

@drdhaval2785

You should also make a note of this post, and this post.

You may even consider splitting various CDSL texts into parts to populate "csl-doc", as you mentioned in the issue at the second link.

@funderburkjim
Contributor

Regarding the line-break situation in bop.txt and BOP_main.txt

The csl-orig/v02/bop/bop.txt file has line breaks as in the printed text.

BOP_main.txt seems to have 'removed' all line-break info. In particular, it does not have the 🞄 as discussed here.

It might have been preferable to use 🞄, but I don't think this omission is material (line-break preservation is not as important now).

Incidentally, AB appears to have 'resolved' the end-of-line '-' cases (hyphenated words). This is a good improvement. @Andhrabharati, what is your procedure for resolving these?

funderburkjim added a commit to sanskrit-lexicon/csl-orig that referenced this issue May 7, 2024
funderburkjim added a commit to sanskrit-lexicon/csl-pywork that referenced this issue May 7, 2024
@funderburkjim
Contributor

funderburkjim commented May 7, 2024

incorporation of changes in cdsl version

The aim was twofold:

  • incorporate into cdsl version of bop.txt the corrections and improvements of BOP-Main-L2 file
  • maintain the lines of cdsl version as far as possible.

The work is done in the issue6 directory of this repository.

The final cdsl version in this process is temp_bop_4b.zip.


Several changes to the BOP-Main-L2 file were deemed necessary.
The final version is temp_bop_1a_ab.zip (re-uploaded).

NOT FOUND? temp_bop_1a_ab.zip. How did this get lost?

Note that this version is slp1-based and removes the tab markup of the BOP-Main-L2 form.


The changes for the intermediate cdsl versions are in various 'change' files in the cdsl subdirectory.
cdsl/readme.txt documents the process. The section "NOTE: changes to temp_bop_1a_ab.txt" details the changes to the Andhrabharati version.


Finding a way to compare the greatly different AB version with the CDSL version was interesting.
The key was that the two versions agree in their 'metaline' sequence.
We can concatenate each file (e.g. ''.join(lines) in Python) and then compare in various ways.
The final approach was to split the data at whitespace (e.g. x.split() in Python), thereby getting sequences of words, then look for differences between the word sequences and resolve them.
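The word-sequence comparison described above can be sketched as follows. This is a minimal illustration, not the script actually used in the issue6 directory. It assumes that every entry begins with a metaline starting with '<L>' and that the metaline strings are identical in both files (if they are not, one could compare only the <L> numbers). The helper names split_entries and word_diffs are hypothetical; difflib is the Python standard-library module.

```python
import difflib

def split_entries(lines):
    """Group a CDSL-style file into (metaline, words) pairs.

    Assumption: each entry starts with a metaline beginning with '<L>'
    (e.g. '<L>5805<pc>244-a<k1>Bavat'); lines before the first metaline
    are ignored. Body lines are joined with spaces and split at
    whitespace, so differing line breaks do not matter.
    """
    entries = []
    meta, body = None, []
    for line in lines:
        if line.startswith('<L>'):
            if meta is not None:
                entries.append((meta, ' '.join(body).split()))
            meta, body = line.strip(), []
        elif meta is not None:
            body.append(line)
    if meta is not None:
        entries.append((meta, ' '.join(body).split()))
    return entries

def word_diffs(lines_a, lines_b):
    """Yield (metaline, tag, words_a, words_b) for each differing run of words."""
    entries_a, entries_b = split_entries(lines_a), split_entries(lines_b)
    # The whole approach relies on both versions sharing the metaline sequence.
    if [m for m, _ in entries_a] != [m for m, _ in entries_b]:
        raise ValueError('metaline sequences differ')
    for (meta, words_a), (_, words_b) in zip(entries_a, entries_b):
        # autojunk=False so frequent words are not skipped in long entries.
        matcher = difflib.SequenceMatcher(None, words_a, words_b, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag != 'equal':
                yield meta, tag, words_a[i1:i2], words_b[j1:j2]
```

For example, an entry whose cdsl form has 'com-' at the end of one line and 'pare' at the start of the next, while the AB form has 'compare', is reported as a single 'replace' of ['com-', 'pare'] by ['compare']; entries that differ only in whitespace or line breaks produce no output.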


Summary of improvements:

  • corrections of typos (not very many)
  • resolving hyphenation at line-end (e.g. com-\npare -> compare\n)
  • rationalizing the literary source references (e.g. N. 1. 2. 3. 4. -> N. 1.2 3.4)
  • numerous other little differences.
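AB's actual procedure for resolving line-end hyphens is not described in this thread, so the following is only a hypothetical sketch of one way to do it while keeping the cdsl lines as far as possible: the first word of the next line is moved up to complete the split word. The function name join_hyphenated is illustrative. Because a genuine hyphen (e.g. in a compound) looks the same at line end, the sketch also returns every joined word for manual review.

```python
import re

# A hyphen at the very end of a line, directly after a word character.
HYPHEN_AT_END = re.compile(r'(?<=\w)-$')

def join_hyphenated(lines):
    """Rejoin words split by a hyphen at line end, keeping the number of lines.

    The first word of the following line is moved up to complete the word,
    leaving the rest of that line in place. The joined words are returned
    as well, because a genuine hyphen looks the same at line end and has
    to be checked by hand.
    """
    lines = [line.rstrip('\r\n') for line in lines]
    joined = []
    for i in range(len(lines) - 1):
        if not HYPHEN_AT_END.search(lines[i]):
            continue
        head, _, rest = lines[i + 1].lstrip().partition(' ')
        if not head:
            continue
        lines[i] = lines[i][:-1] + head  # drop the hyphen, append the fragment
        lines[i + 1] = rest
        joined.append(lines[i].split()[-1])
    return lines, joined
```

With the input lines "to com-", "pare this", "well-", "known", "end", the sketch yields "to compare", "this", "wellknown", "", "end" and reports ['compare', 'wellknown']; the second join shows why the review list is needed.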

When @Andhrabharati reviews this work, I think he will agree with most of the changes to his version, and we can close this particular issue.

A reasonable next step for enhancing BOP would be the addition of various 'abbreviation' markup.

@Andhrabharati

Andhrabharati commented May 8, 2024

Several changes to the BOP-Main-L2 file were deemed necessary. The final version is temp_bop_1a_ab.zip.

This revised file is "Not Found", @funderburkjim !

It appears that you put good effort into understanding my earlier file; but, as I had already mentioned to Dhaval in another issue, I did not do much (needed) work on BOP in those days other than basic "filling work", and some of those filled Greek and Slavonic strings were corrected after Anna's proofing.

So, we can consider your revised cdsl file as the final work for now, as I see no point in spending time comparing the revisions to my file (as mentioned by you). [But you might still push the file to GitHub, so that I can pull/download it.]

BTW, I see a couple of questions raised by you in the readme file; you would probably like to see my responses to those before closing this issue.

Let me answer those later in the day.

@funderburkjim
Contributor

I re-uploaded temp_bop_1a_ab.zip; see the comment above for the link. No idea how the original upload got lost.

@Andhrabharati

I had a very unexpected issue yesterday morning: my computer keyboard got "corrupted" for unknown reasons and was sending all kinds of erratic characters, or no characters at all. I tried the on-screen keyboard offered by Windows, but it was too cumbersome, so I did not proceed further; I ordered a new keyboard (and took a good rest for the whole day). The new keyboard has just been delivered, and I immediately started on the "pending work".

I tried to generate a Devanagari file from the posted slp1 file (using the bop_transcode script in the sanskrit-lexicon\BOP\issues\issue6\cdsl folder) to compare with my original file; but, surprisingly, some 2000+ slp1 strings remained unconverted to DNG, so I stopped the exercise and limited my response to the readme file contents.
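One way to list the strings that a transcoding run left unconverted is to scan the output for Sanskrit spans that still contain ASCII letters. The sketch below is hypothetical: it assumes Sanskrit text is delimited as {#...#}, the usual CDSL convention, and it does not use or describe the bop_transcode script itself. Markup nested inside a span could produce false positives.

```python
import re
from collections import Counter

# Assumption: Sanskrit text is marked up as {#...#}; DOTALL lets a span
# continue across a line break.
SANSKRIT_SPAN = re.compile(r'\{#(.*?)#\}', re.DOTALL)
ASCII_LETTER = re.compile(r'[A-Za-z]')

def unconverted_spans(text):
    """Count {#...#} spans that still contain ASCII letters after transcoding."""
    return Counter(span for span in SANSKRIT_SPAN.findall(text)
                   if ASCII_LETTER.search(span))
```

Calling most_common() on the returned Counter lists the leftover slp1 strings by frequency, which may help locate the gap in the transcoder.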

Here are my responses:

;; readme line 350
Jim: '. infr.' -> '.infr.' What is the abbreviation here?

AB: 'infr.' stands for infra
-------------------------------------
;; readme line 498
Jim: Question: re Slavonic:
1. What is the modern equivalent? There is no 'slav...' in Google Translate
2. Why did AB seem to 'lower-case' the entries?
Because of limited font support for some upper-case?

AB: 1. I presume it is mostly Bulgarian (which derives from Old Church Slavonic). [But I could be wrong!]
2. Please see my earlier response in another post (#5 (comment)).
[Of course, I had seen that some of those characters do not have upper-case forms defined in Unicode; but I had tried to 'imitate' the BOP print exactly.]
-------------------------------------
;; readme line 679
JiM: ć (preformed) vs. ć (combining) Both forms exist in BOP
Should they all be of one form? (Jim prefers pre-formed.)

AB: Yes, we can go with the pre-formed letter.
-------------------------------------
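Converting the combining form of ć (c + U+0301) to the pre-formed letter (U+0107) is what Unicode normalization form NFC does. A minimal sketch with Python's standard unicodedata module follows. Note that NFC composes every composable sequence in the file (Greek and Slavonic included), not only ć, so the resulting diff should be reviewed; letters with no pre-formed code point keep their combining mark, which count_combining reports. The helper names are illustrative.

```python
import unicodedata

def precompose(text):
    """Replace base letter + combining mark sequences by pre-formed letters (NFC)."""
    return unicodedata.normalize('NFC', text)

def count_combining(text):
    """Count combining marks of any kind still present in the text."""
    return sum(1 for ch in text if unicodedata.combining(ch))
```

For example, precompose('c\u0301') returns '\u0107', and count_combining on the result is 0.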
;; readme line 769
Jim: BEGIN changes to Greek text in AB version.
35 entries with changes. Why so many?

AB: I had expected corrections to my initial "raw" work, hence I asked for proofing;
we then got the filled-up strings "finalised" after some iterations between Anna and myself.
-------------------------------------

==============================================
And there are some cases where I differ from what Jim mentioned:

;; readme line 695
Jim: <L>5805<pc>244-a<k1>Bavat
old: {%reverentiae causâ ponitur pro pronom.%}
new: {%reverentiae causâ ponitur pro pronom%}.

AB: 'pronom.' is an abbr. and as such this change is to be reverted.
AB: However, this change is not found in the final file!
-------------------------------------
;; readme line 925
Jim: old: .» occurs 9 times (9 changes)
new: ». occurs 350 times

AB: <L>904 should have "AM.»" only, and not "AM»." as it stands for an abbr.
<L>1171 should have "lat.»" only, and not "lat»." as it stands for an abbr.
-------------------------------------
;; readme line 1045
Jim: 4369<L>1532<pc>053-a<k1>uSIra SAK. 43.8. print change from 'SAK. 43..'

AB: This is not a print change, but just a typo correction.
-------------------------------------

If Jim makes the corrections mentioned above, this issue can be closed.
[No need to generate a revised file for AB version.]

funderburkjim added a commit to sanskrit-lexicon/csl-orig that referenced this issue May 9, 2024
funderburkjim added a commit that referenced this issue May 9, 2024
@funderburkjim
Contributor

temp_bop_4c.zip takes into account the previous comment.
change_bop_4b_4c.txt contains the (few) changes.

I went ahead and reconstructed a revised form of AB's file: BOP_main_L2_rev.txt.

A 'diff -w' with BOP_main_L2.txt shows approx. 119 changes.
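For readers without a Unix diff at hand, a rough Python equivalent of counting 'diff -w' changes is sketched below: all whitespace is removed from each line (which is what -w ignores) and the non-equal runs reported by the standard difflib module are counted. difflib's matching algorithm differs from GNU diff's, so the count may not match exactly the figure of approx. 119 quoted above; the function names are illustrative.

```python
import difflib

def strip_ws(line):
    """Drop all whitespace, roughly what 'diff -w' ignores."""
    return ''.join(line.split())

def count_changes_ignoring_ws(lines_a, lines_b):
    """Count contiguous changed runs between two files, ignoring whitespace."""
    a = [strip_ws(x) for x in lines_a]
    b = [strip_ws(x) for x in lines_b]
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return sum(1 for tag, *_ in matcher.get_opcodes() if tag != 'equal')
```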

Another view of the changes to BOP_main_L2 is cmp_bop_1_ab_1a_ab.txt.

Time to close this issue.


3 participants