BOP study-2 #40

Closed
Andhrabharati opened this issue Apr 28, 2022 · 29 comments

Comments

@Andhrabharati

Andhrabharati commented Apr 28, 2022

As in my other works so far, this CDSL BOP.txt file is also split into parts:
BOP_front.txt
BOP_abbr.txt
BOP_addenda.txt

The main text portion (BOP_Main.txt) is not done fully yet.

@Andhrabharati
Author

BTW, when I asked @gasyoun, he said he would help @funderburkjim ensure that the Avestan strings are rendered properly using a suitable font.
So I am leaving that piece of work to him.

There is one Avestan string in the preface (front.txt), and 42 strings are in the main.txt. (just for info.)

@Andhrabharati changed the title from "BOP study" to "BOP study-2" on Apr 28, 2022
@Andhrabharati
Author

Andhrabharati commented Apr 30, 2022

Here is the first installment of BOP_main study, @funderburkjim --

BOP metaline pc corrections.pdf
BOP metaline pc corrections.zip

@Andhrabharati
Author

Here is BOP_main text with the tagged greek portions handled--
BOP_main.txt

Next, I will look out for other places where Greek text is present in the print.
Also some Lith. text has errors in diacritics that need corrections.

Will post the revised file once I finish these two tasks.

@funderburkjim

The BOP_main.txt file looks usable for Greek text.

Will continue discussion in the issue above.

@Andhrabharati
Author

I can only suggest that you use other language/script strings as well, from this!

@Andhrabharati
Author

Here is the presently 'closed' version of BOP text from my side,--
BOP_main-L2.txt

As this has a good number of corrections and updates, perhaps @funderburkjim might consider re-running his programs and updating the files that were generated yesterday.

Even otherwise, he needs to consider taking the addenda file posted above, to have the complete data.

[I would have listed the salient points/corrections in the whole file (in both my versions) with respect to the Cologne text, but I don't think it is worth doing.]

@funderburkjim

I've sent the work to Jahr for proof-reading based on the first BOP_main.txt file. (ref: sanskrit-lexicon/BOP#1).
When he returns this, I plan to take into account BOP_main-L2.txt.

@Andhrabharati
Author

Andhrabharati commented Jan 16, 2024

@funderburkjim

As there seemed to be no response from this Jahr (for a long enough period), we got the proofing of the BOP Greek words done by Anna in April 2023, and you integrated it into the CDSL text.

Now, you may help @drdhaval2785 by telling him how you 'worked' with my earlier BOP file.
He is planning to try processing my later BOP file to adapt it into the "CDSL system".

@drdhaval2785
Contributor

I think I have figured it out (mostly). In case I need any assistance, I will let you know.

@Andhrabharati
Author

Good to hear this, Dhaval!

[I just thought that if Jim were involved (just in case something could be learned from him), your effort might bear fruit faster; hence I messaged him.]

@Andhrabharati
Author

BTW, you might've noticed that this issue has the other 'parts' of BOP text at the top.

@Andhrabharati
Author

And once you finish adapting my BOP_main file, 'integrating' the BOP_addenda data into it could be considered, along the same lines as was done in GRA recently (by Jim); this would establish the process (to be implemented in other CDSL works as well).

@funderburkjim

From the above comments, @drdhaval2785 is taking up AB's revision of BOP.
Thus, no action is needed from me at this time.

@Andhrabharati
Author

Andhrabharati commented Feb 3, 2024

> Thus, no action is needed from me at this time.

@funderburkjim
I guess Dhaval is still waiting for responses from you and Marcis (the 'active' team members) before proceeding further; he seems to have been stopped at another issue for over two weeks now (it does not need so much time to get through my files).

@drdhaval2785
Contributor

It is not about going through the file, but checking for invertibility computationally which takes more time.

@gasyoun
Member

gasyoun commented Feb 3, 2024

@Andhrabharati "There is one Avestan string in the preface (front.txt), and 42 strings are in the main.txt. (just for info.)" -- can you copy-paste them here, please?

@Andhrabharati
Author

Look for them yourself; they are already there in the files posted.

@funderburkjim

> It is not about going through the file, but checking for invertibility computationally which takes more time.

@Andhrabharati When we (Dhaval or I) integrate one of your dictionary revisions (such as BOP in this case), we need to understand what you did. Normally, we have to discover how your version differs from the cdsl version that you started with. We also need to identify areas where the construction of cdsl xml (make_xml.py) and html display forms (basicadjust.php) needs to be revised for consistency with your version (e.g. when you add new markup tags, like 'per').
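
A rough way to start on both tasks is sketched below. It assumes each version has already been split into comparable units (for example, one string per entry). The function names `changed_blocks` and `new_tags` are illustrative and are not part of the CDSL tooling; the tag pattern is a simplification that only looks at opening tags.

```python
import difflib
import re

def changed_blocks(old_units, new_units):
    """Return (tag, old_slice, new_slice) for every non-equal region.

    tag is 'replace', 'delete' or 'insert', as reported by SequenceMatcher.
    """
    sm = difflib.SequenceMatcher(a=old_units, b=new_units, autojunk=False)
    return [
        (tag, old_units[i1:i2], new_units[j1:j2])
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag != "equal"
    ]

def new_tags(old_text, new_text):
    """Return the set of markup tag names present in new_text but not in old_text."""
    pattern = re.compile(r"<([A-Za-z]\w*)")  # opening tags only; closing tags start with '</'
    return set(pattern.findall(new_text)) - set(pattern.findall(old_text))

# Made-up sample data, only to show the shape of the results.
print(changed_blocks(["a", "b", "c"], ["a", "B", "c", "d"]))
print(new_tags('<lang n="x">a</lang>', '<per>b</per><lang n="x">a</lang>'))
```

A tag reported by `new_tags` (such as 'per') would be a candidate for review in the xml construction and display code.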

I fondly recall one case (with your revisions to PW), where you provided a summary of your changes which was very helpful to me at the time (see this comment from AB).

If you provide a similar guide for your revision of BOP, this might be helpful for Dhaval's work.

Hope you and Dhaval find this comment constructive.

@funderburkjim

@gasyoun Here are the Avestan strings marked in main.txt

<lang n="Avestan"> 𐬀𐬭𐬆𐬥𐬌</lang>
<lang n="Avestan"> 𐬥𐬌𐬱</lang>
<lang n="Avestan"> 𐬀𐬎𐬎𐬀</lang>
<lang n="Avestan"> 𐬀𐬯𐬞𐬀</lang>
<lang n="Avestan"> 𐬀𐬈𐬯𐬨𐬀</lang>
<lang n="Avestan"> 𐬌𐬜𐬀</lang>
<lang n="Avestan"> 𐬀𐬈𐬎𐬎𐬀</lang>
<lang n="Avestan"> 𐬐𐬀𐬌𐬥𐬉</lang>
<lang n="Avestan"> 𐬨𐬁𐬗𐬌𐬱</lang>
<lang n="Avestan"> 𐬥𐬀𐬈𐬗𐬌𐬱</lang>
<lang n="Avestan"> 𐬐𐬀𐬝</lang>
<lang n="Avestan"> 𐬒𐬱𐬀𐬵𐬌𐬌𐬀</lang>
<lang n="Avestan"> 𐬰𐬆𐬨</lang>
<lang n="Avestan"> 𐬔𐬀𐬌𐬭𐬌</lang>
<lang n="Avestan"> 𐬔𐬀𐬭𐬌</lang>
<lang n="Avestan"> 𐬔𐬀𐬭𐬋𐬌𐬱</lang>
<lang n="Avestan"> 𐬔𐬀𐬭𐬋𐬌𐬝</lang>
<lang n="Avestan"> 𐬔𐬆𐬭𐬆𐬞</lang>
<lang n="Avestan"> 𐬔𐬇𐬎𐬭𐬎𐬎</lang>
<lang n="Avestan"> 𐬗𐬀𐬚𐬭𐬎𐬱</lang>
<lang n="Avestan"> 𐬵𐬌𐬰𐬎𐬎𐬀</lang>
<lang n="Avestan"> 𐬰𐬀𐬊𐬴𐬀</lang>
<lang n="Avestan"> 𐬯𐬙𐬁𐬭𐬆</lang>
<lang n="Avestan"> 𐬯𐬙𐬁𐬭</lang>
<lang n="Avestan"> 𐬀𐬯𐬞𐬀</lang>
<lang n="Avestan"> 𐬞𐬀𐬋𐬌𐬭𐬌𐬌𐬀</lang>
<lang n="Avestan"> 𐬠𐬁𐬰𐬎</lang>
<lang n="Avestan"> 𐬠𐬎𐬜</lang>
<lang n="Avestan"> 𐬠𐬏𐬌𐬜𐬌𐬌𐬉</lang>
<lang n="Avestan"> 𐬠𐬏𐬌𐬛𐬌𐬌𐬋𐬌𐬨𐬀𐬌𐬜𐬉</lang>
<lang n="Avestan"> 𐬨𐬄𐬚𐬭𐬀</lang>
<lang n="Avestan"> 𐬁𐬌𐬌𐬱𐬯𐬉</lang>
<lang n="Avestan"> 𐬎𐬎𐬒𐬱</lang>
<lang n="Avestan"> 𐬠𐬀𐬯𐬙𐬀</lang>
<lang n="Avestan"> 𐬵𐬎𐬴𐬐𐬀</lang>
<lang n="Avestan"> 𐬯𐬞𐬀</lang>
<lang n="Avestan"> 𐬯𐬞𐬁𐬥𐬆𐬨</lang>
<lang n="Avestan"> 𐬒𐬱𐬎𐬎𐬀𐬱</lang>
<lang n="Avestan"> 𐬵𐬀𐬞𐬙𐬀𐬚𐬀</lang>
<lang n="Avestan"> 𐬵𐬉</lang>
<lang n="Avestan"> 𐬵𐬋𐬌</lang>
<lang n="Avestan"> 𐬵𐬎𐬎𐬀𐬭𐬆</lang>

and here is the one instance in front.txt

<lang n="Avestan">𐬯𐬙𐬏𐬌𐬜𐬌</lang>
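
A list like the one above can be regenerated with a short script. This is a minimal sketch that assumes the tag is always written exactly as `<lang n="Avestan">` with a matching `</lang>`; the function name `avestan_strings` is illustrative, and the sample text is made up (Avestan letters are written as escapes so the example does not depend on the editor's font).

```python
import re

# Tag shape taken from the listing above; anything written differently is not matched.
AVESTAN_TAG = re.compile(r'<lang n="Avestan">(.*?)</lang>', re.DOTALL)

def avestan_strings(text):
    """Return the content of every Avestan-tagged span, with surrounding whitespace stripped."""
    return [m.strip() for m in AVESTAN_TAG.findall(text)]

# Hypothetical sample: two tagged spans inside ordinary text.
sample = (
    'foo <lang n="Avestan"> \U00010B00\U00010B2F</lang> bar '
    '<lang n="Avestan">\U00010B35\U00010B09</lang>'
)
found = avestan_strings(sample)
print(len(found), found)
```

Run on the contents of main.txt and front.txt, `len(found)` should give the counts quoted earlier in the thread (42 and 1), provided every Avestan string carries this tag.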

@funderburkjim

funderburkjim commented Feb 6, 2024

Regarding the line-break situation in bop.txt and BOP_main.txt

The csl-orig/v02/bop/bop.txt has line breaks as in printed text.

BOP_main.txt seems to have 'removed' all line-break info. In particular, it does not have the 🞄 as discussed here.

It might have been preferable to use 🞄, but I don't think this omission is material (line-break preservation not viewed as important now).

Incidentally, AB appears to have 'resolved' the end of line '-' cases (hyphenated word cases). This is a good improvement. @Andhrabharati what is your procedure for resolving these?

This comment is duplicated at sanskrit-lexicon/BOP#6
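
For reference, the simplest mechanical approach to end-of-line hyphens is sketched below. This is not AB's procedure (which is not described in this thread); it is a naive join, and it would wrongly merge genuine hyphenated compounds that happen to break at a line end, so its output would still need review. The function name `join_hyphenated` is illustrative.

```python
def join_hyphenated(lines):
    """Join each line ending in '-' with the following line, dropping the hyphen.

    Naive: it cannot tell a line-break hyphen from a real compound hyphen.
    """
    out = []
    carry = ""  # text held back from a line that ended with '-'
    for line in lines:
        if carry:
            line = carry + line.lstrip()
        if line.rstrip().endswith("-"):
            carry = line.rstrip()[:-1]
        else:
            out.append(line)
            carry = ""
    if carry:
        # The last line ended with '-': nothing to join it to, so restore it.
        out.append(carry + "-")
    return out

print(join_hyphenated(["the deri-", "vation of", "words"]))
```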

@Andhrabharati
Author

Andhrabharati commented Feb 7, 2024

@funderburkjim

I presume that @gasyoun asked for the Avestan strings because he had committed to me that he would 'help' you render them properly in CDSL displays [when I mentioned to him that these Avestan characters won't 'normally' be present in any general/common font], and not for any other reason.

image

---------------------------------------
Here are the Avestan words (from BOP main portion) with the font I had made myself, for 'seeing' them while I was at BOP.

image
image

You may see the very first two strings at the L-334 entry, being displayed as BOX characters
image

@Andhrabharati
Author

> @Andhrabharati what is your procedure for resolving these?

@funderburkjim

I had already mentioned how I resolve hyphenation(s) at the line-ending(s) sometime before (somewhere!).

@funderburkjim

https://fonts.google.com/noto/specimen/Noto+Sans+Avestan

@Andhrabharati and @gasyoun

Would this font be acceptable?

@Andhrabharati
Author

Mostly, but not fully!!

Just look at the first word in the list, for example.

@funderburkjim

I think the problem with the first word is due to an error in input: the code point f88a (U+F88A) is not part of the Avestan Unicode block.

image

So that first word does not disqualify the google font.
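
Input errors of this kind can be found mechanically by flagging any character outside the Unicode Avestan block (U+10B00 to U+10B3F). The sketch below is a minimal illustration; the function name `stray_code_points` is made up, and the sample word is hypothetical (the actual position of f88a in the original string is not stated in this thread).

```python
AVESTAN_BLOCK = range(0x10B00, 0x10B40)  # Unicode Avestan block, U+10B00..U+10B3F

def stray_code_points(s):
    """Return (index, 'U+XXXX') for each non-space character outside the Avestan block."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(s)
        if not ch.isspace() and ord(ch) not in AVESTAN_BLOCK
    ]

# Hypothetical input: one private-use character followed by two Avestan letters.
word = "\uf88a\U00010B00\U00010B2D"
print(stray_code_points(word))  # [(0, 'U+F88A')]
```

An empty result only means that every character lies in the block; it does not confirm that the spelling matches the print.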

@Andhrabharati
Author

Andhrabharati commented Feb 8, 2024

My error, in typing (rather, filling the Avestan strings)!

Pl. replace the string as <lang n="Avestan">𐬥𐬌𐬱𐬙𐬀𐬭𐬆</lang>

@Andhrabharati
Author

Though the look of Noto Sans Avestan is not so pleasing (to the eyes), it sure can be an option to render these strings at CDSL.

@funderburkjim

Here is a comparison, for that first word, of:

  • BOP Scan
  • replacement rendered with Noto Sans Avestan (using the 'type-tester' provided by Google at the link above)
  • Unicode code points (as displayed in Emacs)
image

Incidental question: how does one enter a Unicode code point such as U+10B06 into a text editor?

@Andhrabharati
Author

There are various 'keyboard' utilities that make this possible (for bulk text).

But, for a minimal text insertion such as this at BOP, the easiest way out is to use the copy/paste from the (font) 'charmap' utility in Windows OS.
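
When the text is produced or patched by a script rather than typed, the character can also be built directly from its code point. A minimal Python sketch follows; the helper name `from_code_point` is illustrative, and it accepts labels with or without leading zeros.

```python
def from_code_point(label):
    """Convert a label such as 'U+10B06' or 'u+010B06' to the character it names."""
    if label[:2].upper() != "U+":
        raise ValueError(f"not a code point label: {label!r}")
    return chr(int(label[2:], 16))

ch = from_code_point("u+010B06")
assert ch == "\U00010B06"      # the same character written as an 8-digit escape
assert ch == chr(0x10B06)      # or built from the integer value

# The character can then be placed inside the markup used in the BOP files.
tagged = f'<lang n="Avestan">{ch}</lang>'
print(tagged.encode("utf-8"))
```

Code points above U+FFFF need the 8-digit `\U` escape in Python string literals; the 4-digit `\u` form cannot express them.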

Speaking of fonts, I just wonder if we can update the indologic font being used at CDSL, to include these Avestan glyphs and also the 'hom' numbers (with a dot) and Roman numerals? What say you, @gasyoun ?
