Fresh Look, starting with <is> tag #95
Work in this issue is done in the pwkissues/issue95 directory of this repository.

start with latest pw.txt
@Andhrabharati please start with the latest csl-orig/v02/pw/pw.txt. A few (19) changes were made during development of the transcode script. You could name this file 'temp_pw_0.txt'.

transcode script
The pw_transcode.py script converts pw from one transcoding to another. First, make the 'pwtranscode' directory the current terminal directory
Note 1: If you convert from slp1 to iast and then (without making changes to the iast version)
Note 2: Conversion is applied to (a) both the k1 and k2 fields of the metaline and (b) the {#X#} elements of the text. |
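As a rough illustration of Note 2, the sketch below applies a conversion function only to the k1/k2 fields of a metaline and to the {#X#} spans of a text line. It is not the pw_transcode.py code: the function name `convert_line`, the assumed metaline layout (`<L>..<pc>..<k1>..<k2>..`) and the tiny `toy_slp1_to_iast` table are all illustrative assumptions; a real run would use the full transcoder rule files.

```python
import re

# Toy table covering only a few SLP1 letters; an assumption for illustration,
# not the rule files used by the real transcoder.
_TOY = str.maketrans({'A': 'ā', 'I': 'ī', 'U': 'ū', 'R': 'ṇ', 'q': 'ḍ', 'S': 'ś', 'z': 'ṣ'})

def toy_slp1_to_iast(text):
    """Convert the handful of SLP1 letters in _TOY; leave everything else alone."""
    return text.translate(_TOY)

def convert_line(line, convert):
    """Apply `convert` to k1/k2 of a metaline, or to {#...#} spans of a text line."""
    if line.startswith('<L>'):
        # Assumed metaline layout: <L>..<pc>..<k1>..<k2>..
        return re.sub(r'(<k[12]>)([^<]*)',
                      lambda m: m.group(1) + convert(m.group(2)), line)
    # Text outside {#...#} is left untouched.
    return re.sub(r'\{#(.*?)#\}',
                  lambda m: '{#' + convert(m.group(1)) + '#}', line)

if __name__ == '__main__':
    print(convert_line('<L>1<pc>1001-1<k1>kARqa<k2>kARqa', toy_slp1_to_iast))
    print(convert_line('Mit {#kARqa#} A', toy_slp1_to_iast))
```

Passing the converter in as a parameter keeps the field-selection logic separate from the transcoding rules, so the same wrapper could serve either direction of conversion.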
I had taken the recent pw.txt from csl-orig, for my present working. Will incorporate the 19 changes done by you now in my file. And, a big "Thank you" for the conversion scripts. |
Noted that 8 of 19 were already changed during my working. |
retain line-numbering
As with the work on Gra, request you maintain the line numbering in revisions of pw.txt. Then at the end of the pw revisions, we can remove unneeded blank lines. |
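One way to check that a revision keeps the line numbering is sketched below. The function name `changed_lines` is hypothetical; the sketch only assumes that both versions are available as lists of lines.

```python
def changed_lines(old_lines, new_lines):
    """Return 1-based numbers of lines that differ between two versions.

    Raises ValueError when the versions differ in length, since line
    numbers would then no longer correspond.
    """
    if len(old_lines) != len(new_lines):
        raise ValueError('line count differs: %d vs %d'
                         % (len(old_lines), len(new_lines)))
    return [n for n, (old, new) in enumerate(zip(old_lines, new_lines), start=1)
            if old != new]

if __name__ == '__main__':
    print(changed_lines(['a', 'b', 'c'], ['a', 'x', 'c']))
```

With numbering preserved, a comment such as "see line 24204" refers to the same place in every revision.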
@Andhrabharati So I can follow your comments (such as at sanskrit-lexicon/CORRECTIONS#419 (comment)), |
Quite a bit of work still remains to clean up the data before I can give out my prelim. file. I had only looked at the portions marked as italic; there are quite many places not marked so in the text (but which are in italics in print) that need to be identified. My present focus is on marking the abbr.s inside italics as well as outside. Pl. wait for a few more days. Meanwhile, you may start looking at/working on BHS, which I had made earlier and in which I recently 'marked' citation numbers, after GRA. |
Also quite many places are not marked with is-tag!! |
Just tried converting the slp1 file at my end, and got
instead of [from my earlier file pw_AB_08.txt]
The remark is about the Vedic svara conversion. Probably the underlying "rule" files (in the transcoder folder) are not the ones that we had 'finalised' earlier for the PW group. Would you pl. check this once? |
pw-style devanagari accents
@Andhrabharati Yes, you are right regarding conversion.
I think this will solve that problem. Note: 1 typo noticed, under (slp1) |
Sure, go ahead. I'll take a look. Also please note that I need a posting of your current pw; so I can respond to the |
Posted my BHS file at the relevant repo. |
Got 9 more abbr. type is-entities, in the non-italic part (while checking the dot-ending words)--
And, noted that some entries listed in the pwis_mw.txt are in fact typos. |
Just showing an example word on this point,
MW entry for
And, the MW entry for
Finally, the pwk print has this as
Thus, we can see that this entry has both a typo error (Kānda) as well as a print error (Kāṇda) in pwk, whereas it should’ve been Kāṇḍa. |
BTW, the above example reminds me of the very first comments I posted on the ls-entity display of PWG (and pwk)-- But it appears that either these posts have skipped Jim's attention, or he didn't see any value in this point. I feel REALLY bad whenever I see Rv, Av, etc. on CDSL PWG/pwk search results, while the MW display renders them 'appropriately' as RV, AV etc. |
The first entry in which I noted this discrepancy in is-words wrt the mw-words is dvipa, which occurred 25 times, either by itself (Dvipa 6 times-- all in error) or as part of another word (dvipa 19 times-- all marked as notmw); whereas it should've been Dvīpa or dvīpa respectively. |
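Counts like these can be reproduced with a small tally over the is-tagged words. The sketch below assumes the markup has the form `<is>X</is>`; the function names `is_words` and `count_containing` are hypothetical.

```python
import re
from collections import Counter

IS_TAG = re.compile(r'<is>(.*?)</is>')  # assumed form of the is-markup

def is_words(lines):
    """Tally every string found inside <is>...</is>."""
    counts = Counter()
    for line in lines:
        counts.update(IS_TAG.findall(line))
    return counts

def count_containing(counts, fragment):
    """Total occurrences of is-words containing `fragment` (case-sensitive)."""
    return sum(n for word, n in counts.items() if fragment in word)

if __name__ == '__main__':
    sample = ['<is>Dvipa</is> und <is>Jambudvipa</is>', 'im <is>Dvipa</is>']
    tally = is_words(sample)
    print(tally['Dvipa'], count_containing(tally, 'dvipa'))
```

Keeping the match case-sensitive separates the standalone capitalised form from the lower-case form embedded in compounds, which is the distinction drawn in the comment above.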
Sorry for having 'violated' this, @funderburkjim ! Rather, I haven't violated but just implemented the style I started in GRA, in this pw as well. I have started with minimal line-number changes (limited to 'embedding' [Pagexxxx] into other lines), for now; but I have more changes in mind, to prepare this pw in a "standard style" to be followed in the other CDSL works as well.
Hope you'd allow me a 'free-hand'(!!) here also, as done at GRA recently. |
These two were the binding-principles reg. the text-lines that I followed in the pw.txt file, and did the following replacements-- and this can be taken as my starting file, [the split lines are marked as |
If you have other thoughts, I shall post only the relevant If you happen to agree (I just hope you would!), then I shall start posting what all I have done so far [having finished the abbr. portion], and my prelim. file. |
pw_cdsl_0
These observations are based on work in the pwkissues/issue95/compare0 directory. Generation of displays (locally) using pw_CDSL_0 encounters no problems. The generated pw.xml validates with pw.dtd. Great! A couple of minor observations:
Seems ok to proceed with further revisions. |
pw_CDSL_0.txt is not the version that I am working with; it is just regenerated from pw.txt to match the lines with my AB file. Here is the screenshot comparing the two files-- And you may see the split in my AB file at "Mit {#kar#}", breaking the prev. line into two lines.
There are no lines starting with
My comment clearly shows the replacement of line starting with
See the first such occurrence at lines 24204-6 in pw.txt
that get merged in pw_CDSL_0.txt (line 23955) as
These are all (almost) the cases of what I mentioned above as "limited to 'embedding' [Pagexxxx] into other lines". |
This file has no "real" changes made, except the line mergers at [Pagexxxx]. |
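The kind of merger described (three lines around a page marker becoming one) might look like the sketch below. This is a guess at the mechanics, not the procedure actually used: the function name `embed_page_markers`, the `[Page...]` pattern and the single-space joiner are assumptions.

```python
import re

PAGE_ONLY = re.compile(r'^\[Page[^\]]*\]$')  # a line holding only a page marker

def embed_page_markers(lines):
    """Join each standalone [Page...] line with the lines before and after it."""
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        marker = line.strip()
        if PAGE_ONLY.match(marker) and out and i + 1 < len(lines):
            # previous line + marker + next line become a single line
            out[-1] = out[-1] + ' ' + marker + ' ' + lines[i + 1]
            i += 2
        else:
            out.append(line)
            i += 1
    return out

if __name__ == '__main__':
    print(embed_page_markers(['abc', '[Page1001-2]', 'def', 'ghi']))
```

Each merger shortens the file by two lines, which is why line numbers in the merged file run lower than in pw.txt.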
Got it. Ready for 'real' changes. |
Here is the prelim. file to go through meanwhile (as you had done with my GRA file earlier, without any notes). This can be used to check and work out the abbr. expansions, if nothing else. I am too tired now, so I will start posting the notes from tomorrow morning, indicating the various changes that went into the file to get the prelim. file at my end (as of now). |
compare metalines ab_0 v. ab_1
See results under 'compare1/readme.txt' at 'compare_hw step 2'. The other 3 (marked 'abi error?' ) should be corrected in temp_pw_ab_1.txt. |
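A metaline comparison of this kind can be sketched as follows. This is not the compare_hw code; it assumes metalines are the lines beginning with `<L>`, and the function names are hypothetical.

```python
def metalines(lines):
    """Return the metalines (assumed: lines starting with '<L>')."""
    return [line for line in lines if line.startswith('<L>')]

def only_in_first(first_lines, second_lines):
    """Metalines present in the first version but absent from the second."""
    second = set(metalines(second_lines))
    return [m for m in metalines(first_lines) if m not in second]

if __name__ == '__main__':
    ab0 = ['<L>1<pc>1001-1<k1>a<k2>a', 'text', '<L>2<pc>1001-1<k1>b<k2>b']
    ab1 = ['<L>1<pc>1001-1<k1>a<k2>a', 'text', '<L>2<pc>1001-1<k1>c<k2>c']
    for m in only_in_first(ab0, ab1):
        print('only ab0:', m)
```

Running the comparison in both directions gives the 'only ab0' and 'only ab1' lists, which can then be reviewed by hand against the print.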
text after
|
ab1 errors? (based on differences between metalines in the ab0 and ab1 versions)
only ab0: pwk (1158-1) has
only ab0: Yes, this letter got here by error.
only ab0: This is to be taken as a print error.
only ab0: Does this pwk (7058-1) snippet answer the point?
So in summary, I need to correct only 2 places out of these 4. |
This character got here by error.
Yes, and you had accepted the Now that the line-breaks around [Pagexxxx] lines are looked at, we can remove this • character throughout. |
Next, I will start posting the changes made and then this (first-part of) Fresh-look issue can be closed, as it is growing longer. [I could not do this yesterday, having been engaged in some pressing chores.] |
The IAST corrections matter could be continued in the parent issue (PW IAST corrections #419), as this |
I have started with the simplest point, as mentioned at Space before punctuation marks (reg. PWG, pwk and pwkvn) #855 , and the counts now stand thus in my version of pwk-- Notes.
|
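For reference, counts of a space before a punctuation mark could be gathered with something like the sketch below. The function name and the default set of marks are assumptions, not the set actually tallied in the comment above.

```python
import re
from collections import Counter

def space_before_punct_counts(lines, marks=',;:.!?'):
    """Count, per punctuation mark, how often it is preceded by a space."""
    pattern = re.compile(' ([' + re.escape(marks) + '])')
    counts = Counter()
    for line in lines:
        counts.update(pattern.findall(line))
    return counts

if __name__ == '__main__':
    print(space_before_punct_counts(['a , b ; c.', 'd , e']))
```

Re-running the same tally after each round of corrections shows whether the counts are actually going down.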
The |
One interesting point noticed is that at some places, the abbr.(s) in the print pages are present in expanded form in the text file, most probably done by @maltenth (or who else could it be?) while applying his markups on the typed text [it is highly doubtful that the typists in India would have done this expansion]. Also seen that at many places the strings marked italic are not so in the print; and at far more places the italic strings of the print are present in normal face in the typed text. There is no way except a full reading wrt the print to "correct" these points completely, I suppose. |
I agree that the print has a comma.
Doesn't that comma need to be added to pw_1 ? Accept your point Barb. Will start a print change file for this and perhaps other future print changes that arise. |
Sorry, I was looking at my current file that has undergone more changes; it has the comma here. |
No, this is a case where you did something in GRA that I was not aware of. If I had noticed it, I would have complained. The
(1) would NOT recognize Although I have thought of (1) as the default, I have (AFAIK) used (2) in all existing code, I still have a fondness for
Conclusion: I DO accept |
I think we can get rid of that And as you had mentioned elsewhere, this My thinking is that this [Pagexxxx] need/should not be on a separate line. |
Agree.
Prefer you to start a new issue here in PWK repository when you're ready. |
In such a case, the referred parent issue can be closed. No need to keep it open until a new issue is opened for the I see many issues still remain open in various repos, though their purpose is served. |
Please post the examples you have noticed. We can ask @maltenth if he recalls some reason.
Again, post some examples if they are at hand. I have wondered about the significance of italic/non-italic text in PW. Maybe if this distinction were conceptually clear, we could find some way to identify (and correct) many of these mistakes in pw.txt. |
The italics mostly denote the meaning/explanation portions in German language, as I could see. |
Wonder if the preface gives a clue, if reread. |
https://ru.wikipedia.org/wiki/%D0%91%D0%B0%D0%B3%D0%B0%D1%82%D1%83%D1%80 Mongolian baγatur (ᠪᠠᠭᠠᠲᠦᠷ ) ᠪᠠᠭᠠᠲᠦᠷ |
I could see the letter y in between and the letter t is not matching the character in the PWG print; so the word appears to be the (Mongolian) baga(?)yur. Can your (Mongolian) friend tell why the PWG has the (Mongolian) lettering upside-down and then left-to-right (or in other words, rotated by 180 degrees)? |
[Jim]
[AB]
@funderburkjim |
I suggest closing this issue, as the is-tags were more or less attended to. |
Work initially related to sanskrit-lexicon/CORRECTIONS#419.