Time for v03? #399

Open
funderburkjim opened this issue Dec 28, 2022 · 4 comments
Labels
enhancement New website features

Comments

@funderburkjim
Contributor

funderburkjim commented Dec 28, 2022

Recent discussions (See) have stalled due to differences of opinion regarding markup of a dictionary (in this case MW).

There has also been difficulty over how to include new features (such as synonyms in the ARMH dictionary, or subheadwords in PWG, BUR).

In these cases, there are

  • reasons to maintain many of the conventions of current digitizations and displays
  • reasons to create new conventions that are incompatible with the current conventions.

Here is a suggestion that might allow work on new conventions to be done with minimal interference or interaction with current conventions and displays.

Add a 'v03' level to the repositories csl-orig, csl-pywork, and csl-websanlexicon

  • csl-orig/v03/mw would contain mw.txt and related files
    • v03 versions of other dictionaries could be done if desired, but not necessarily.
  • csl-pywork/v03 would contain the code to generate the xml version of mw, based on
    the conventions of the csl-orig/v03/mw/mw.txt
  • csl-websanlexicon/v03 would contain the (PHP) code that generates the PHP web application that displays the v03 version of mw.xml

The v03 code might make selective use of the v02 conventions, but it would not be bound by them.

When carefully done, this code would be, I believe, independent of the current v02 digitization and code.

Thus, work could be done in the v03 space independent of the v02 space.
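The proposed layout can be sketched as a small path helper. This is only an illustration of the directory convention described above: the function names `version_paths` and `is_independent` are hypothetical and are not part of any existing csl-* repository.

```python
from pathlib import PurePosixPath


def version_paths(dict_code, version="v03"):
    """Return the proposed per-repository locations for one dictionary.

    Hypothetical helper: it only mirrors the layout suggested in this issue
    (csl-orig/<version>/<dict>/<dict>.txt, csl-pywork/<version>,
    csl-websanlexicon/<version>).
    """
    code = dict_code.lower()
    return {
        "csl-orig": PurePosixPath("csl-orig") / version / code / f"{code}.txt",
        "csl-pywork": PurePosixPath("csl-pywork") / version,
        "csl-websanlexicon": PurePosixPath("csl-websanlexicon") / version,
    }


def is_independent(dict_code, version_a, version_b):
    """True if the two version levels share no path for this dictionary."""
    a = set(version_paths(dict_code, version_a).values())
    b = set(version_paths(dict_code, version_b).values())
    return a.isdisjoint(b)


if __name__ == "__main__":
    for repo, path in version_paths("mw").items():
        print(repo, "->", path)
    print("v02 and v03 independent:", is_independent("mw", "v02", "v03"))
```

Because every path carries the version level as its second component, nothing written under v03 can overwrite a v02 file, which is the separation the suggestion relies on.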

@gasyoun
Member

gasyoun commented Dec 28, 2022

reasons to maintain many of the conventions of current digitizations and displays

Wouldn't it cause too many issues to support all the displays endlessly? Maybe there are too many of them?

reasons to create new conventions that are incompatible with the current conventions.

I wonder which ones exactly, if we are talking about MW.

@gasyoun gasyoun added the enhancement New website features label Dec 28, 2022
@funderburkjim
Contributor Author

displays endlessly?

Ultimately, no. But the MW revision under consideration is not simply a matter of correcting spelling errors, punctuation omissions, etc. It provides the opportunity to challenge many of the representational choices made previously. Some choices that I (or Peter and Malcolm at earlier stages) have made that I now don't like very much are:

  • Introduction of separate entries for 'A', 'B', etc. (H1A, H2B, etc.). This creates
    problems of data duplication (e.g. in k1 and k2). The sections in such A, B entries
    might be better combined into the parent entry, and distinguished by some kind of markup, such as some kind of div.
    • Also, there is data duplication between 'k2' of the metaline and the material before the broken bar.
  • Handling of 'alternate headwords'. Again, there is data duplication in mw.txt from
    having two (or more) nearly identical entries just because two or more spellings are provided for some root (for example). And the <info or= convention is awkward.
  • Also, the handling of so-called 'parenthetical headwords' (phw tag) is awkward.
    They are in some ways similar to alternate headwords.
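To make the first point concrete, here is a minimal sketch of folding 'A', 'B' section entries into their parent entry. Everything in it is an assumption made for illustration: the `Entry` class, the `level` strings ("1", "1A", ...) and the `<div n="A">` markup are hypothetical and do not reproduce the actual mw.txt format or any agreed v03 convention.

```python
import re
from dataclasses import dataclass


@dataclass
class Entry:
    key: str    # headword spelling (plays the role of k1); hypothetical field
    level: str  # "1" for H1, "1A" for H1A, etc.; hypothetical encoding
    body: str   # entry text


def merge_sections(entries):
    """Fold lettered section entries (1A, 1B, ...) into the preceding parent.

    A section is merged only when the previous kept entry has the same key
    and the matching numeric level; otherwise it is kept unchanged.
    The input list and its Entry objects are not modified.
    """
    merged = []
    for entry in entries:
        match = re.fullmatch(r"(\d+)([A-Z])", entry.level)
        parent = merged[-1] if merged else None
        if (match and parent is not None
                and parent.key == entry.key
                and parent.level == match.group(1)):
            # Hypothetical markup: the section letter becomes a div attribute.
            parent.body += f'<div n="{match.group(2)}">{entry.body}</div>'
        else:
            merged.append(Entry(entry.key, entry.level, entry.body))
    return merged


if __name__ == "__main__":
    sample = [
        Entry("a", "1", "main text"),
        Entry("a", "1A", "section A"),
        Entry("a", "1B", "section B"),
        Entry("b", "2", "other entry"),
    ]
    for e in merge_sections(sample):
        print(e.key, e.level, e.body)
```

In this sketch the headword is stored once on the parent, so the k1/k2 duplication across the A and B entries disappears; whether a div is the right markup is exactly the kind of question a separate development space would let one try out.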

These are some of the areas where I find the implementation choices I made now seem problematic to me. And they, along with other kinds of problems, are being encountered in the current fresh look at MW being made by @Andhrabharati.
So I think his revisions have considerable merit and hope many of the problems I have noticed will also be addressed.

But some of these issues are so fundamental that the development of a revision needs to take place in an environment separate from the environment used by the current digitization and displays. At some point, maybe a year from now, the revised version (maybe it will be developed in the 03 environment, or maybe in some other unrelated environment) will be brought to such a level of perfection that it will be considered a better representation than the current form.
At that time, the 02 representation can be retired. But during the development phase, the current 02 form will be what is visible from the Cologne home page,
and we will have to be sure user corrections to MW are communicated to the 03 development team (Dhaval and Nagabhushana).

@funderburkjim
Contributor Author

Another possibility for an independent development environment might be
a repository mw_dev based on the local installation of mw.

@Andhrabharati

Andhrabharati commented Dec 29, 2022

These are some of the areas where I find the implementation choices I made now seem problematic to me. And they, along with other kinds of problems, are being encountered in the current fresh look at MW being made by @Andhrabharati.
So I think his revisions have considerable merit and hope many of the problems I have noticed will also be addressed.

But some of these issues are so fundamental that the development of a revision needs to take place in an environment separate from the environment used by the current digitization and displays.

Now you're trying to get my thoughts, @funderburkjim !! And I heartily appreciate your anticipation (and suggestion) that a complete redo of the coding might be better.

I would just like to bring to your notice that Apte also liked this 'family of entries' concept in MW so much that he too prepared his S2E dictionary along similar lines. [BTW, my current mw revision did not include this change, but it too can be covered.]

And my Apte file, posted earlier for your study, has retained that 'concept', so it can be used as a trial text for this purpose.

There is another radical change that I wanted to bring into the MW text, in line with the book (though it is not covered in my current review/update, as I wanted to keep the changes closely compatible with the current v02 framework, with minimal code changes). There is a SPECIAL CONVENTION in the print book that is a product of MW's own mind, just like BR's Devanagari 'u' superscript for the (Rig-)Vedic udAtta. I mentioned this a few months back in one of my posts, though not explicitly.

Keep sharing your thoughts, before I "open up" again.

3 participants