With or without linebreaks #419
Comments
I am glad that your very first 'keen' attempt at looking into my file(s) 'prompted' you to think of changing the 'stand' (that has stood for many years now). I know (for sure) that Jim could add a few more Cons to your list, and I have many more (but that is not worth spending my time on). And apart from 'leaving out' the line-breaks, I make several other 'important' structural changes in my files.

Coming to retaining the line-breaks, I think they should be retained in "verse blocks" in VCP and SKD, which often span multiple columns (and even multiple pages). Reading such long unbroken matter would be a bad experience, as the reader's mind is now, more or less, 'tuned' to the "semantic breaks" introduced in printing. But within the 'prose' paragraphs, they can be done away with.

Now coming to the Pros that you had listed: as I understand it, the need to look at the print dictionary arises in order to compare it with the digital text, mostly for correcting the errors reported by users (or otherwise). How is this dealt with in the case of MW, which is the most reported work [I would roughly estimate it at 90-95% of the user feedback], whose digital text does not contain the line-breaks and has also deviated (a bit too) much from the print (except for keeping the page-column info intact)?
Speaking of the MW digital text format, I thought I should 'leak' that my current work 'prompted' me to make some major structural changes in it, some of them moving closer to the print matter. I am sure that this would create some hiccups, if (and when) I post my MW work.
Any thoughts @funderburkjim?
And double the size of each dictionary?
I'm for it. But that would take years for just this one task and stop all the others. Is it worth it now?
exactly
right
On what basis did you arrive at this, Marcis? I have been doing this (removing or, alternatively, marking the line-breaks) in just a few minutes for each of the CDSL dictionaries that I work on!
In AB's [revision to MD](sanskrit-lexicon/csl-orig@2dffafb dictionary), he introduced the
make_xml.py can 'ignore' this character, so it doesn't get in the way of displays. This seems like a good solution. As a general point, I think that preservation of line breaks has served its purpose. Then, when I came to make displays for these later dictionaries, I thought it was best to preserve line breaks in the displays to aid in correction investigation. We are now in the process of making major revisions to these original forms (adding markup, tooltips, links, etc.) to make the dictionary displays more useful. These changes also provide the basis for future NLP-type work with the dictionary corpora (e.g. DAtu extraction). So line-break preservation is no longer as useful as it once was. For some dictionaries (Burnouf and Apte90 come to mind), I used a Current opinion: For cdsl dictionaries where line-breaks are currently preserved, use the special character. But feel free to use multiline forms in the xxx.txt (e.g. at the
@Andhrabharati Do you convert line-breaks (`\n`) to
Also, how do you handle end-of-line hyphens?
No - if you're using xml, use a (specially defined) xml tag and not some ad hoc special-meaning symbols.
Dear all,
This issue has been on my mind for a long time.
In many CDSL dictionaries, we have line breaks as in the printed dictionaries. In many others, we don't.
This issue is devoted to deciding the usefulness or otherwise of line breaks.
Pros
Cons
Should we change from line breaks to sans line breaks?
The question deserves attention, because @Andhrabharati submits his major corrections in the latter format. If we agree in principle to go with that format, we won't have the hassle of analysing diffs and spending a lot of time on them.
What about invertibility?
We can have a json with lnum as key. It will hold "old" and "new" text blobs. So, it will be possible to go back and forth.
If we make a change to the new data, the diff can be computed and carried back to the old text, if desired.
Only changes at line ends cannot be carried back computationally. Those will have to be handled manually.
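To make the idea concrete, here is a minimal Python sketch of such a store. It is only an illustration under stated assumptions: the function names (`join_lines`, `build_store`, `carry_back`), the sample entry, and the rule of joining lines with a single space are all hypothetical, and end-of-line hyphens are deliberately not handled.

```python
import json
import re


def join_lines(old: str) -> str:
    """Collapse print line breaks into single spaces.

    Assumption: a plain space join; end-of-line hyphens are NOT handled here.
    """
    return " ".join(line.strip() for line in old.split("\n") if line.strip())


def build_store(entries: dict) -> dict:
    """Build {lnum: {"old": ..., "new": ...}} from texts that keep line breaks."""
    return {
        str(lnum): {"old": old, "new": join_lines(old)}
        for lnum, old in entries.items()
    }


def carry_back(old: str, edited_new: str) -> str:
    """Carry word-level edits from the joined text back into the line-broken text.

    Works only when the number of words is unchanged; anything else
    (including changes at line ends) is left for manual handling.
    """
    parts = re.split(r"(\s+)", old)  # words at even indices, whitespace at odd
    word_slots = [i for i, p in enumerate(parts) if i % 2 == 0 and p]
    words = edited_new.split()
    if len(word_slots) != len(words):
        raise ValueError("word count changed; handle this entry manually")
    for slot, word in zip(word_slots, words):
        parts[slot] = word
    return "".join(parts)


# Hypothetical sample entry, keyed by lnum.
store = build_store({"101": "agni, m.\nfire; the god\nof fire"})

# A correction made in the "new" (no line-break) form ...
store["101"]["new"] = "agni, m. fire; the deity of fire"
# ... is carried back to the "old" form, keeping the original line breaks.
store["101"]["old"] = carry_back(store["101"]["old"], store["101"]["new"])

print(json.dumps(store, ensure_ascii=False, indent=2))
```

Keeping both blobs under one key means either form can be regenerated or checked at any time; the `ValueError` branch marks exactly the cases that would need manual attention.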
Historical experience
We made a quantum jump when we moved from Anglicised Sanskrit to IAST / metaline. We had the invertibility principle then too.
But in practice, no one has ever shown any interest in carrying changes made to the IAST version back to the AS version. The same may happen here. Much ado about nothing.
View
My view is that we should do away with line breaks.
What do others say?