Life after faultfinder #114
Comments
To make sure all are covered, I'll need to compile a list of the 36 links myself. I'm thankful that you not only automated a lot, but actually did a big part of the proofreading itself. And not only that: you documented it, and it takes twice the time to document it well. If the 'unified' framework will take less than a month, it sounds like a good idea (even hwnorm1 works really well). The other enhancements sound good, but seem rather complicated. I do think the web display is good enough for now, and that the real question is the cleanness of the digital dictionaries. The 'alternate headwords' item should be higher on the list. Let's forget the API for now, it's a swamp 🐇 To 'internationalize' the displays means tuning the framework again: not a lot of lines, but I guess the whole strategy would have to change, since all the language files would have to be kept separately in that case. Without you, the steam in the headword spelling factory will go down, but I hope that over the next four months we will still get some more help from India. |
I second the suggestion by Jim that we should take a break from headword
corrections and concentrate on displays and API. In case someone wants to
do proofreading, the ovsO issue has the raw material available. That can be
seen and corrected. Installing from the corrected file would not take long.
This can live in GitHub issues till Jim finds time to install corrections.
|
I guess the ovsO might be updated, now that the sources have been cleaned, what do you think? |
I will do so and update the output.
|
Thanks Dhaval. |
Updated data is available on https://github.com/drdhaval2785/SanskritSpellCheck/tree/master/o_vs_O/output1 |
I think that the faultfinder analysis of headword spellings (based on MW headword spellings) has now been applied to all the dictionaries. (Please mention if some dictionary has been missed.)
Assuming this is so, we are in a position to consider what's next.
There are several ideas for approaching headword spelling error detection which have been suggested in recent months, and which deserve attention.
However, my inclination is to spend some time developing a 'unified' framework for dictionary displays.
The current system treats each dictionary as its own little world, separate from the worlds of all the other dictionaries. While this separate worlds organization has served us well, it is now providing obstacles to further development.
For one example, a recent development version of the MW basic display permits citation requests to be made in Devanagari or IAST. This seems to work well, and should be applied to ALL the dictionaries. But with 36 separate dictionary display programs to modify, this change would be quite awkward. This same set of options (and even perhaps some other options, such as Hyderabad's WX transliteration) should be available for all the displays -- in particular, I think the quite useful 'List display' should scrap the 'Preferences' method (with its complicated little keyboards) in favor of the same 'input' selection as the other dictionaries use.
Other enhancements, such as having the input citation generate server suggestions, are also applicable to most dictionaries. We can also take into account some of the spelling differences (e.g. rxx variants) among dictionaries that are so confusing (if I'm looking up 'agni' in dictionary 'X', do I spell it 'agni' or 'agniH'; do I spell gaNgA or gaMgA, karman or karmman, paricCeda or pariCeda, etc.).
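One way to picture handling these spelling differences is a normalized lookup key shared by all dictionaries. The following is only a minimal sketch under stated assumptions: the function name `normalize_headword` and the four rules are hypothetical, they assume SLP1 spellings, and they are not the rules actually used by hwnorm1 or any existing display.

```python
import re

# Hypothetical, simplified rules over SLP1 spellings (an assumption for
# illustration; not the normalization used by hwnorm1).

# A class nasal directly before a stop of its own class is written as anusvara M.
_NASAL_BEFORE_STOP = re.compile(
    r"N(?=[kKgG])|Y(?=[cCjJ])|R(?=[wWqQ])|n(?=[tTdD])|m(?=[pPbB])"
)
# A consonant doubled after r is reduced to a single consonant.
_DOUBLED_AFTER_R = re.compile(r"r([kKgGcCjJwWqQtTdDnpPbBmyvl])\1")


def normalize_headword(slp1: str) -> str:
    """Return a lookup key under which spelling variants coincide."""
    key = slp1
    if key.endswith("H"):            # agniH -> agni (drop final visarga)
        key = key[:-1]
    key = _NASAL_BEFORE_STOP.sub("M", key)   # gaNgA -> gaMgA
    key = _DOUBLED_AFTER_R.sub(r"r\1", key)  # karmman -> karman
    key = key.replace("cC", "C")             # paricCeda -> pariCeda
    return key


if __name__ == "__main__":
    for word in ["agni", "agniH", "gaNgA", "gaMgA",
                 "karman", "karmman", "paricCeda", "pariCeda"]:
        print(word, "->", normalize_headword(word))
```

With a key like this, a citation typed as either spelling of each pair above maps to the same key, so a display could search every dictionary by key rather than by that dictionary's own spelling convention. Real data would need more rules and a check for words that the rules wrongly merge.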
We should be better able to deal with 'alternate headwords' (like the VCP doubles).
With a properly organized framework, multidictionary displays could be developed much more readily; so far there is only the experimental and quite limited hwnorm1.
Also, a carefully designed API would facilitate use by displays such as the one that Peter Scharf and Ralph Bunker have developed at sanskritlibrary.
Also, it might be useful and possible to 'internationalize' the displays (as has thus far only been partially done for the Stchoupak dictionary).
So, I'm going to take a break from dictionary spelling corrections to think about and hopefully implement some of the ideas above. It is probably appropriate to do this work under a new repository, though I don't have a name for it at the moment.
Although I anticipate changing my focus as described, I'll still be glad to help prepare materials with others who want to focus on some of the other headword spelling correction ideas.