Life after faultfinder #114

funderburkjim · 2015-05-16T21:20:18Z

I think that the faultfinder analysis of headword spellings (based on MW headword spellings) has now been applied to all the dictionaries. (Please mention if some dictionary has been missed.)

Assuming this is so, we are in a position to consider what's next.

There are several ideas for approaching headword spelling error detection which have been suggested in recent months, and which deserve attention.

However, my inclination is to spend some time developing a 'unified' framework for dictionary displays.

The current system treats each dictionary as its own little world, separate from the worlds of all the other dictionaries. While this separate-worlds organization has served us well, it is now an obstacle to further development.

For one example, a recent development version of the MW basic display permits citation requests to be made in Devanagari or IAST. This seems to work well, and should be applied to ALL the dictionaries. But with 36 separate dictionary display programs to modify, this change would be quite awkward. This same set of options (and even perhaps some other options, such as Hyderabad's WX transliteration) should be available for all the displays -- in particular, I think the quite useful 'List display' should scrap the 'Preferences' method (with its complicated little keyboards) in favor of the same 'input' selection as the other dictionaries use.
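To illustrate the "one input option for all displays" idea, here is a minimal sketch of a single shared conversion routine that every display could call before looking up a citation. It assumes the lookups are keyed on SLP1 (as the spellings in this thread suggest) and handles only IAST input; Devanagari and WX are left out. The names `iast_to_slp1` and `to_slp1` are hypothetical and not part of any existing display code.

```python
import unicodedata

# IAST -> SLP1 table (assumption: lookups are keyed on SLP1).
# Two-letter sequences are tried before single letters by the converter below.
_IAST_TO_SLP1 = {
    "ai": "E", "au": "O",
    "kh": "K", "gh": "G", "ch": "C", "jh": "J",
    "ṭh": "W", "ḍh": "Q", "th": "T", "dh": "D", "ph": "P", "bh": "B",
    "ā": "A", "ī": "I", "ū": "U", "ṛ": "f", "ṝ": "F", "ḷ": "x", "ḹ": "X",
    "ṃ": "M", "ḥ": "H", "ṅ": "N", "ñ": "Y",
    "ṭ": "w", "ḍ": "q", "ṇ": "R", "ś": "S", "ṣ": "z",
}
# Normalize the keys so they match NFC-normalized input.
_IAST_TO_SLP1 = {unicodedata.normalize("NFC", k): v for k, v in _IAST_TO_SLP1.items()}


def iast_to_slp1(text):
    """Convert an IAST citation to SLP1 with a greedy longest-match scan."""
    text = unicodedata.normalize("NFC", text.lower())
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in _IAST_TO_SLP1:
            out.append(_IAST_TO_SLP1[pair])
            i += 2
        else:
            ch = text[i]
            # Letters shared by both schemes (a, i, k, g, ...) pass through unchanged.
            out.append(_IAST_TO_SLP1.get(ch, ch))
            i += 1
    return "".join(out)


def to_slp1(citation, scheme):
    """Single entry point that any dictionary display could share (hypothetical)."""
    if scheme == "slp1":
        return citation
    if scheme == "iast":
        return iast_to_slp1(citation)
    raise ValueError("unsupported input scheme: " + scheme)
```

With one routine like this, adding a new input scheme means adding one branch in one place rather than editing each display program. The greedy scan treats every "ai" and "au" as a diphthong, which is a simplification.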

Other enhancements, such as having the input citation generate server suggestions, are also applicable to most dictionaries. We can also take into account some of the spelling differences (e.g. rxx variants) among dictionaries that are so confusing: if I'm looking up 'agni' in dictionary 'X', do I spell it 'agni' or 'agniH'? Do I spell gaNgA or gaMgA, karman or karmman, paricCeda or pariCeda, etc.?
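One way to absorb these spelling differences is to reduce each headword and each citation to a common lookup key, so that both members of each pair above land on the same key. The following is only a sketch under that assumption: the four rules cover just the examples listed here, the function name `normalize_key` is hypothetical, and the rules are not a description of what hwnorm1 actually does.

```python
import re

# SLP1 stops grouped by class, each with its homorganic nasal.
_NASAL_FOR = {}
for _nasal, _stops in (("N", "kKgG"), ("Y", "cCjJ"), ("R", "wWqQ"),
                       ("n", "tTdD"), ("m", "pPbB")):
    for _stop in _stops:
        _NASAL_FOR[_stop] = _nasal

_CONSONANTS = "kKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzsh"


def normalize_key(slp1):
    """Reduce an SLP1 headword to a lookup key (illustrative rules only)."""
    key = slp1
    # agniH -> agni: drop a final visarga.
    if key.endswith("H"):
        key = key[:-1]
    # gaMgA -> gaNgA: anusvara before a stop becomes the homorganic nasal.
    key = re.sub(
        "M([" + "".join(_NASAL_FOR) + "])",
        lambda m: _NASAL_FOR[m.group(1)] + m.group(1),
        key,
    )
    # karmman -> karman: undo consonant doubling after r.
    key = re.sub("r([" + _CONSONANTS + r"])\1", r"r\1", key)
    # paricCeda -> pariCeda: treat cC and C as the same spelling.
    key = key.replace("cC", "C")
    return key
```

A display would apply `normalize_key` both when building its headword index and when receiving a citation, so the user would not need to know which convention a given dictionary follows.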

We should be better able to deal with 'alternate headwords' (like the VCP doubles).

With a properly organized framework, multidictionary displays could be developed much more readily; so far there is only the experimental and quite limited hwnorm1.
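As a rough sketch of why a shared framework would make multidictionary displays easier: if every dictionary is exposed through the same small interface, a multidictionary lookup is just a loop over a registry. The class names, the registry and the entry texts below are all hypothetical placeholders, not existing code or real dictionary data.

```python
from dataclasses import dataclass, field


@dataclass
class Dictionary:
    """One dictionary behind a common interface (hypothetical)."""
    code: str
    entries: dict = field(default_factory=dict)  # headword key -> entry text

    def lookup(self, key):
        return self.entries.get(key)


class Registry:
    """Holds all dictionaries so a display can query one or many of them."""

    def __init__(self):
        self._dicts = {}

    def register(self, dictionary):
        self._dicts[dictionary.code] = dictionary

    def lookup_all(self, key):
        # Return only the dictionaries that actually have the headword.
        found = {}
        for code, dictionary in self._dicts.items():
            entry = dictionary.lookup(key)
            if entry is not None:
                found[code] = entry
        return found


# Placeholder data, not real entries.
registry = Registry()
registry.register(Dictionary("MW", {"agni": "<MW entry for agni>"}))
registry.register(Dictionary("VCP", {"agni": "<VCP entry for agni>",
                                     "gaNgA": "<VCP entry for gaNgA>"}))
```

Here `registry.lookup_all("agni")` returns the entries from both dictionaries, while `registry.lookup_all("gaNgA")` returns only the VCP one. The same interface is what a careful API could expose to outside displays.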

Also, a carefully designed API will facilitate use by displays such as the one that Peter Scharf and Ralph Bunker have developed at sanskritlibrary.

Also, it might be useful and possible to 'internationalize' the displays (as has thus far only been partially done for the Stchoupak dictionary).

So, I'm going to take a break from dictionary spelling corrections to think about and hopefully implement some of the ideas above. It is probably appropriate to do this work under a new repository, though I don't have a name for it at the moment.

Although I anticipate changing my focus as described, I'll still be glad to help prepare materials with others who want to focus on some of the other headword spelling correction ideas.

gasyoun · 2015-05-16T21:48:20Z

To make sure all are covered, I'll need to compile a list of the 36 links myself. I'm thankful that you not only automated a lot, but actually did a big part of the proofreading yourself. And not only that: you documented it. It takes twice the time to document it well. If the 'unified' framework takes less than a month, it sounds like a good idea (even hwnorm1 works really well). The other enhancements sound good, but seem rather complicated. I do think the web display is good enough for now, and that the real question is the cleanness of the digital dictionaries. The 'alternate headwords' item should be higher on the list. Let's forget the API for now, it's a swamp 🐇 To 'internationalize' the displays means tuning the framework again; not a lot of lines, but I guess the whole strategy would have to change, since all the language files would have to be kept separately. Without you the steam in the headword spelling factory will go down, but I hope that over the next four months we will still get some more help from India.

drdhaval2785 · 2015-05-17T00:30:09Z

I second Jim's suggestion that we take a break from headword corrections and concentrate on displays and the API. In case someone wants to do proofreading, the ovsO issue has the raw material available. That can be reviewed and corrected. Installing from the corrected file would not take long. The corrections can live in GitHub issues until Jim finds time to install them.

gasyoun · 2015-05-17T05:27:18Z

I guess the ovsO output could be updated now that the sources have been cleaned. What do you think?

drdhaval2785 · 2015-05-17T05:41:26Z

I will do so and update the output.

gasyoun · 2015-05-17T20:03:06Z

Thanks Dhaval.

drdhaval2785 · 2015-05-28T03:29:22Z

Updated data is available at https://github.com/drdhaval2785/SanskritSpellCheck/tree/master/o_vs_O/output1

drdhaval2785 closed this as completed May 28, 2015

drdhaval2785 mentioned this issue Nov 17, 2015

NO CHANGE list #153

Open
