Life after faultfinder #114

Closed
funderburkjim opened this issue May 16, 2015 · 6 comments

Comments

@funderburkjim
Contributor

I think that the faultfinder analysis of headword spellings (based on MW headword spellings) has now been applied to all the dictionaries. (Please mention if some dictionary has been missed.)

Assuming this is so, we are in a position to consider what's next.

There are several ideas for approaching headword spelling error detection which have been suggested in recent months, and which deserve attention.

However, my inclination is to spend some time developing a 'unified' framework for dictionary displays.

The current system treats each dictionary as its own little world, separate from the worlds of all the other dictionaries. While this separate-worlds organization has served us well, it is now creating obstacles to further development.

For one example, a recent development version of the MW basic display permits citation requests to be made in Devanagari or IAST. This seems to work well, and should be applied to ALL the dictionaries. But with 36 separate dictionary display programs to modify, this change would be quite awkward. This same set of options (and even perhaps some other options, such as Hyderabad's WX transliteration) should be available for all the displays -- in particular, I think the quite useful 'List display' should scrap the 'Preferences' method (with its complicated little keyboards) in favor of the same 'input' selection as the other dictionaries use.
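As a rough illustration of what a shared input layer might look like, here is a minimal sketch that converts an IAST citation into SLP1, so that every display could do its lookup on one internal spelling. The function name `iast_to_slp1` and the table are assumptions made for this example, not code from the existing displays. Devanagari and WX input would need their own tables, and the sketch always reads "ai" and "au" as diphthongs.

```python
import unicodedata

# IAST -> SLP1 for the characters that differ; plain ASCII letters such as
# a, i, u, k, g, t, d, n, m, y, r, l, v, s, h are the same in both schemes.
_RAW_TABLE = {
    "ai": "E", "au": "O",
    "kh": "K", "gh": "G", "ch": "C", "jh": "J",
    "ṭh": "W", "ḍh": "Q", "th": "T", "dh": "D", "ph": "P", "bh": "B",
    "ā": "A", "ī": "I", "ū": "U", "ṛ": "f", "ṝ": "F", "ḷ": "x", "ḹ": "X",
    "ṃ": "M", "ḥ": "H", "ṅ": "N", "ñ": "Y", "ṭ": "w", "ḍ": "q", "ṇ": "R",
    "ś": "S", "ṣ": "z",
}
# Normalize the keys so they compare equal to NFC-normalized input.
IAST_TO_SLP1 = {unicodedata.normalize("NFC", k): v for k, v in _RAW_TABLE.items()}


def iast_to_slp1(text: str) -> str:
    """Convert an IAST string to SLP1, trying two-character units first."""
    text = unicodedata.normalize("NFC", text.lower())
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in IAST_TO_SLP1:
            out.append(IAST_TO_SLP1[pair])
            i += 2
        elif text[i] in IAST_TO_SLP1:
            out.append(IAST_TO_SLP1[text[i]])
            i += 1
        else:
            # Same letter in both schemes, or something we do not recognize.
            out.append(text[i])
            i += 1
    return "".join(out)


print(iast_to_slp1("gaṅgā"))       # gaNgA
print(iast_to_slp1("pariccheda"))  # paricCeda
```

With one such conversion per input scheme in a shared module, adding a new input option would mean writing one more table rather than modifying 36 display programs.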

Other enhancements, such as having the input citation generate server suggestions, are also applicable to most dictionaries. We can also take into account some of the spelling differences (e.g. rxx variants) among dictionaries that are so confusing: if I'm looking up 'agni' in dictionary 'X', do I spell it 'agni' or 'agniH'? Do I spell gaNgA or gaMgA, karman or karmman, paricCeda or pariCeda, etc.?
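One way to handle such differences is to reduce every headword to a normalized lookup key, so that variant spellings land on the same key. The sketch below covers only the four example pairs above, using SLP1 spellings. The function name `normalize_headword` and the specific rules are assumptions for illustration. This is not how hwnorm1 works, and a real rule set would need checking against each dictionary's conventions.

```python
import re

# SLP1 stops: a class nasal in front of one of these is often written
# as anusvara (M) instead, depending on the dictionary.
STOPS = "kKgGcCjJwWqQtTdDpPbB"


def normalize_headword(slp1: str) -> str:
    """Return a lookup key that is identical for common spelling variants."""
    key = slp1
    # Drop a final visarga: agniH -> agni
    key = re.sub(r"H$", "", key)
    # Nasal before a stop -> anusvara: gaNgA -> gaMgA
    key = re.sub(rf"[NYRnm](?=[{STOPS}])", "M", key)
    # Consonant doubled after r -> single consonant: karmman -> karman
    key = re.sub(r"r([a-zA-Z])\1", r"r\1", key)
    # cC -> C: paricCeda -> pariCeda
    key = key.replace("cC", "C")
    return key


pairs = [("agni", "agniH"), ("gaNgA", "gaMgA"),
         ("karman", "karmman"), ("paricCeda", "pariCeda")]
for a, b in pairs:
    print(a, b, normalize_headword(a) == normalize_headword(b))
```

The key is meant only for matching. Each dictionary keeps its own spelling for display, and a citation is normalized the same way before it is compared against the stored keys.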

We should be better able to deal with 'alternate headwords' (like the VCP doubles).

With a properly organized framework, multidictionary displays could be developed much more readily; at present there is only the experimental and quite limited hwnorm1.

Also, a carefully designed API would facilitate use by displays such as the one that Peter Scharf and Ralph Bunker have developed at sanskritlibrary.

Also, it might be useful and possible to 'internationalize' the displays (as has thus far only been partially done for the Stchoupak dictionary).

So, I'm going to take a break from dictionary spelling corrections to think about and hopefully implement some of the ideas above. It is probably appropriate to do this work under a new repository, though I don't have a name for it at the moment.

Although I anticipate changing my focus as described, I'll still be glad to help prepare materials with others who want to focus on some of the other headword spelling correction ideas.

@gasyoun
Member

gasyoun commented May 16, 2015

To make sure all are covered, I'll need to compile a list of the 36 links myself. I'm thankful that you not only automated a lot, but actually did a big part of the proofreading yourself. And not only that: you documented it. It takes twice the time to document it well. If the 'unified' framework takes less than a month, it sounds like a good idea (even hwnorm1 works really well). The other enhancements sound good, but seem rather complicated. I do think the web display is good enough for now, and that the real question is the cleanness of the digital dictionaries. The 'alternate headwords' item should be higher on the list. Let's forget the API for now, it's a swamp 🐇 To 'internationalize' the displays means tuning the framework again; not a lot of lines, but I guess the whole strategy would have to change, since all the language files would have to be held separately in that case. Without you the steam in the headword spelling factory will go down, but I hope that in the next four months we will still get some more help from India.

@drdhaval2785
Contributor

drdhaval2785 commented May 17, 2015 via email

@gasyoun
Member

gasyoun commented May 17, 2015

I guess the ovsO might be updated, now that the sources have been cleaned, what do you think?

@drdhaval2785
Contributor

drdhaval2785 commented May 17, 2015 via email

@gasyoun
Member

gasyoun commented May 17, 2015

Thanks Dhaval.

@drdhaval2785
Contributor

3 participants