
About English ngrams #78

Answered by dariogoetz
fohrloop asked this question in Q&A
Oct 7, 2024 · 1 comment · 5 replies

Thanks for the detailed comparison :)
You are correct about the "oxey" corpora. They come from the english.json and english2.json files on the page you linked.

If I recall correctly, the "eng_shai" corpus is the iweb corpus that was used to develop the "colemak" layout. It was added back in 2022, though, so my memory may be off.

I don't know which sources oxey's two English corpora are based on. They may very well be based on the "shai" corpus. I used the oxey corpora mainly to compare this analyzer's oxey metrics with those from oxey's playground and make sure they are aligned.

Answer selected by fohrloop
Category
Q&A
2 participants