Rethink this whole endeavour #134
Comments
Thank you for your thoughtful criticism of the flashcard project. You raise important points about language learning authenticity that deserve careful consideration.
As someone born and bred in Wales, I understand the challenges of minority language learning first-hand. When trying to build my Welsh vocabulary, I struggled to find resources suited to my level, especially after getting a basic grasp of the language: the available resources either lacked audio content or, if they had it, were far too difficult. This personal experience was one motivation for creating these tools.
That brings me to your concern about English as the input language. For a native English speaker beginning to learn Welsh, how else would they start? Most learners need a bridge from their native language to their target language. Resources in the target language become valuable as learners progress, but they first need an entry point from a language they know.
The research on this topic is nuanced. You're absolutely right that AI-generated content isn't perfect, but recent studies show modern neural TTS systems achieving ~95% naturalness ratings (although for lower-resource languages, like Basque, it would be worse and would sound robotic). These decks are designed as supplementary tools for beginners, not replacements for human interaction. They aim to provide accessible entry points and support initial vocabulary acquisition through spaced repetition, something Anki excels at. I'm explicitly transparent about the AI generation in each deck's description, detailing the specific models used (Google Cloud Translation, Azure/Google TTS) and noting potential biases.
Your point about minority languages is valuable. An acquaintance living in a Basque-speaking region faces similar challenges in finding resources to bridge the gap between beginner materials and understanding locals. A work colleague with a Serbian family wanted to expand her Serbian vocabulary but had no easy way to do so. These situations highlight why we need more resources; not everyone can afford one-to-one native speaker lessons.
I'd be interested to see any research supporting the statement that such tools "do more harm than good". A brief search on Google Scholar turned up nothing to that effect, although there are plenty of researchers looking at using AI to improve language acquisition. My experience suggests that giving beginners tools to build initial confidence, even if imperfect, can encourage real-world speaking attempts where native speakers can provide corrective feedback. This scaffolding approach might actually help preserve minority languages by lowering the barrier to entry for new learners.
The decks will remain available, but I look forward to seeing better ones. Alternative decks with native speaker content would be the gold standard, and I would be more than happy to direct people to them from the flashcard homepage when they become available.
The sentence pairs available on Tatoeba are a good find, though, thank you. I suspect they are a large improvement over automated translation, which is more likely to cover written than spoken phrases. If it's possible to export these into Anki decks and add images, this could be a good addition, and likely an improvement in the naturalness of the phrases.
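As a rough illustration of the export idea, here is a minimal sketch that turns tab-separated sentence pairs into a two-column text file of the kind Anki can import. It rests on assumptions: the input is taken to have four tab-separated columns (source id, source text, translation id, translation text), and the function names `pairs_to_anki_rows` and `to_anki_text` are hypothetical, not part of this repo. It does not handle images, audio, or fields that begin with a double quote.

```python
def pairs_to_anki_rows(tsv_text):
    """Return unique (front, back) card rows from tab-separated sentence pairs.

    Assumed column layout (not confirmed by this thread):
    source id, source text, translation id, translation text.
    """
    seen = set()
    rows = []
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip malformed or incomplete lines
        front, back = fields[1].strip(), fields[3].strip()
        if not front or not back or (front, back) in seen:
            continue  # skip empty sides and exact duplicate pairs
        seen.add((front, back))
        rows.append((front, back))
    return rows


def to_anki_text(rows):
    """Serialise rows as one 'front<TAB>back' line per card."""
    return "".join(f"{front}\t{back}\n" for front, back in rows)


# Hypothetical sample data in the assumed layout.
sample = (
    "1\tGood morning.\t2\tBore da.\n"
    "1\tGood morning.\t2\tBore da.\n"
    "3\tThank you.\t4\tDiolch.\n"
    "broken line\n"
)
rows = pairs_to_anki_rows(sample)
anki_text = to_anki_text(rows)
```

The resulting text could be saved with a `.txt` extension and imported into Anki with tab as the field separator, mapping the first column to the front of the card and the second to the back.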
tl;dr: Please consider removing these decks. Even though you intend well, the outcome is not only unsatisfactory but also damaging.
This ‘AI’ stuff does more harm than good for language learning. The source materials are in English, and they are automatically translated into a whole bunch of other languages using machine translation. That means anyone who learns with such decks is not only bound to sound unnatural in pronunciation and prosody because of the ‘AI’-generated audio, but their vocabulary, syntax and the way they express themselves are also guaranteed to be highly unidiomatic.
I realise you created the code in this repo in good faith. It’s obvious a lot of work has gone into developing the system, and the intention is clearly good; this is much appreciated. However, decks like these are harmful not only to individual language learners, who deceive themselves into thinking they are improving their skills in the target language, but also to speech communities, especially those of minoritised languages such as Welsh or Basque, when L2 speakers speak a degraded, Anglicised pseudo-L2 language. The idea that one can learn anything beneficial in a target language from machine translations of English-language sentences (and ‘AI’-generated audio) is something I can only describe as patronising, in a very Anglocentric way.
I hope the honest criticism is received well and it doesn’t dishearten you. If you wish to contribute to a project that is more respectful of minoritised languages and their speakers, please consider Tatoeba.