A dictionary giving the names of more than 200 languages, each written in more than 200 languages, together with a list pairing each language's English name with its native name (the name the language uses for itself).
The data was scraped from www.jw.org. The encoding is UTF-8.
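The easiest way to use it is to load the pickle file in Python:
import pickle
# Open in binary mode; the context manager closes the file after loading.
with open('dictionary_of_language_names.pickle', 'rb') as fh:
    dic = pickle.load(fh)
# Outer key: the language the name is written in; inner key: the language being named.
dic['de']['fr']
Output: u'Franz\xf6sisch' (the Python 2 repr of "Französisch", the German name for French)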
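Because the data is a nested dictionary, a missing language code raises a KeyError on direct indexing. The sketch below shows one possible way to look names up safely and to collect every available name for one language. It assumes the two-level shape implied by the dic['de']['fr'] example; the helper names language_name and names_in_all_languages are hypothetical, and the small sample dictionary is an illustrative stand-in for the real pickle (only the 'de' -> 'fr' entry is taken from the example above).

```python
# Stand-in data with the same ASSUMED shape as the pickle:
# outer key = language the name is written in, inner key = language being named.
# Only sample["de"]["fr"] comes from the original example; the rest is illustrative.
sample = {
    "de": {"fr": "Französisch", "en": "Englisch"},
    "en": {"fr": "French", "de": "German"},
}


def language_name(dic, in_lang, of_lang, default=None):
    """Return the name of `of_lang` written in `in_lang`, or `default` if missing."""
    return dic.get(in_lang, {}).get(of_lang, default)


def names_in_all_languages(dic, of_lang):
    """Map each language code that has an entry for `of_lang` to that name."""
    return {code: names[of_lang] for code, names in dic.items() if of_lang in names}


german_for_french = language_name(sample, "de", "fr")        # 'Französisch'
unknown = language_name(sample, "de", "xx", default="?")     # '?'
all_names_for_french = names_in_all_languages(sample, "fr")  # {'de': 'Französisch', 'en': 'French'}
```

To use this with the real data, pass the loaded dic in place of sample; whether every language has an entry for every other language is not stated in this description, which is why the helpers tolerate missing keys.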
Keywords: NLP, Natural Language Processing, Machine Translation, glossary, language list, python, data, dictionary, multi-language dictionary, English, languages, rare languages
List of languages:
Kikuyu, Cambodian, Sarnami, Khoekhoegowab, Haitian, Tojolabal, Croatian, Herero, Serbian, Waray-Waray, Kikaonde, Kongo, Lhukonzo, Ukrainian, Kabyle, Aymara, Toba, Huastec, Hindi, Nzema, Ndonga, Kwanyama, Punjabi, Faeroese, Georgian, Greenlandic, Miskito, Hungarian, Efik, Rapa, Bicol, Afrikaans, Armenian, Wolaita, Huave, Kikamba, Hausa, Amharic, Norwegian, Kisonge, Nahuatl, Nepali, Solomon, Kirghiz, Ndebele, Rarotongan, Korean, Zapotec, Baoule, Dutch, Aukan, Tlapanec, Pangasinan, Russian, Ewe, Mauritian, Uruund, Voru, Spanish, Greek, Lingala, Estonian, Vezo, Rumanyo, Luganda, Lithuanian, Mayo, Lamba, Yoruba, Sepedi, Kinyarwanda, Dangme, Tarascan, Rutoro, Wayuunaiki, Mixtec, Cebuano, Sango, Quechua, Tiv, Guna, Maya, Sranantongo, Acholi, Xhosa, Tetum, Krio, Isoko, Oromo, Saramaccan, Quichua, Azerbaijani, Chin, Romanian, Bulgarian, Quiche, Kalenjin, Lahu, Mbunda, Swati, Tsonga, Kisi, Cinyanja, Shona, Newari, Slovenian, Zande, Tamil, Mambwe-Lungu, Icelandic, Runyankore, Guarani, Mongolian, Kiluba, Tagalog, Ngabere, Kwangali, Italian, Iloko, Portuguese, Hiligaynon, Tzotzil, Myanmar, Ateso, Finnish, Welsh, Zulu, Chol, Sesotho, Otomi, Mazatec, Tajiki, Kekchi, Mam, Lenje, Kalanga, Somali, Tarahumara, Vietnamese, Gujarati, Marathi, French, Sinhala, Frafra, Indonesian, Luvale, Tongan, Danish, Japanese, Niuean, Polish, Tankarana, Czech, Chinese, Garifuna, Maltese, Venda, Nuer, Sidama, Mayangna, Thai, Swahili, Luo, English, Latvian, Kazakh, Irish, Macedonian, Mixe, Tahitian, Samoan, Lugbara, Tatar, Albanian, Turkish, Totonac, Tswana, Slovak, Swedish, Ga, Kurdish, Twi, Tokelauan, Malagasy, Papiamento, Romany, Tzeltal, Cakchiquel, Wichi, German, Igbo, Ossetian, Mapudungun, Tigrinya, Seychelles
Native names:
Acholi, Afrikaans, Ateso, Aymara, Azərbaycan, Azərbaycan (кирилҹә), Baoulé, Bicol, cakchiquel, Cebuano, Chol, Cilenje, Cinyanja, Créole Mauricien, Cymraeg, Dangme, Dansk, Deutsch, eesti, Efịk, English, español, Ewe, Faka-Niue, Faka-Tokelau, Frafra, Français, Føroyskt, Ga, Gaeilge, Garifuna, guaraní, guna, Gĩkũyũ, Hausa, Herero, Hiligaynon, hrvatski, huave, Icilamba, Igbo, Ikinyarwanda, Iloko, Indonesia, IsiNdebele, IsiXhosa, IsiZulu, Isoko, italiano, Kabyle, Kalaallisut, Kalanga, Kalenjin, kekchí, Khoekhoegowab, Kikamba, Kikaonde, Kikongo (Rép. dém. du congo), Kiluba, Kisiei, Kisonge, Kiswahili, Kiswahili cha Congo-Kinshasa, Kreol Seselwa, Kreyòl ayisyen, Krio, K’urdî Kurmancî (Kavkazûs), Lai Holh (Hakha), Latviešu, Laˇhuˍ hkawˇ, Lhukonzo, lietuvių, Lingala, Luganda, Lugbara, Luo, Luvale, Luvenda, magyar, Malagasy, Malti, mam, Mambwe-Lungu, mapudungún, maya, mayangna, mayo (yoremnok´ki), mazateco de Huautla, Mbunda, miskito, mixe, mixteco de guerrero (tu̱ʼun saví), Mixteco de Huajuapan, Ndonga, Nederlands, ngäbere, Norsk, Nzema, náhuatl de guerrero, náhuatl de la huasteca, náhuatl del centro, náhuatl del norte de puebla, Okanisitongo, Oromoo, Oshikwanyama, Otomí, Pangasinan, Papiamentu (Kòrsou), pilagá, polski, Português, purépecha, Quechua (Ancash), Quechua (Bolivia), quechua (cusco), quechua ayacuchano, quichua, quichua (santiago del estero), quiché, rapa nui, Reo Rarotonga, Romane (Makedonija), Română, Rukwangali, Rumanyo, Runyankore, Rutoro, Saamakatöngö, Samoan, Sango, Sarnami, Sepedi, Sesotho, Setswana, Shona, shqip, Sidaamu Afoo, SiSwati, slovenčina, slovenščina, Solomon Islands Pidgin, Somali, Sranantongo, srpski (latinica), suomi, Svenska, Tagalog, Tahiti, Tankarana, tarahumara, Tetun, Thok Nath, Tiv, Tlapaneco, toba, tojolabal, Tongan, totonaco, tseltal, tsotsil, Twi, tének, Türkçe, Uruund, Vezo, Việt Nam, võro, Waray-Waray, Wayuunaiki, wichi, Wolayttattuwaa, Xitsonga, Yorùbá, Zande, zapoteco de Guevea, zapoteco del istmo, Zapoteco Lachiguiri, íslenska, 
čeština, Ελληνική, К′öрди Кöрманщи (Кирили), български, ирон, кыргыз, македонски, монгол, русский, српски, татар, тоҷикӣ, українська, қазақ, Արեւմտահայերէն, Հայերեն, नेपाली, नेवारी, मराठी, हिंदी, ਪੰਜਾਬੀ, ગુજરાતી, தமிழ், සිංහල, ไทย, မြန်မာ, ქართული, ትግርኛ, አማርኛ, ខ្មែរ, 日本語, 汉语简化字, 漢語繁體字, 한국어