Add absolute confidence metric #413

Open
pemistahl opened this issue Dec 6, 2024 · 5 comments · May be fixed by #419
Comments

@pemistahl
Owner

Currently, the library only provides a relative confidence metric that tells you how likely a language is in comparison to another language. It is desirable to have an additional absolute confidence metric that works with a single language only and independently from any other language. With such an absolute confidence metric, a LanguageDetector instance could be built from a single language. This instance would then be able to provide binary decisions, i.e. tell whether some text is written in a specific language or not.

An absolute confidence metric could be based on the unique n-grams of a language or on its n most common n-grams.
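
To make the idea concrete, here is a minimal sketch of what such a metric might look like, assuming a language is represented by a small set of its most common character trigrams. The functions `trigrams`, `absolute_confidence` and `is_language` and the profile contents are hypothetical illustrations, not part of the library's API and not necessarily how the feature is implemented.

```rust
use std::collections::HashSet;

/// Splits a text into lowercase character trigrams, word by word.
fn trigrams(text: &str) -> Vec<String> {
    let lower = text.to_lowercase();
    let mut result = Vec::new();
    for word in lower.split(|c: char| !c.is_alphabetic()) {
        let chars: Vec<char> = word.chars().collect();
        // Words shorter than three characters yield no windows.
        for window in chars.windows(3) {
            result.push(window.iter().collect::<String>());
        }
    }
    result
}

/// Hypothetical absolute confidence: the share of the text's trigrams
/// that occur in one language's profile. No other language is involved.
/// Returns a value in [0.0, 1.0], or 0.0 if the text has no trigrams.
fn absolute_confidence(text: &str, profile: &HashSet<&str>) -> f64 {
    let grams = trigrams(text);
    if grams.is_empty() {
        return 0.0;
    }
    let hits = grams
        .iter()
        .filter(|g| profile.contains(g.as_str()))
        .count();
    hits as f64 / grams.len() as f64
}

/// Binary decision: is the text written in the profiled language?
/// The threshold is chosen by the caller (an assumption of this sketch).
fn is_language(text: &str, profile: &HashSet<&str>, threshold: f64) -> bool {
    absolute_confidence(text, profile) >= threshold
}
```

With a toy profile containing `the`, `and` and `ing`, the text `the thing` yields the trigrams `the`, `thi`, `hin` and `ing`, two of which are in the profile, so the score is 0.5. A real profile would need far more n-grams and a threshold tuned on held-out data.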

@hemju
hemju commented Jan 31, 2025

@pemistahl I just found this library, and it looks awesome. The new feature seems really useful. Basically, I just wanted to say thank you for the hard work.

@pemistahl
Owner Author

Thank you very much @hemju for the kind words. Really appreciated. :) This new feature is nearly completed. I hope that I will be able to create a new release within the next two weeks at the latest.

@hemju
hemju commented Jan 31, 2025

> I hope that I will be able to create a new release within the next two weeks at the latest.

That sounds great 🥳 👏 Thank you again.

I have one question (I am not sure whether this is the right place to ask) about initialization. This is also unclear to me in the Java version, which is the version that brought me here, though I will stick with the Rust version. Is a LanguageDetector thread-safe? Should it be reused, or should it be initialized for each call? I understand that the underlying LanguageModels are kept in memory, but what about the detector?

Thanks in advance.

@pemistahl
Owner Author

pemistahl commented Jan 31, 2025

I will answer that once and for all in the documentation of the next release, but: Yes, the library is thread-safe. That includes instances of LanguageDetector as well. Just create a single instance and reuse it in multiple threads. Only create multiple instances if their settings need to differ, e.g. different sets of languages to recognize.
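
The "create once, reuse everywhere" pattern described above can be sketched with the standard library alone. The `Detector` type and its `looks_english` method below are hypothetical stand-ins for a detector that is immutable after construction; they are not the library's API. With the real library, the same `Arc` sharing would presumably wrap the single LanguageDetector instance.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a detector that is read-only once built.
struct Detector {
    keywords: Vec<&'static str>,
}

impl Detector {
    fn new() -> Self {
        Detector { keywords: vec!["the", "and", "is"] }
    }

    /// Takes `&self` only, so concurrent calls cannot mutate shared state.
    fn looks_english(&self, text: &str) -> bool {
        text.split_whitespace()
            .any(|word| self.keywords.iter().any(|k| *k == word))
    }
}

/// Builds ONE detector and shares it with one thread per input text.
/// Results are returned in the same order as the inputs.
fn detect_in_parallel(texts: Vec<String>) -> Vec<bool> {
    let detector = Arc::new(Detector::new());
    let handles: Vec<_> = texts
        .into_iter()
        .map(|text| {
            // Cloning the Arc copies a pointer, not the detector itself.
            let detector = Arc::clone(&detector);
            thread::spawn(move || detector.looks_english(&text))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

A second instance would only be needed for a different configuration, for example another set of languages.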

@hemju
hemju commented Jan 31, 2025

Thank you!
