Hey there! First of all, I wanted to thank you for your work. I'm a big fan of your ColPali and ColQwen models, and the fact that you not only open-sourced them but also released the code and your whole training set is an immense gift to the community.
I'm reaching out to ask your opinion on something. I would like to use ColPali/ColQwen2 in a multilingual RAG scenario. Since your training set is English-only, the model will likely not perform particularly well on multilingual tasks, so I was thinking of fine-tuning on multilingual data, perhaps starting with just a few languages such as EN, IT, FR, and ES.
I wanted to ask whether you think this is a reasonable idea, and whether you have any insights on the order of magnitude of samples I would need to gather. I know your original training set had around 130k samples, and I was hoping that, for fine-tuning, something between 1k and 10k query-page pairs might be enough.
Do you have any insights about this? Or any general suggestions?
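To make the question concrete, this is roughly the fine-tuning setup I had in mind: LoRA adapters on top of a released checkpoint, trained with an in-batch contrastive loss over late-interaction (MaxSim) scores. It's only a minimal sketch, assuming `colpali-engine` and `peft` are installed; the dataset schema (`"query"`/`"image"` fields), the base checkpoint, the LoRA target modules, and all hyperparameters are my own illustrative assumptions, not your training recipe.

```python
import torch
from torch.utils.data import DataLoader
from peft import LoraConfig, get_peft_model
from colpali_engine.models import ColPali, ColPaliProcessor

model_name = "vidore/colpali-v1.2"  # assumed base checkpoint
model = ColPali.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="cuda"
)
processor = ColPaliProcessor.from_pretrained(model_name)

# Attach LoRA adapters so only a small fraction of the weights is trained.
# Target modules and rank are illustrative guesses, not the original config.
lora = LoraConfig(
    r=32, lora_alpha=32, lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora)

# Hypothetical multilingual set: a list of {"query": str, "image": PIL.Image} pairs.
train_pairs = [...]

def collate(batch):
    queries = processor.process_queries([ex["query"] for ex in batch])
    images = processor.process_images([ex["image"] for ex in batch])
    return queries, images

def in_batch_loss(q_emb, d_emb):
    # Late-interaction (MaxSim) score of every query against every page in the
    # batch, then cross-entropy with the matching page as the positive.
    scores = torch.einsum("bnd,csd->bcns", q_emb, d_emb).max(dim=3).values.sum(dim=2)
    labels = torch.arange(scores.size(0), device=scores.device)
    return torch.nn.functional.cross_entropy(scores.float(), labels)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
loader = DataLoader(train_pairs, batch_size=8, shuffle=True, collate_fn=collate)

model.train()
for queries, images in loader:
    q_emb = model(**queries.to(model.device))  # (batch, query_tokens, dim)
    d_emb = model(**images.to(model.device))   # (batch, page_tokens, dim)
    loss = in_batch_loss(q_emb, d_emb)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

If something like this is sound, my main open question is whether 1k-10k multilingual pairs is enough signal for this kind of adapter training, or whether it risks degrading the English performance of the base model.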