
Issue with normalize_embeddings argument in encode() method #3200

Open

BioMikeUkr opened this issue Jan 28, 2025 · 2 comments

@BioMikeUkr

I noticed that the norms of the embeddings for most models are 1. At first I assumed the default value of the normalize_embeddings argument in the encode() method was True, but it actually defaults to False. It turns out that if modules.json contains a module of type sentence_transformers.models.Normalize (2_Normalize), the output embeddings are always normalized, regardless of the value of the normalize_embeddings argument. This happens because the forward() method runs all of the loaded modules.
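A minimal sketch reproducing this (assuming sentence-transformers/all-MiniLM-L6-v2, whose modules.json does include a 2_Normalize module):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 ships a 2_Normalize module as its final stage, so
# encode() returns unit-norm embeddings even with normalize_embeddings=False.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

emb = model.encode(["hello world"], normalize_embeddings=False)
print(np.linalg.norm(emb[0]))  # ~1.0, despite normalize_embeddings=False

# SentenceTransformer is an nn.Sequential, so forward() runs every module:
for name, module in model.named_children():
    print(name, type(module).__name__)  # 0 Transformer, 1 Pooling, 2 Normalize
```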

@tomaarsen
Collaborator

tomaarsen commented Jan 29, 2025

Hello!

Indeed, some models force the normalization in the architecture itself. In that case, it can't be turned off.
I don't really know why this is a big problem, though. I think most embeddings should be normalized - they're simply easier and cheaper to work with (e.g. you can use dot product directly to compute similarities).
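A quick numeric sketch of that point: for unit-norm vectors, the dot product already equals the cosine similarity, so the per-comparison norm divisions can be skipped.

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, 2.0])

# Cosine similarity divides by both norms...
cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# ...but once the vectors are unit-norm, the raw dot product is the same value.
a_u = a / np.linalg.norm(a)
b_u = b / np.linalg.norm(b)
print(np.isclose(cos, a_u @ b_u))  # True
```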

I'm not sure how I feel about optionally disabling modules based on the normalize_embeddings argument, primarily because I don't know a use case where you really don't want normalized embeddings. I'm open to your thoughts on this! If it's important for something, then I'll definitely consider your proposal.

  • Tom Aarsen

@werent4

werent4 commented Jan 29, 2025

Hello!
For the sake of clarity for end-users, I believe the normalize_embeddings parameter might be misleading in this case, as some models completely ignore it. This could create confusion, making users think they have control over normalization when they actually don't.
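One possible mitigation, sketched here as a hypothetical helper (warn_if_normalize_ignored is not part of the library): detect the Normalize module up front and warn the user that the flag will be ignored.

```python
import warnings

from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Normalize


def warn_if_normalize_ignored(model: SentenceTransformer) -> None:
    """Hypothetical helper: warn when a Normalize module makes the
    normalize_embeddings argument of encode() a no-op."""
    if any(isinstance(module, Normalize) for module in model):
        warnings.warn(
            "This model contains a Normalize module; embeddings are "
            "L2-normalized regardless of the normalize_embeddings argument."
        )


model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
warn_if_normalize_ignored(model)  # emits the warning for this model
```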
