
Issue with the implementation of Value Zeroing #266

Closed
hmohebbi opened this issue Apr 24, 2024 · 2 comments · Fixed by #267
Labels
bug Something isn't working

Comments

@hmohebbi (Collaborator)

🐛 Bug Report

Hi team! I just experimented with Value Zeroing for the Gemma-2b model in Inseq to compare its outputs with my own implementation, and I found that Inseq outputs the same scores regardless of the model layer. It seems the implementation is broken for Gemma, and this issue may affect other models, too!

🔬 How To Reproduce

Valid scores can be reproduced in this notebook: https://colab.research.google.com/drive/114YigbeMilvetmPStnlYR7Wd7gxWYFAX

My attempt with Inseq:

import inseq

model = inseq.load_model("google/gemma-2b", "value_zeroing")
out = model.attribute("Either you win the game or you")
# Gemma-2b has 18 decoder layers; show per-layer scores
for l in range(18):
    out.show(select_idx=l)
@hmohebbi added the bug label Apr 24, 2024
@gsarti (Member) commented Apr 24, 2024

Investigating this, thanks @hmohebbi!

@gsarti (Member) commented Apr 25, 2024

The code reported above was generating equal scores for all attributed tokens at every generation step across layers, as in the figure below:

[Figure: attribution heatmap with identical scores for every attributed token at each layer]

The fact that these scores are equal is due to normalization: by default, out.show uses normalize=True to get relative contributions for every attributed token. Without normalization, the matrix would actually be filled with ~0s, which is why normalization blows up all the scores to roughly the same value for every token.
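
As a toy illustration of why near-zero raw scores look uniform after normalization (a sketch, not Inseq's actual normalization code):

import torch

# All raw scores are ~0, with only tiny noise distinguishing them
raw = torch.full((5,), 1e-9) + torch.randn(5) * 1e-12
normalized = raw / raw.sum()
print(normalized)  # ~0.2 everywhere: near-zero scores become uniform relative contributions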

So, why all 0s? It turns out Gemma and other LLaMA-based models currently default to PyTorch's native SDPA attention implementation for fast inference, while our current Value Zeroing implementation only supports the eager attention mode. As a result, no value vectors were actually zeroed, and the dissimilarity metric was 0 across the board since nothing was affected.
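
For reference, here is a minimal sketch of the core Value Zeroing operation, assuming eager attention and the HF Gemma module layout (model.layers[l].self_attn.v_proj); this is illustrative only, not Inseq's actual implementation:

import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModel.from_pretrained("google/gemma-2b", attn_implementation="eager")
inputs = tok("Either you win the game or you", return_tensors="pt")

layer, zeroed_token = 0, 2  # illustrative choices

def zero_value(module, args, output):
    # Zero one token's value vector before attention mixes it into other tokens
    output = output.clone()
    output[:, zeroed_token, :] = 0.0
    return output

with torch.no_grad():
    clean = model(**inputs, output_hidden_states=True).hidden_states[layer + 1]
    handle = model.layers[layer].self_attn.v_proj.register_forward_hook(zero_value)
    zeroed = model(**inputs, output_hidden_states=True).hidden_states[layer + 1]
    handle.remove()

# Per-token dissimilarity between clean and value-zeroed representations,
# i.e. how much zeroed_token contributed to every other token at this layer
scores = 1 - torch.nn.functional.cosine_similarity(clean, zeroed, dim=-1)
print(scores)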

A simple fix for this is to force eager attention upon initialization:

import inseq

# Force the eager attention implementation so value vectors can be zeroed
model = inseq.load_model(
    "google/gemma-2b",
    "value_zeroing",
    model_kwargs={"attn_implementation": "eager"},
)
out = model.attribute("Either you win the game or you")
for l in range(18):
    out.show(select_idx=l)

Now, you will get something more reasonable, e.g.:

[Figure: layer-wise attribution heatmap with distinct per-token Value Zeroing scores]

With this fix, I could reproduce the same patterns for Gemma as those produced with the context_mixing_toolkit.
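
To double-check that the override took effect, one can inspect the wrapped HF model's config (assuming Inseq exposes the underlying model as model.model, as its HuggingFace wrapper conventionally does):

# Should print "eager" after loading with the model_kwargs above
print(model.model.config._attn_implementation)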

TODOs:

  • Add a use_eager_attention property to attribution methods that forces eager attention when the model is loaded, or raises an error if a different attention implementation was pre-specified (see the sketch below).
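
Purely as a hypothetical sketch of that TODO (names and structure are not from the Inseq codebase):

class ValueZeroing:  # hypothetical method class
    # Methods that hook attention internals would set this flag
    use_eager_attention: bool = True

def validate_attn_implementation(config, method) -> None:
    # Hypothetical load-time guard: require eager attention for methods that
    # need it, or raise if the user pre-specified an incompatible implementation
    impl = getattr(config, "_attn_implementation", None)
    if getattr(method, "use_eager_attention", False) and impl not in (None, "eager"):
        raise ValueError(
            f"{type(method).__name__} requires attn_implementation='eager', "
            f"got '{impl}'. Remove the override or set it to 'eager'."
        )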
