Introduce the scale enum flag in Embedding layer for LLM embedding. #909

ds-hwang · 2025-01-08T04:10:04Z

Each component of the embedding activation should have a magnitude of roughly 1. Since the embedding tensor is initialized with a scale of `1/sqrt(dim)`, the activation is multiplied by `sqrt(dim)` to restore the desired scale, as in Gemma [1].
[1] https://github.com/google-deepmind/gemma/blob/0d6ae857591248422127ca14c027909546362e6a/gemma/modules.py#L80
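
A minimal sketch of the scaling argument, in plain Python rather than the axlearn implementation. The names `make_embedding_row` and `rms` and the value `dim = 4096` are illustrative assumptions, not taken from this PR. A row drawn with standard deviation `1/sqrt(dim)` has a per-component RMS near `1/sqrt(dim)`, and multiplying by `sqrt(dim)` brings it back to roughly 1.

```python
import math
import random


def make_embedding_row(dim: int, rng: random.Random) -> list[float]:
    """Hypothetical initializer: Gaussian entries with std 1/sqrt(dim)."""
    std = 1.0 / math.sqrt(dim)
    return [rng.gauss(0.0, std) for _ in range(dim)]


def rms(values: list[float]) -> float:
    """Root-mean-square magnitude of the components."""
    return math.sqrt(sum(v * v for v in values) / len(values))


dim = 4096  # Example value only.
rng = random.Random(0)
row = make_embedding_row(dim, rng)

# Multiply the looked-up row by sqrt(dim), as the PR description states.
scaled = [v * math.sqrt(dim) for v in row]

rms_raw = rms(row)        # Close to 1/sqrt(dim).
rms_scaled = rms(scaled)  # Close to 1.
```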

In addition, unsloth [2] discovered that `sqrt(dim)` needs to be computed in float32.
[2] Sec 3 in https://unsloth.ai/blog/gemma-bugs
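
To illustrate why the precision of `sqrt(dim)` matters, here is a sketch that simulates bfloat16 rounding with only the standard library. The helper `round_to_bfloat16` and the example dims 2048 and 3072 are assumptions chosen for illustration and do not come from this PR. bfloat16 keeps only 8 significant bits, so values between 32 and 64 are spaced 0.25 apart and the scale factor is visibly rounded.

```python
import math
import struct


def round_to_bfloat16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (ties to even)."""
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 is the top 16 bits of float32; add a bias so truncation rounds.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


for dim in (2048, 3072):
    exact = math.sqrt(dim)
    low_precision = round_to_bfloat16(exact)
    print(f"dim={dim}: sqrt={exact:.4f}, bfloat16={low_precision}")
# dim=2048: sqrt=45.2548, bfloat16=45.25
# dim=3072: sqrt=55.4256, bfloat16=55.5
```

Computing the factor in float32 and casting only the final product avoids baking this rounding error into every activation.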

TODO(axlearn-team): Use UNIT scale enum for AFM+. This will require re-sweeping hyperparameters (e.g., learning rate).

ds-hwang · 2025-01-08T04:10:22Z

@ruomingp Could you review? From 970

ruomingp · 2025-01-08T06:16:11Z

axlearn/common/layers.py

+        https://github.com/google-deepmind/gemma/blob/0d6ae857591248422127ca14c027909546362e6a/gemma/modules.py#L80
+        """
+
+        UNIT = 1


Nit: use str values to be more readable.

Suggested change:

-        UNIT = 1
+        UNIT = "unit"
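
A small sketch of what the string-valued form buys, assuming a hypothetical enum named `EmbeddingScale` and a hypothetical helper `scale_factor` (neither name is confirmed by this PR): the value reads clearly in logs and configs, and the enum member can be recovered from the string.

```python
import enum
import math
from typing import Optional


class EmbeddingScale(enum.Enum):
    """Hypothetical stand-in for the scale enum discussed in this review."""

    UNIT = "unit"


def scale_factor(scale: Optional[EmbeddingScale], dim: int) -> float:
    """Hypothetical helper: the multiplier applied to embedding activations."""
    if scale is None:
        return 1.0  # No scaling requested.
    if scale is EmbeddingScale.UNIT:
        return math.sqrt(dim)  # Restore unit-scale activations.
    raise ValueError(f"Unsupported scale: {scale}")


# The string value round-trips and shows up in the repr.
member = EmbeddingScale("unit")
print(repr(member))
```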

ds-hwang requested review from ruomingp, markblee and a team as code owners January 8, 2025 04:10

ruomingp reviewed Jan 8, 2025

ds-hwang requested a review from ruomingp January 8, 2025 15:17

ds-hwang force-pushed the emb_scale branch from 99c54d9 to 8167248 Compare January 8, 2025 15:17

ds-hwang enabled auto-merge January 8, 2025 19:17

ruomingp approved these changes Jan 8, 2025

ds-hwang added this pull request to the merge queue Jan 8, 2025

Merged via the queue into apple:main with commit 2d1fb29 Jan 8, 2025
6 checks passed

ds-hwang deleted the emb_scale branch January 8, 2025 22:31
