adding-support-for-mamba2 #1009

Open · Goekdeniz-Guelmez wants to merge 75 commits into main

Conversation

@Goekdeniz-Guelmez (Contributor)

No description provided.

@Goekdeniz-Guelmez changed the title from "Create mamba2.py" to "adding-support-for-mamba2" on Oct 2, 2024
@hg0428 commented Oct 22, 2024

Codestral Mamba and other models rely on the Mamba2 architecture. Hopefully we can get this soon.

@Goekdeniz-Guelmez (Contributor, Author) commented Jan 20, 2025

I think it has something to do with the Codestral repo on HF; the layers are not converted correctly. I'll try that later when I'm home.
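
One quick way to check that (a sketch, not the PR's code; the shard filename is a placeholder, and the comparison against the MLX side is left as a comment) is to dump the tensor names and shapes stored in the HF checkpoint and eyeball them against what the MLX mamba2 module expects:

```python
from safetensors.numpy import load_file

# Placeholder filename: substitute an actual shard downloaded from
# mistralai/Mamba-Codestral-7B-v0.1 on the HF hub.
weights = load_file("model-00001-of-00003.safetensors")
for name, tensor in sorted(weights.items()):
    print(name, tensor.shape)
# Compare these names/shapes against the MLX model's parameter tree to
# spot layers that were renamed or dropped during conversion.
```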

Goekdeniz-Guelmez and others added 4 commits January 21, 2025 10:57
…odestral. working:

rokyang/mamba2-130m-hf
rokyang/mamba2-370m-hf
rokyang/mamba2-780m-hf
rokyang/mamba2-1.3b-hf
rokyang/mamba2-2.7b-hf
```
python -m mlx_lm.generate --model /Users/gokdenizgulmez/Desktop/Mamba-Codestral-7B-v0.1-4bit --prompt "# A function that computes fibonacci
def fibonacci(" -m 64
==========
n):
    print(f"{os.path.abspath(".")/data/data/data/com.android.launcher.png)

## 🙌🏼 🙌🙌🙌🙌🙌🙌

class _State(Enum):
    def __init__ (self
==========
Prompt: 16 tokens, 84.547 tokens-per-sec
Generation: 64 tokens, 13.774 tokens-per-sec
Peak memory: 4.139 GB
```

@Goekdeniz-Guelmez (Contributor, Author) commented Jan 21, 2025

Hey @awni, I finished it with Mamba-Codestral. I'll push the quantized version up, but you can also use mistralai/Mamba-Codestral-7B-v0.1:

```
python -m mlx_lm.generate --model /Users/gokdenizgulmez/Desktop/Mamba-Codestral-7B-v0.1-4bit --prompt "Rene Descartes was" -m 12
==========
a French surrealist painting, 2016
==========
Prompt: 7 tokens, 38.813 tokens-per-sec
Generation: 12 tokens, 14.927 tokens-per-sec
Peak memory: 4.122 GB
```

P.S. There is no prompt format (chat template), though; raw text goes straight in as the prompt, as in the sketch below.
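
A minimal sketch of the same call via the mlx_lm Python API (the local path is copied from the CLI example above; `max_tokens` mirrors `-m`):

```python
from mlx_lm import load, generate

# Path copied from the CLI example above; substitute your own checkpoint.
model, tokenizer = load("/Users/gokdenizgulmez/Desktop/Mamba-Codestral-7B-v0.1-4bit")

# No chat template is applied: the raw string itself is the prompt.
text = generate(model, tokenizer, prompt="Rene Descartes was", max_tokens=12)
print(text)
```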

@hey-tommy

@Goekdeniz-Guelmez Thanks for all your hard work on this!

@awni (Member) commented Feb 3, 2025

@Goekdeniz-Guelmez I tried both Codestral and Mamba2 2.7B. Both models generate pretty bad responses; the 2.7B doesn't really work at all, even in 8-bit, so I think there must be a bug there.

The codestral one can generate text but doesn't seem to be able to end correctly. Is that your experience?

@Goekdeniz-Guelmez (Contributor, Author) commented Feb 3, 2025

Yes, I've tried it again with a max-generation (token) limit and got the same problems. I'll look into it tomorrow.

@Goekdeniz-Guelmez (Contributor, Author)

I think it's somewhere in the SSM computation that I got something wrong; see the reference sketch below.
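
For anyone debugging along: here is a minimal, naive reference of what the Mamba2 SSD recurrence should compute (sequential loop, single B/C group, NumPy; all names and shapes here are my own assumptions, not this PR's code). Comparing a fused implementation against a step-by-step loop like this on random inputs is one way to localize that kind of bug:

```python
import numpy as np

def ssd_reference(x, dt, A, B, C, D):
    """Naive sequential Mamba2 SSD recurrence (single B/C group).

    x:  (T, H, P)  per-head inputs       dt: (T, H)  softplus'd step sizes
    A:  (H,)       negative per-head decay (Mamba2 uses a scalar A per head)
    B:  (T, N)     writes into the state  C: (T, N)  reads out of the state
    D:  (H,)       skip connection
    """
    T, H, P = x.shape
    N = B.shape[-1]
    h = np.zeros((H, P, N))                  # hidden state per head
    y = np.zeros_like(x)
    for t in range(T):
        dA = np.exp(dt[t] * A)               # (H,) per-head state decay
        # discretized input: dt * (x_t outer B_t), shape (H, P, N)
        dBx = dt[t][:, None, None] * x[t][:, :, None] * B[t][None, None, :]
        h = dA[:, None, None] * h + dBx      # state update
        y[t] = h @ C[t] + D[:, None] * x[t]  # read-out + skip, (H, P)
    return y
```

This deliberately omits the depthwise convolution, gating, and normalization around the SSM; it only pins down the recurrence h_t = exp(dt_t * A) * h_{t-1} + dt_t * (x_t outer B_t) and the read-out y_t = h_t C_t + D * x_t, so a mismatch against it points at the discretization or read-out rather than the surrounding layers.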
