
[BOUNTY - $200] Support MLX community models in tinygrad inference engine #200

Open
AlexCheema opened this issue Sep 5, 2024 · 1 comment

AlexCheema commented Sep 5, 2024

  • This is a follow-up to [BOUNTY - $500] Add support for quantized models with tinygrad #148
  • In general, model weights on huggingface are a bit of a mess because of differing implementations across ML libraries. For example, the tinygrad implementation of a model names things slightly differently from the MLX implementation, which in turn names things slightly differently from the torch implementation.
  • This means we need some code that "converts" these names / structure to the tinygrad one.
  • Right now there's some code that already does this to convert from the huggingface torch implementation to tinygrad:
    def convert_from_huggingface(weights: Dict[str, Tensor], model: Transformer, n_heads: int, n_kv_heads: int):
      def permute(v: Tensor, n_heads: int):
        return v.reshape(n_heads, 2, v.shape[0] // n_heads // 2, v.shape[1]).transpose(1, 2).reshape(*v.shape[:2])
      keymap = {
        "model.embed_tokens.weight": "tok_embeddings.weight",
        **{f"model.layers.{l}.input_layernorm.weight": f"layers.{l}.attention_norm.weight"
           for l in range(len(model.layers))},
        **{f"model.layers.{l}.self_attn.{x}_proj.weight": f"layers.{l}.attention.w{x}.weight"
           for x in ["q", "k", "v", "o"]
           for l in range(len(model.layers))},
        **{f"model.layers.{l}.post_attention_layernorm.weight": f"layers.{l}.ffn_norm.weight"
           for l in range(len(model.layers))},
        **{f"model.layers.{l}.mlp.{x}_proj.weight": f"layers.{l}.feed_forward.w{y}.weight"
           for x, y in {"gate": "1", "down": "2", "up": "3"}.items()
           for l in range(len(model.layers))},
        "model.norm.weight": "norm.weight",
        "lm_head.weight": "output.weight",
      }
      sd = {}
      for k, v in weights.items():
        if ".rotary_emb." in k: continue
        v = v.to(Device.DEFAULT)
        if "model.layers" in k:
          if "q_proj" in k:
            v = permute(v, n_heads)
          elif "k_proj" in k:
            v = permute(v, n_kv_heads)
        sd[keymap[k]] = v
      return sd
    We just need something that can also deal with MLX community models, e.g. https://huggingface.co/mlx-community/Meta-Llama-3.1-8B-Instruct-4bit (see the sketch after this list).
  • Note, you can look at how MLX does this here (you might be able to share a lot of code from there): https://github.com/ml-explore/mlx-examples/blob/bd29aec299c8fa59c161a9c1207bfc59db31d845/llms/mlx_lm/utils.py#L700
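
For illustration, here is a rough sketch of what the MLX-side converter could look like. It is not a final implementation: it assumes MLX community checkpoints keep the Hugging Face key names, that quantized layers ship a packed uint32 weight plus .scales / .biases tensors (group-wise affine quantization, w = scale * q + bias), and that the group size, bit width, and least-significant-bit-first packing order used below actually match the checkpoint's config.json. The helper names are illustrative only:

    import numpy as np
    from typing import Dict
    from tinygrad import Tensor

    def dequantize_mlx(w_packed: np.ndarray, scales: np.ndarray, biases: np.ndarray,
                       group_size: int = 64, bits: int = 4) -> np.ndarray:
      # unpack `bits`-bit values from each uint32 (packing order assumed: least-significant bits first)
      per_u32 = 32 // bits
      shifts = np.arange(per_u32, dtype=np.uint32) * bits
      q = (w_packed[..., None] >> shifts) & ((1 << bits) - 1)   # (out, in // per_u32, per_u32)
      q = q.reshape(w_packed.shape[0], -1).astype(np.float32)   # (out, in)
      # apply the per-group affine parameters: w = scale * q + bias
      s = np.repeat(scales.astype(np.float32), group_size, axis=1)
      b = np.repeat(biases.astype(np.float32), group_size, axis=1)
      return s * q + b

    def convert_from_mlx(weights: Dict[str, np.ndarray], model, n_heads: int, n_kv_heads: int,
                         group_size: int = 64, bits: int = 4) -> Dict[str, Tensor]:
      dense = {}
      for k, v in weights.items():
        if k.endswith(".scales") or k.endswith(".biases"):
          continue  # consumed together with the matching packed weight below
        base = k[:-len(".weight")] if k.endswith(".weight") else k
        if f"{base}.scales" in weights:  # quantized layer -> dequantize to dense float
          v = dequantize_mlx(v, weights[f"{base}.scales"], weights[f"{base}.biases"], group_size, bits)
        dense[k] = Tensor(v)
      # once everything is dense, the key names (assumed to match the HF torch layout)
      # can be remapped with the existing convert_from_huggingface above
      return convert_from_huggingface(dense, model, n_heads, n_kv_heads)

Dequantizing up front is just the simplest way to reuse the existing key mapping; running the weights quantized end-to-end overlaps with the #148 bounty above.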
@AlexCheema AlexCheema changed the title Support MLX community models in tinygrad inference engine [BOUNTY - $100] Support MLX community models in tinygrad inference engine Sep 5, 2024
@AlexCheema AlexCheema changed the title [BOUNTY - $100] Support MLX community models in tinygrad inference engine [BOUNTY - $200] Support MLX community models in tinygrad inference engine Sep 5, 2024
radenmuaz commented Nov 17, 2024

Does this bounty also require porting the MLX modelling code to tinygrad? According to the mlx-examples library, different models on mlx-community require different modelling code.
exo currently only has llama, and the llama tinygrad modelling code is incompatible (different) with weights from qwen, etc.

https://github.com/ml-explore/mlx-examples/blob/bd6d910ca3744d75bf704e6e7039f97f71014bd5/llms/mlx_lm/utils.py#L81

Though if the models are ported from MLX to tinygrad, we wouldn't need the converter anymore.
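
For reference, the linked mlx_lm code picks the modelling module based on the model_type field in the checkpoint's config.json, and a tinygrad port would presumably need a similar registry. A rough sketch (the module path and class name below are made up, not exo's actual layout):

    import importlib, json, pathlib

    def get_tinygrad_model_class(model_path: str):
      # model_type comes from the checkpoint's config.json, e.g. "llama", "qwen2", ...
      config = json.loads(pathlib.Path(model_path, "config.json").read_text())
      model_type = config["model_type"]
      try:
        # hypothetical layout: one tinygrad modelling file per architecture
        module = importlib.import_module(f"exo.inference.tinygrad.models.{model_type}")
      except ImportError:
        raise NotImplementedError(f"No tinygrad modelling code for model_type={model_type!r}")
      return module.Transformer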
