
[BOUNTY - $200] Support MLX community models in tinygrad inference engine #200

Open
AlexCheema opened this issue Sep 5, 2024 · 1 comment

AlexCheema commented Sep 5, 2024

  • This is a follow-up to [BOUNTY - $500] Add support for quantized models with tinygrad #148
  • In general, model weights on huggingface are a bit of a mess because of differing implementations across ML libraries. For example, the tinygrad implementation of a model names things slightly differently from the MLX implementation, which in turn names things slightly differently from the torch implementation.
  • This means we need some code that "converts" these names / structure to the tinygrad one.
  • Right now there's some code that already does this to convert from the huggingface torch implementation to tinygrad:
    def convert_from_huggingface(weights: Dict[str, Tensor], model: Transformer, n_heads: int, n_kv_heads: int):
      def permute(v: Tensor, n_heads: int):
        return v.reshape(n_heads, 2, v.shape[0] // n_heads // 2, v.shape[1]).transpose(1, 2).reshape(*v.shape[:2])
      keymap = {
        "model.embed_tokens.weight": "tok_embeddings.weight",
        **{f"model.layers.{l}.input_layernorm.weight": f"layers.{l}.attention_norm.weight"
           for l in range(len(model.layers))},
        **{f"model.layers.{l}.self_attn.{x}_proj.weight": f"layers.{l}.attention.w{x}.weight"
           for x in ["q", "k", "v", "o"]
           for l in range(len(model.layers))},
        **{f"model.layers.{l}.post_attention_layernorm.weight": f"layers.{l}.ffn_norm.weight"
           for l in range(len(model.layers))},
        **{f"model.layers.{l}.mlp.{x}_proj.weight": f"layers.{l}.feed_forward.w{y}.weight"
           for x, y in {"gate": "1", "down": "2", "up": "3"}.items()
           for l in range(len(model.layers))},
        "model.norm.weight": "norm.weight",
        "lm_head.weight": "output.weight",
      }
      sd = {}
      for k, v in weights.items():
        if ".rotary_emb." in k: continue
        v = v.to(Device.DEFAULT)
        if "model.layers" in k:
          if "q_proj" in k:
            v = permute(v, n_heads)
          elif "k_proj" in k:
            v = permute(v, n_kv_heads)
        sd[keymap[k]] = v
      return sd
    We just need something that can also deal with MLX community models, e.g. https://huggingface.co/mlx-community/Meta-Llama-3.1-8B-Instruct-4bit (see the sketch after this list).
  • Note, you can look at how MLX does this here (you might be able to share a lot of code from there): https://github.com/ml-explore/mlx-examples/blob/bd29aec299c8fa59c161a9c1207bfc59db31d845/llms/mlx_lm/utils.py#L700
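
For illustration, here is a rough sketch of what the MLX-side converter could look like. It is not a final implementation: it assumes MLX community checkpoints keep the Hugging Face key names, that quantized layers ship a packed uint32 weight plus .scales / .biases tensors (group-wise affine quantization, w = scale * q + bias), and that the group size, bit width, and least-significant-bit-first packing order used below actually match the checkpoint's config.json. The helper names are illustrative only:

    import numpy as np
    from typing import Dict
    from tinygrad import Tensor

    def dequantize_mlx(w_packed: np.ndarray, scales: np.ndarray, biases: np.ndarray,
                       group_size: int = 64, bits: int = 4) -> np.ndarray:
      # unpack `bits`-bit values from each uint32 (packing order assumed: least-significant bits first)
      per_u32 = 32 // bits
      shifts = np.arange(per_u32, dtype=np.uint32) * bits
      q = (w_packed[..., None] >> shifts) & ((1 << bits) - 1)   # (out, in // per_u32, per_u32)
      q = q.reshape(w_packed.shape[0], -1).astype(np.float32)   # (out, in)
      # apply the per-group affine parameters: w = scale * q + bias
      s = np.repeat(scales.astype(np.float32), group_size, axis=1)
      b = np.repeat(biases.astype(np.float32), group_size, axis=1)
      return s * q + b

    def convert_from_mlx(weights: Dict[str, np.ndarray], model, n_heads: int, n_kv_heads: int,
                         group_size: int = 64, bits: int = 4) -> Dict[str, Tensor]:
      dense = {}
      for k, v in weights.items():
        if k.endswith(".scales") or k.endswith(".biases"):
          continue  # consumed together with the matching packed weight below
        base = k[:-len(".weight")] if k.endswith(".weight") else k
        if f"{base}.scales" in weights:  # quantized layer -> dequantize to dense float
          v = dequantize_mlx(v, weights[f"{base}.scales"], weights[f"{base}.biases"], group_size, bits)
        dense[k] = Tensor(v)
      # once everything is dense, the key names (assumed to match the HF torch layout)
      # can be remapped with the existing convert_from_huggingface above
      return convert_from_huggingface(dense, model, n_heads, n_kv_heads)

Dequantizing up front is just the simplest way to reuse the existing key mapping; running the weights quantized end-to-end overlaps with the #148 bounty above.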
@AlexCheema AlexCheema changed the title Support MLX community models in tinygrad inference engine [BOUNTY - $100] Support MLX community models in tinygrad inference engine Sep 5, 2024
@AlexCheema AlexCheema changed the title [BOUNTY - $100] Support MLX community models in tinygrad inference engine [BOUNTY - $200] Support MLX community models in tinygrad inference engine Sep 5, 2024
radenmuaz commented Nov 17, 2024

Does this bounty also require porting the MLX modelling code to tinygrad? According to the mlx-examples library, different models on mlx-community require different modelling code.
exo currently only has llama, and the llama tinygrad modelling code is incompatible (different) with weights from qwen, etc.

https://github.com/ml-explore/mlx-examples/blob/bd6d910ca3744d75bf704e6e7039f97f71014bd5/llms/mlx_lm/utils.py#L81

Though if the models are ported from MLX to tinygrad, we wouldn't need the converter anymore.
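
For reference, the linked mlx_lm code picks the modelling module based on the model_type field in the checkpoint's config.json, and a tinygrad port would presumably need a similar registry. A rough sketch (the module path and class name below are made up, not exo's actual layout):

    import importlib, json, pathlib

    def get_tinygrad_model_class(model_path: str):
      # model_type comes from the checkpoint's config.json, e.g. "llama", "qwen2", ...
      config = json.loads(pathlib.Path(model_path, "config.json").read_text())
      model_type = config["model_type"]
      try:
        # hypothetical layout: one tinygrad modelling file per architecture
        module = importlib.import_module(f"exo.inference.tinygrad.models.{model_type}")
      except ImportError:
        raise NotImplementedError(f"No tinygrad modelling code for model_type={model_type!r}")
      return module.Transformer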
