
In v0.4.0, PretrainedConfig.get_config_dict fails for TinyStories-Instruct-2Layers-33M #322

Open
PhilipQuirke opened this issue Feb 4, 2025 · 5 comments

@PhilipQuirke

Our "TinySQL" project uses nnsight to investigate "text to SQL" models, built on the following base models:

  • roneneldan/TinyStories-Instruct-2Layers-33M
  • Qwen/Qwen2.5-0.5B-Instruct
  • withmartian/Llama-3.2-1B-Instruct

The following code worked with nnsight v0.3.7 for all three models:

with model.generate(inputs['input_ids'], max_new_tokens=10, pad_token_id=model.tokenizer.eos_token_id) as tracer:
	final_output = model.generator.output.save()

The same code fails in v0.4.0, but only for the TinyStories model, with an endless recursion starting in:

File "/usr/local/lib/python3.11/dist-packages/transformers/models/auto/configuration_auto.py", line 1021, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)

FYI, we have developed a set of notebooks using nnsight that mirror and then extend the nnsight tutorials, and they work for all three models above.
Once we have finished our investigation, we intend to publish our training datasets, models, notebooks, and code library as "worked examples".

We'd appreciate your help resolving this issue with v0.4.0 so our notebooks work with the latest nnsight code.

@JadenFiotto-Kaufman
Member

Hey @PhilipQuirke, sorry to hear that! Can you give me a more complete example? This short one seems to work for me:

from nnsight import LanguageModel

model = LanguageModel("roneneldan/TinyStories-Instruct-2Layers-33M", device_map="auto", dispatch=True)

with model.generate("hello", max_new_tokens=10, pad_token_id=model.tokenizer.eos_token_id) as tracer:
	final_output = model.generator.output.save()

@PhilipQuirke
Author

Can't attach an ipynb. Below is the body of my test notebook, which demonstrates the issue. Does this help?

!pip install -U nnsight -q          # Fails
#!pip install nnsight==0.3.7 -q     # Works
import nnsight

!pip install transformers -q
from transformers import AutoTokenizer, AutoModelForCausalLM

import torch

model_location = "roneneldan/TinyStories-Instruct-2Layers-33M"

tokenizer = AutoTokenizer.from_pretrained(model_location)

# model without flash attention
auto_model = AutoModelForCausalLM.from_pretrained(
    model_location,
    torch_dtype=torch.float32,
    device_map="auto",
)

tokenizer.padding_side = "left"
tokenizer.add_special_tokens({'pad_token': '<|pad|>'})

auto_model.resize_token_embeddings(len(tokenizer), mean_resizing=False)
auto_model.config.pad_token_id = tokenizer.pad_token_id
auto_model.resize_token_embeddings(len(tokenizer))

model = nnsight.LanguageModel(auto_model, tokenizer)
model.tokenizer = tokenizer

the_prompt = "Instructions: get distance from locations Context: CREATE TABLE locations ( distance INT, size INT) Response:"

inputs = model.tokenizer(the_prompt, return_tensors="pt", padding=True)
with model.generate(inputs['input_ids'], max_new_tokens=10, pad_token_id=model.tokenizer.eos_token_id) as tracer:
    final_output = model.generator.output.save()

@JadenFiotto-Kaufman
Member

@PhilipQuirke Just pushed a new release. Can you try now?

@PhilipQuirke
Author

That's a big improvement: the TinyStories model now loads successfully. Thanks very much!

Further testing revealed another change. The commented line below now fails in 0.4.0, but only for the TinyStories model (model_num == 1):

    N_LAYERS = len(model.transformer.h) if model_num == 1 else len(model.model.layers)
    N_HEADS = model.config.num_attention_heads # Works in 0.3.7, fails in 0.4.0
    D_MODEL = model.transformer.wte.embedding_dim if model_num == 1 else model.config.hidden_size
    D_HEAD = D_MODEL // N_HEADS  

Do you want a separate issue for this?
Alternatively, I'm happy to change my code. Is there a better, more consistent way to calculate the above four values across models?

@JadenFiotto-Kaufman
Member

@PhilipQuirke Ah, I see the problem. For now you can use model._model.config instead of model.config
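
As a side note on the per-model branching above: the four values can also be derived from the config object alone. Transformers maps the canonical names (num_hidden_layers, num_attention_heads, hidden_size) onto architecture-specific attributes for most models, but not necessarily all, so the fallback names below are assumptions for configs that only expose GPT-style names. This is a minimal sketch, not an nnsight-sanctioned API:

```python
# Minimal sketch: derive layer/head/dimension counts from a config object.
# The canonical Hugging Face names are tried first; the fallbacks
# (num_layers, num_heads, n_embd) cover older GPT-style configs.
def model_dims(cfg):
    n_layers = getattr(cfg, "num_hidden_layers", None) or getattr(cfg, "num_layers", None)
    n_heads = getattr(cfg, "num_attention_heads", None) or getattr(cfg, "num_heads", None)
    d_model = getattr(cfg, "hidden_size", None) or getattr(cfg, "n_embd", None)
    return n_layers, n_heads, d_model, d_model // n_heads
```

With the v0.4.0 workaround above, this could be called as model_dims(model._model.config) for all three models, avoiding the model_num branches.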
