Error using Functionary-small-v3.2 AWQ version with vLLM #259

Open
MadanMaram opened this issue Aug 27, 2024 · 5 comments

Comments

@MadanMaram

MadanMaram commented Aug 27, 2024

Hello Functionary team,

I'm trying to use the Functionary-small-v3.2 AWQ version with vLLM for inference, but I'm encountering an error. The vLLM library doesn't seem to recognize the 'FunctionaryForCausalLM' architecture.

Here's the specific error I'm getting:
ValueError: Model architectures ['FunctionaryForCausalLM'] are not supported for now.
I'm able to run the non-AWQ version successfully, but I'd like to use the AWQ version. Could you please provide guidance on:

  1. Is the Functionary-small-v3.2 AWQ version compatible with vLLM?
  2. Are there any special steps or configurations needed to use the AWQ version with vLLM?
  3. If vLLM doesn't support this architecture, do you have any recommendations for alternatives that work with the AWQ version of Functionary-small-v3.2?

Any information or resources you can provide would be greatly appreciated. Thank you for your help!

@jeffreymeetkai
Collaborator

Hi, we do not currently have a functionary-small-v3.2 AWQ model. To help us reproduce the issue, may I ask where you got this model from?

@MadanMaram
Author

MadanMaram commented Aug 27, 2024

Thank you for your response, and apologies for the confusion; I should have been clearer in my initial message. I don't have an official AWQ version of functionary-small-v3.2. Instead, I quantized the functionary-small-v3.2 model myself using the AWQ library.

Here's the code I used for quantization:
```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
import tqdm as notebook_tqdm

model_path = 'meetkai/functionary-small-v3.2'
quant_path = 'meetkai/functionary-small-v3.2-awq'

# 4-bit AWQ with zero points, group size 128, GEMM kernels
quant_config = {
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "GEMM"
}

# Load the unquantized model and its tokenizer
model = AutoAWQForCausalLM.from_pretrained(
    model_path,
    low_cpu_mem_usage=True,
    device_map='cuda'
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
```

After quantizing the model using this method, I attempted to use it with vLLM, which is when I encountered the error about the 'FunctionaryForCausalLM' architecture not being supported.

I appreciate any guidance you can provide on this matter.
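
The names in the `ValueError` come from the `architectures` field of the model's `config.json`, which vLLM uses to pick a model implementation. A minimal diagnostic sketch, assuming the quantized output is a local directory containing a `config.json`: it reads that field and reports any name missing from a set of supported names. The helper names and the `supported` set are hypothetical and are not part of vLLM or AutoAWQ.

```python
import json
from pathlib import Path


def read_architectures(model_dir):
    """Return the `architectures` list from <model_dir>/config.json (empty if absent)."""
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    return list(config.get("architectures", []))


def unsupported_architectures(model_dir, supported):
    """Return the architecture names in config.json that are not in `supported`.

    `supported` is a caller-provided set of names (an assumption for this
    sketch); it is not queried from vLLM.
    """
    return [name for name in read_architectures(model_dir) if name not in supported]
```

Because the non-AWQ checkpoint loads successfully, calling `read_architectures` on both the original and the quantized directory and comparing the results may show whether the quantization step changed the recorded architecture name.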

@MadanMaram
Author

Are there any plans to release an official AWQ version of functionary-small-v3.2 in the future?
If so, do you have an estimated timeline for when this might be available?

@QwertyJack

Based on past experience, quantized versions such as AWQ save a significant amount of memory with minimal loss in quality, though I'm unsure whether this also holds for functionary models.
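
As a rough back-of-the-envelope illustration of the memory saving, the sketch below compares 16-bit weights with 4-bit, group-size-128 weights (the settings in the `quant_config` posted above). Everything here is an assumption for illustration: a model of roughly 8 billion parameters, a 16-bit scale and 4-bit zero point stored per group, and every weight being quantized. In practice some layers usually stay in higher precision, and the KV cache and activations are not counted.

```python
def weight_bytes(n_params, weight_bits, group_size=None, scale_bits=16, zero_bits=4):
    """Approximate bytes needed to store n_params weights.

    With group_size set, each group of weights also stores one scale and one
    zero point, which is the overhead of group-wise quantization.
    """
    bits_per_weight = float(weight_bits)
    if group_size is not None:
        bits_per_weight += (scale_bits + zero_bits) / group_size
    return n_params * bits_per_weight / 8


N_PARAMS = 8_000_000_000  # assumption: roughly 8B parameters

fp16 = weight_bytes(N_PARAMS, 16)
awq4 = weight_bytes(N_PARAMS, 4, group_size=128)

print(f"16-bit weights      : {fp16 / 1e9:.2f} GB")
print(f"4-bit, group of 128 : {awq4 / 1e9:.2f} GB")
print(f"reduction           : {fp16 / awq4:.2f}x")
```

Under these assumptions the weights shrink by a little under 4x; the estimate says nothing about accuracy, which would need to be measured on function-calling tasks.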

@QwertyJack

I deployed a self-quantized GPTQ version of Functionary-small-v3.2, and it works perfectly for me.
I produced the quantized model by following the AutoGPTQ documentation, using wikitext2 as the calibration dataset.
