Match GPTQ state dict #2188
Branch updated from d442b66 to 29f83bb.
Branch updated from 8227bfe to 1b1567d.
    return wrapper


def is_quantization_target(key: str) -> bool:
We should move this file to gptq_helpers.py, or at least make sure these functions are named specifically for GPTQ, since these assumptions are specific to how this algorithm is applied, not to quantization in general.
def _log_call(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        _LOGGER.info("Applying transformation: %s", func.__name__.upper())
Let's move this to debug; users won't necessarily know the internal transformation names.
intweight = []
infeatures = weight.shape[1]
for idx in range(infeatures):
    intweight.append(
After we fix the accuracy issue, let's see what we can do to speed this up, or at least time it. With grouping, vectorizing might be tricky, but we could at least pre-allocate the final tensor.
Could maybe try moving the model to GPU before running the transformations (i.e. model.to("cuda:0")).
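The pre-allocation idea, and the vectorized option that grouping makes harder, can be sketched in NumPy. This is an illustration under assumptions, not the PR's code: the elided loop body is assumed to compute round(w / scale) + zero for each input column, `scales` and `zeros` are assumed to be shaped [out_features, num_groups], and `g_idx` is a hypothetical array mapping each input column to its group. The same indexing should carry over to torch tensors (`torch.empty`, integer-tensor indexing).

```python
import numpy as np


def compute_intweight_list(weight, scales, zeros, g_idx):
    """Append-then-concatenate pattern, mirroring the loop in the PR."""
    intweight = []
    for idx in range(weight.shape[1]):
        g = g_idx[idx]  # group that input column idx belongs to
        col = np.round(weight[:, idx] / scales[:, g]) + zeros[:, g]
        intweight.append(col.astype(np.int32)[:, None])
    return np.concatenate(intweight, axis=1)


def compute_intweight_preallocated(weight, scales, zeros, g_idx):
    """Same per-column loop, writing into one array allocated up front."""
    out_features, in_features = weight.shape
    intweight = np.empty((out_features, in_features), dtype=np.int32)
    for idx in range(in_features):
        g = g_idx[idx]
        col = np.round(weight[:, idx] / scales[:, g]) + zeros[:, g]
        intweight[:, idx] = col.astype(np.int32)
    return intweight


def compute_intweight_vectorized(weight, scales, zeros, g_idx):
    """No Python loop: gather each column's scale and zero point by indexing."""
    s = scales[:, g_idx]  # [out_features, in_features]
    z = zeros[:, g_idx]
    return (np.round(weight / s) + z).astype(np.int32)
```

Because the gather goes through `g_idx`, the vectorized version does not assume that each group occupies a contiguous block of columns. Wrapping each call in `time.perf_counter()` on a real layer would answer the "at least time it" question before deciding which variant to adopt.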
    - Reshape the zero points tensor to [1, x] of type int32 and fill with zeros
      (it is assumed that quantization was symmetric)

    :param state_dict: The state_dict to be transformed
Specify that the keys should already have been updated.
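A sketch of the docstring with the reviewer's requested note added, attached to a minimal implementation so the contract is concrete. Assumptions: NumPy arrays stand in for torch tensors, zero-point entries are identified by a hypothetical `.zero_point` key suffix, and x is taken to be the element count of the original zero-point tensor. The function name carries `gptq` in line with the naming suggestion made earlier in the review; it is not the PR's actual function.

```python
import numpy as np


def transform_gptq_zero_points(state_dict, suffix=".zero_point"):
    """Replace each zero-point tensor with int32 zeros of shape [1, x].

    - Reshape the zero points tensor to [1, x] of type int32 and fill with zeros
      (it is assumed that quantization was symmetric)

    :param state_dict: The state_dict to be transformed. Its keys must already
        have been updated to their final names before this transformation runs.
    :param suffix: hypothetical key suffix that identifies zero-point entries
    :return: a new state_dict; the input dict is not modified
    """
    transformed = {}
    for key, value in state_dict.items():
        if key.endswith(suffix):
            # Assumption: x is the number of elements in the original tensor
            transformed[key] = np.zeros((1, value.size), dtype=np.int32)
        else:
            transformed[key] = value
    return transformed
```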
remove src. from imports
Update names
Some Cleanup
Add docstring to QuantizationConfig
Branch updated from 3082f63 to 8257040.
Closing as this is not needed now!
Conversion script:
config.json
Usage script (needs vLLM):