One shot user flow tweaks #2195
Conversation
test_stage:
  obcq_modifiers:
    QuantizationModifier:
      ignore:
        - Tanh
        - GELUActivation
      post_oneshot_calibration: true
      scheme_overrides:
        # For the embeddings, only weight-quantization makes sense
        Embedding:
          input_activations: null
          weights:
            num_bits: 8
            symmetric: false
    SparseGPTModifier:
      sparsity: 0.0
      quantize: true
      targets: ["re:encoder.layer.\\d+$"]
from sparseml.transformers import oneshot
from sparseml import export
from transformers import AutoModel, AutoTokenizer
from datasets import load_dataset
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--recipe-path", type=str)

MODEL_ID = "sentence-transformers/all-MiniLM-L6-v2"
DATASET_ID = "neural-bridge/rag-dataset-12000"
NUM_SAMPLES = 512

if __name__ == "__main__":
    args = parser.parse_args()

    model = AutoModel.from_pretrained(MODEL_ID)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

    # Build a small calibration set and rename the "context" column to "text"
    ds = load_dataset(DATASET_ID, split="train")
    ds = ds.shuffle(seed=42).select(range(NUM_SAMPLES))
    ds = ds.rename_column("context", "text")

    # One-shot quantization/sparsification using the recipe passed on the CLI
    oneshot(
        model=model,
        tokenizer=tokenizer,
        dataset=ds,
        recipe=args.recipe_path,
        output_dir="./minilm-int8-pt",
        max_seq_length=model.config.max_position_embeddings,
        concatenate_data=False,
    )

    export(model=model, integration="transformers", task="mlm", target_path="example_export")
python3 example.py --recipe-path recipe.yaml
"Provide it manually using sequence_length argument" | ||
) | ||
if hasattr(model.config, "max_position_embeddings"): | ||
sequence_length = model.config.max_position_embeddings |
Add a _LOGGER.info here that you are using the default and what the value is.
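A minimal sketch of that suggestion, assuming a fallback shaped like the diff above; the helper name and surrounding variables are hypothetical, not the final implementation:

import logging

_LOGGER = logging.getLogger(__name__)


def resolve_sequence_length(model, sequence_length=None):
    # Hypothetical helper: fall back to the model config and log the default used.
    if sequence_length is not None:
        return sequence_length
    if hasattr(model.config, "max_position_embeddings"):
        sequence_length = model.config.max_position_embeddings
        _LOGGER.info(
            "No sequence_length provided; defaulting to "
            "model.config.max_position_embeddings=%s",
            sequence_length,
        )
        return sequence_length
    raise ValueError(
        "Could not infer sequence_length from the model config. "
        "Provide it manually using sequence_length argument"
    )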
tokenized_dataset = self.get_dataset_split("calibration")
if "labels" in tokenized_dataset.column_names:
    tokenized_dataset = tokenized_dataset.remove_columns("labels")
what is the issue here? if this is relatively safe to keep, then leave a comment on why this specific column needs to be removed
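For illustration, a hedged sketch of how that removal could carry the requested comment; the standalone function is hypothetical, while the real change lives on a method that calls self.get_dataset_split:

def drop_labels_for_calibration(tokenized_dataset):
    # Hypothetical helper mirroring the diff above. Per the PR description,
    # a "labels" column is not allowed to be passed to BERT during the
    # calibration forward passes, so it is dropped before building the loader.
    if "labels" in tokenized_dataset.column_names:
        tokenized_dataset = tokenized_dataset.remove_columns("labels")
    return tokenized_dataset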
@@ -20,7 +20,7 @@
 from sparseml.transformers.utils.preprocessing_functions import (
     PreprocessingFunctionRegistry,
 )
-from sparsezoo.utils.helpers import import_from_path
+# from sparsezoo.utils.helpers import import_from_path
what is the issue here?
If this was giving you an issue, it might be a sparsezoo version issue.
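If the helper really is missing from the installed sparsezoo, a guarded import is one hedged alternative to commenting the line out (sketch only; whether import_from_path is available depends on the sparsezoo version):

# Sketch: degrade gracefully if the helper is absent in the installed sparsezoo.
try:
    from sparsezoo.utils.helpers import import_from_path
except ImportError:
    # Some sparsezoo releases may not expose this helper.
    import_from_path = None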
Sorry - I didn't mean to put a PR on this ... I was just trying to hack something together. Will close
Enables:
- Removing the labels column from the calibration data (labels are not allowed to be passed to BERT)
- Defaulting sequence_length in helper_functions.create_data_loader, because sequence_length is calculated during helper_functions.create_model, which does not run if a model is passed to export

Also, there seems to be a stray import from sparsezoo which is failing.
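Given that, one hedged workaround with the current behavior is to pass the sequence length to export explicitly, as the error message in the diff above suggests (sketch only; reuses the model from the example script and assumes export accepts a sequence_length argument, per that message):

from sparseml import export
from transformers import AutoModel

model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

# Provide sequence_length manually, since helper_functions.create_model (which
# would otherwise infer it) is skipped when a model object is passed in.
export(
    model=model,
    integration="transformers",
    task="mlm",
    sequence_length=model.config.max_position_embeddings,
    target_path="example_export",
)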