From c07bef321b9e9751c39be71ba931037e6990e6e9 Mon Sep 17 00:00:00 2001
From: Helena
Date: Sun, 12 Mar 2023 21:13:07 +0100
Subject: [PATCH] Add JPQD evaluation notebook

---
 ...question_answering_quantization_jpqd.ipynb | 892 ++++++++++++++++++
 1 file changed, 892 insertions(+)
 create mode 100644 notebooks/openvino/question_answering_quantization_jpqd.ipynb

diff --git a/notebooks/openvino/question_answering_quantization_jpqd.ipynb b/notebooks/openvino/question_answering_quantization_jpqd.ipynb
new file mode 100644
index 0000000000..246a29583b
--- /dev/null
+++ b/notebooks/openvino/question_answering_quantization_jpqd.ipynb
@@ -0,0 +1,892 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "eaed3927-e315-46d3-8889-df3f3bbcbf6b",
+   "metadata": {},
+   "source": [
+    "# Joint Pruning, Quantization and Distillation (JPQD) with OpenVINO and NNCF\n",
+    "\n",
+    "JPQD jointly applies pruning, quantization and distillation while finetuning a model. With quantization, we reduce the precision of the model's weights and activations from floating point (FP32) to integer (INT8). This results in a smaller model with faster inference in OpenVINO Runtime.\n",
+    "\n",
+    "Please see the [Optimum OpenVINO model compression documentation](https://huggingface.co/docs/optimum/intel/optimization_ov#optimization) for more information about compressing models with NNCF and JPQD.\n",
+    "\n",
+    "JPQD is applied during training/finetuning of the model. Training a model for a long time inside a notebook is not ideal, so we recommend running the [question-answering example](https://github.com/huggingface/optimum-intel/tree/main/examples/openvino/question-answering) from a terminal if you want to compress the model yourself.\n",
+    "\n",
+    "To follow this notebook, you do not need to compress the model yourself; you can use the already compressed model that we uploaded to the Hugging Face Hub.\n",
+    "\n",
+    "A laptop or desktop with a recent Intel Core processor is recommended for best results. To install the requirements for this notebook, run `pip install \"optimum-intel[openvino]\" \"evaluate[evaluator]\" ipywidgets datasets`, or uncomment the cell below to install the requirements in your current Python environment."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "3d4e47b2-89cb-4ffa-84f3-11919fa367e6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# %pip install \"optimum-intel[openvino]\" \"evaluate[evaluator]\" ipywidgets datasets"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "0407fc92-c052-47b7-8721-01836adf3b54",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "INFO:nncf:NNCF initialized successfully. 
Supported frameworks detected: torch, onnx, openvino\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/home/helena/venvs/openvino_env/lib/python3.10/site-packages/openvino/offline_transformations/__init__.py:10: FutureWarning: The module is private and following namespace `offline_transformations` will be removed in the future.\n",
+      "  warnings.warn(\n"
+     ]
+    }
+   ],
+   "source": [
+    "import random\n",
+    "import tempfile\n",
+    "from pathlib import Path\n",
+    "\n",
+    "import datasets\n",
+    "import evaluate\n",
+    "import pandas as pd\n",
+    "import transformers\n",
+    "from evaluate import evaluator\n",
+    "from optimum.intel.openvino import OVModelForQuestionAnswering\n",
+    "from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline\n",
+    "\n",
+    "from openvino.runtime import Core\n",
+    "\n",
+    "transformers.logging.set_verbosity_error()\n",
+    "datasets.logging.set_verbosity_error()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "889a16fe-2bc0-477e-b8d6-02a4f7508f03",
+   "metadata": {},
+   "source": [
+    "## Settings\n",
+    "\n",
+    "We will compare the accuracy and performance of the quantized and pruned model with that of an FP32 bert-base-uncased model which was also finetuned on the SQuAD dataset, following the [Transformers question-answering example](https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering#fine-tuning-bert-on-squad10).\n",
+    "\n",
+    "We define the model IDs for the FP32 and INT8 models and the dataset name. If you trained the models yourself, set FP32_MODEL_ID and INT8_MODEL_ID to the directories containing the model and tokenizer files.\n",
+    "\n",
+    "The models were finetuned on the [Stanford Question Answering Dataset (SQuAD)](https://huggingface.co/datasets/squad), a reading comprehension dataset consisting of questions about a set of Wikipedia articles, where the answer to every question is a segment of text from a given context. Since the models were finetuned on version 1 of the SQuAD dataset, VERSION_2_WITH_NEGATIVE should be set to False."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "868463a5-6e09-46a1-832a-91b823ca2a4a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "FP32_MODEL_ID = \"helenai/bert-base-uncased-squad-v1\"\n",
+    "INT8_MODEL_ID = \"helenai/bert-base-uncased-squad-v1-jpqd-ov-int8\"\n",
+    "DATASET_NAME = \"squad\"\n",
+    "VERSION_2_WITH_NEGATIVE = False"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8a0551ac-eee2-4f2d-b89f-bb3253d2c107",
+   "metadata": {},
+   "source": [
+    "#### Intel GPU support\n",
+    "\n",
+    "At the moment, quantized embeddings are not supported for inference on GPU. To show inference on iGPU, we also compressed the model without quantizing the embeddings, by adding `\"{re}.*Embeddings.*\"` to the `ignored_scopes` of the quantization sections in the [NNCF config](https://github.com/huggingface/optimum-intel/blob/main/examples/openvino/question-answering/configs/bert-base-jpqd.json) and running the compression again with that config. This does not affect performance, but it does increase the file size of the quantized model from 75 MB to 146 MB.\n",
+    "\n",
+    "The code in the cell below checks if a GPU is available for OpenVINO inference and, if so, sets INT8_MODEL_ID to the GPU-enabled version of the model.\n",
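+    "\n",
+    "As an illustration only (it is not executed in this notebook), the GPU-friendly config could be produced from the original JPQD config roughly as in the sketch below. The sketch assumes the config keeps its algorithms in a top-level `compression` list and writes the result to a hypothetical `bert-base-jpqd-gpu.json` file:\n",
+    "\n",
+    "```python\n",
+    "import json\n",
+    "\n",
+    "with open(\"configs/bert-base-jpqd.json\") as f:\n",
+    "    nncf_config = json.load(f)\n",
+    "\n",
+    "# Skip quantization of the embedding layers so the model can run on iGPU\n",
+    "for algo in nncf_config[\"compression\"]:\n",
+    "    if algo.get(\"algorithm\") == \"quantization\":\n",
+    "        algo.setdefault(\"ignored_scopes\", []).append(\"{re}.*Embeddings.*\")\n",
+    "\n",
+    "with open(\"configs/bert-base-jpqd-gpu.json\", \"w\") as f:\n",
+    "    json.dump(nncf_config, f, indent=4)\n",
+    "```"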
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "30fda4c5-3f54-49fe-be82-463a83cc882a", + "metadata": {}, + "outputs": [], + "source": [ + "gpu_available = \"GPU\" in Core().available_devices\n", + "if gpu_available:\n", + " INT8_MODEL_ID = \"helenai/bert-base-uncased-squad-v1-jpqd-ov-int8@gpu\"" + ] + }, + { + "cell_type": "markdown", + "id": "124bd9ad-077c-4f41-b579-0bf978fe6a1e", + "metadata": {}, + "source": [ + "## Load the Dataset\n", + "\n", + "The `datasets` library makes it easy to load datasets. Common datasets can be loaded from the Hugging Face Hub by providing the name of the dataset. See https://github.com/huggingface/datasets. We load the SQuAD dataset with `load_dataset`, show a random dataset item, and the list of categories in the dataset.\n", + "\n", + "Every dataset item in the SQuAD dataset has a unique id, a title which denotes the category, a context and a question, and answers. The answer is a subset of the context, and both the text of the answer, and the start position of the answer in the context (`answer_start`) are returned.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "602fe46f-c96a-4a0f-9338-58339d466f3a", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'id': '56e77a8700c9c71400d7718b',\n", + " 'title': 'Teacher',\n", + " 'context': \"In the past, teachers have been paid relatively low salaries. However, average teacher salaries have improved rapidly in recent years. US teachers are generally paid on graduated scales, with income depending on experience. Teachers with more experience and higher education earn more than those with a standard bachelor's degree and certificate. Salaries vary greatly depending on state, relative cost of living, and grade taught. Salaries also vary within states where wealthy suburban school districts generally have higher salary schedules than other districts. The median salary for all primary and secondary teachers was $46,000 in 2004, with the average entry salary for a teacher with a bachelor's degree being an estimated $32,000. Median salaries for preschool teachers, however, were less than half the national median for secondary teachers, clock in at an estimated $21,000 in 2004. For high school teachers, median salaries in 2007 ranged from $35,000 in South Dakota to $71,000 in New York, with a national median of $52,000. Some contracts may include long-term disability insurance, life insurance, emergency/personal leave and investment options. The American Federation of Teachers' teacher salary survey for the 2006-07 school year found that the average teacher salary was $51,009. In a salary survey report for K-12 teachers, elementary school teachers had the lowest median salary earning $39,259. High school teachers had the highest median salary earning $41,855. Many teachers take advantage of the opportunity to increase their income by supervising after-school programs and other extracurricular activities. In addition to monetary compensation, public school teachers may also enjoy greater benefits (like health insurance) compared to other occupations. Merit pay systems are on the rise for teachers, paying teachers extra money based on excellent classroom evaluations, high test scores and for high success at their overall school. 
Also, with the advent of the internet, many teachers are now selling their lesson plans to other teachers through the web in order to earn supplemental income, most notably on TeachersPayTeachers.com.\",\n",
+       " 'question': 'What has been getting much better in the most recent years?',\n",
+       " 'answers': {'text': ['average teacher salaries',\n",
+       "   'average teacher salaries',\n",
+       "   'teacher salaries'],\n",
+       "  'answer_start': [71, 71, 79]}}"
+      ]
+     },
+     "execution_count": 5,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "examples = datasets.load_dataset(DATASET_NAME, split=\"validation\")\n",
+    "random.choice(examples)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "d86d98b4-d3d6-4fb5-9b3e-53d61813e52a",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'Scottish_Parliament', 'Oxygen', 'United_Methodist_Church', 'European_Union_law', 'Construction', 'French_and_Indian_War', 'Martin_Luther', 'Super_Bowl_50', 'Genghis_Khan', 'Prime_number', 'Rhine', 'Steam_engine', 'Economic_inequality', 'Yuan_dynasty', '1973_oil_crisis', 'American_Broadcasting_Company', 'Computational_complexity_theory', 'Packet_switching', 'Civil_disobedience', 'Warsaw', 'Teacher', 'Southern_California', 'Normans', 'Newcastle_upon_Tyne', 'Black_Death', 'Chloroplast', 'Jacksonville,_Florida', 'Imperialism', 'Apollo_program', 'Huguenot', 'Pharmacy', 'Ctenophora', 'Victoria_and_Albert_Museum', 'Kenya', 'Immune_system', 'Intergovernmental_Panel_on_Climate_Change', 'Doctor_Who', 'Force', 'University_of_Chicago', 'Amazon_rainforest', 'Fresno,_California', 'Geology', 'Islamism', 'Victoria_(Australia)', 'Private_school', 'Nikola_Tesla', 'Sky_(United_Kingdom)', 'Harvard_University'}\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(set([item[\"title\"] for item in examples]))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "37d8bab7-6eed-4a75-9ee5-330586450453",
+   "metadata": {},
+   "source": [
+    "## Load Model and Tokenizer\n",
+    "\n",
+    "We load the PyTorch FP32 model and the OpenVINO INT8 model from the Hugging Face Hub. The models will be automatically downloaded if they have not been downloaded before, or loaded from the cache otherwise. To load the quantized model with OpenVINO, we use the `OVModelForQuestionAnswering` class, which can be used in the same way as [`AutoModelForQuestionAnswering`](https://huggingface.co/docs/transformers/main/model_doc/auto).\n",
+    "\n",
+    "We also load the tokenizer, which converts the questions and contexts from the dataset into the tokenized inputs the model expects.\n",
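+    "\n",
+    "For question answering, the tokenizer is called with a question and a context as a pair, so both end up in a single input sequence. The short sketch below is only an illustration with made-up strings (it is not part of this notebook's workflow) and loads its own copy of the tokenizer:\n",
+    "\n",
+    "```python\n",
+    "from transformers import AutoTokenizer\n",
+    "\n",
+    "qa_tokenizer = AutoTokenizer.from_pretrained(\"helenai/bert-base-uncased-squad-v1\")\n",
+    "# Encode a (question, context) pair; token_type_ids is 0 for question tokens and 1 for context tokens\n",
+    "encoded = qa_tokenizer(\"Who won the game?\", \"The Denver Broncos won the game.\")\n",
+    "print(qa_tokenizer.convert_ids_to_tokens(encoded[\"input_ids\"]))\n",
+    "print(encoded[\"token_type_ids\"])\n",
+    "```"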
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "38641b14-07d0-49d5-af86-8b5247ae39d8", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'input_ids': [101, 7592, 2088, 999, 102], 'token_type_ids': [0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1]}" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fp32_model = AutoModelForQuestionAnswering.from_pretrained(FP32_MODEL_ID)\n", + "int8_model = OVModelForQuestionAnswering.from_pretrained(INT8_MODEL_ID)\n", + "tokenizer = AutoTokenizer.from_pretrained(FP32_MODEL_ID)\n", + "\n", + "# See how the tokenizer for the given model converts input text to model input values\n", + "tokenizer(\"hello world!\")" + ] + }, + { + "cell_type": "markdown", + "id": "2574cc63-aad3-4c28-aa6f-e553de911ce5", + "metadata": {}, + "source": [ + "## Compare INT8 and FP32 models\n", + "\n", + "We compare the accuracy, model size and inference results and latency of the FP32 and INT8 models.\n", + "### Inference Pipeline\n", + "\n", + "Transformers [Pipelines](https://huggingface.co/docs/transformers/main/en/pipeline_tutorial) simplify model inference. A `Pipeline` is created by adding a task, model and tokenizer to the `pipeline` function. Inference is then as simple as `qa_pipeline({\"question\": question, \"context\": context})`.\n", + "\n", + "We create two pipelines: `hf_qa_pipeline` and `ov_qa_pipeline` to compare the FP32 PyTorch model with the OpenVINO INT8 model. These pipelines will also be used for showing the accuracy difference and for benchmarking later in this notebook.\n", + "\n", + "For some Intel processors, it can be beneficial to reshape the OpenVINO model to a static shape of (1,384) for faster inference. This requires padding or truncating inputs to the specified sequence length. This can be done by adding `padding`, `max_seq_len` and `truncation` arguments to the `pipeline` function. See Hugging Face's [padding and truncation documentation](https://huggingface.co/docs/transformers/pad_truncation) for more information on the possible values.\n", + "\n", + "Setting a shorter sequence length in the cell below will speed up inference further, with the possibility of a drop in accuracy, since larger model inputs will be truncated." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "e02e40dd-b208-42b8-9413-6dac61b75476", + "metadata": {}, + "outputs": [], + "source": [ + "USE_DYNAMIC_SHAPES = False\n", + "\n", + "if USE_DYNAMIC_SHAPES:\n", + " ov_qa_pipeline = pipeline(\"question-answering\", model=int8_model, tokenizer=tokenizer)\n", + "else:\n", + " seq_length = 384\n", + " int8_model.reshape(1, seq_length)\n", + " int8_model.compile()\n", + " ov_qa_pipeline = pipeline(\n", + " \"question-answering\", model=int8_model, tokenizer=tokenizer, max_seq_len=seq_length, padding=\"max_length\", truncation=True\n", + " )\n", + "\n", + "hf_qa_pipeline = pipeline(\"question-answering\", model=fp32_model, tokenizer=tokenizer)" + ] + }, + { + "cell_type": "markdown", + "id": "8132b4c8-7c06-4da4-a33a-d2e235a97fd9", + "metadata": {}, + "source": [ + "Show a dataset item and inference results on both pipelines." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "2e23fe96-8d7f-4aa1-816f-707ca1a2f978", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. 
The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50.\n"
+     ]
+    }
+   ],
+   "source": [
+    "context = examples[0][\"context\"]\n",
+    "question = \"Who won the game?\"\n",
+    "print(context)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "c1168f1c-14de-4aad-977d-122a8d366935",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "'Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers'"
+      ]
+     },
+     "execution_count": 10,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "hf_qa_pipeline({\"question\": question, \"context\": context})[\"answer\"]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "c885d378-2842-49d0-b583-a2fc023558b5",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "'Denver Broncos'"
+      ]
+     },
+     "execution_count": 11,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "ov_qa_pipeline({\"question\": question, \"context\": context})[\"answer\"]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "97a52092-e352-47ef-9ed2-89508bc48d70",
+   "metadata": {},
+   "source": [
+    "### Accuracy\n",
+    "\n",
+    "We now compare the metrics of the INT8 model and the original FP32 model. The [evaluate](https://github.com/huggingface/evaluate) library makes it easy to evaluate models on a given dataset with a given metric. For the SQuAD dataset, the F1 score and Exact Match metrics are returned.\n",
+    "\n",
+    "The SQuAD validation set is fairly large, and running the evaluation on the full dataset can take some time. For demonstration purposes, we evaluate the metrics on a subset of 500 items of the dataset. The metrics on the full validation dataset are:\n",
+    "\n",
+    "```\n",
+    "FP32 exact match 81.5, F1 88.7\n",
+    "INT8 exact match 82.5, F1 89.5\n",
+    "```\n",
+    "\n",
+    "The evaluate function also keeps track of the time it takes to run. This provides an estimate of performance, but keep in mind that other programs running on the computer (including Jupyter), as well as power management settings, can affect the measurements.\n",
+    "\n",
+    "If you have a processor with an Intel integrated GPU, or a dedicated Intel GPU, you can run inference on the GPU for even faster performance. An 11th generation Intel Core processor or later with Xe graphics is recommended for iGPU inference. See the [OpenVINO documentation](https://docs.openvino.ai/latest/openvino_docs_install_guides_configurations_for_intel_gpu.html) about installing GPU drivers if you are on Linux (on Windows iGPU inference should work out of the box).\n",
+    "\n",
+    "Dynamic shapes are currently supported on GPU with limitations. In the code below, we enable GPU inference if a GPU is available to OpenVINO and the model was compiled with static shapes in the previous section. Note that minor variations in accuracy between CPU and GPU are expected.\n",
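+    "\n",
+    "To check which devices OpenVINO can use on your machine, and what hardware they correspond to, you can list them with `Core()`. This is a quick standalone check and does not change anything else in this notebook:\n",
+    "\n",
+    "```python\n",
+    "from openvino.runtime import Core\n",
+    "\n",
+    "core = Core()\n",
+    "for device in core.available_devices:\n",
+    "    # For example: CPU, GPU (integrated graphics), GPU.1 (discrete graphics)\n",
+    "    print(device, core.get_property(device, \"FULL_DEVICE_NAME\"))\n",
+    "```"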
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "bae78873-feed-408b-9d48-f4008cb5ca61", + "metadata": {}, + "outputs": [], + "source": [ + "random.seed(2023)\n", + "num_items = 500\n", + "# Set num_items to len(examples) to validate on the entire dataset. That may take a long time!\n", + "# num_items = len(examples)\n", + "indices = sorted(random.sample(range(len(examples)), k=num_items))\n", + "filtered_examples = examples.select(indices)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "f387b276-5b6b-43f0-924e-80f80ae453d2", + "metadata": {}, + "outputs": [], + "source": [ + "squad_eval = evaluator(\"question-answering\")\n", + "\n", + "hf_eval_results = squad_eval.compute(\n", + " model_or_pipeline=hf_qa_pipeline,\n", + " data=filtered_examples,\n", + " metric=\"squad\",\n", + " squad_v2_format=VERSION_2_WITH_NEGATIVE,\n", + ")\n", + "\n", + "devices = (\"CPU\", \"GPU\") if (\"GPU\" in Core().available_devices and not int8_model.is_dynamic) else (\"CPU\",)\n", + "ov_eval_results = {}\n", + "for device in devices:\n", + " int8_model.to(device)\n", + " int8_model.compile()\n", + "\n", + " # run a few warmup inferences\n", + " for item in examples.select(range(10)):\n", + " ov_qa_pipeline(item[\"question\"], item[\"context\"])\n", + "\n", + " ov_eval_results[device] = squad_eval.compute(\n", + " model_or_pipeline=ov_qa_pipeline,\n", + " data=filtered_examples,\n", + " metric=\"squad\",\n", + " squad_v2_format=VERSION_2_WITH_NEGATIVE,\n", + " )" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "d1a71eea-bc62-466b-8545-33b7253ee2c9", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
exact_matchf1latency
FP3280.888.4116143.7
INT8 CPU82.088.795364.1
INT8 GPU82.889.339734.1
\n", + "
" + ], + "text/plain": [ + " exact_match f1 latency\n", + "FP32 80.8 88.4116 143.7\n", + "INT8 CPU 82.0 88.7953 64.1\n", + "INT8 GPU 82.8 89.3397 34.1" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "summary = (\n", + " pd.DataFrame.from_records(\n", + " [hf_eval_results, *ov_eval_results.values()],\n", + " columns=[\"exact_match\", \"f1\", \"latency_in_seconds\"],\n", + " index=[\"FP32\", *(f\"INT8 {device}\" for device in devices)],\n", + " )\n", + " .round(4)\n", + " .dropna()\n", + ")\n", + "summary[\"latency_in_seconds\"] *= 1000\n", + "summary.columns = [\"exact_match\", \"f1\", \"latency\"]\n", + "summary" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "3b435f18-5233-4e54-bc98-ad85d468f041", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "INT8 speedup on CPU: 2.24X\n", + "INT8 speedup on GPU: 4.21X\n", + "11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz\n" + ] + } + ], + "source": [ + "for device in devices:\n", + " int8_speedup = summary.loc[\"FP32\"][\"latency\"] / summary.loc[f\"INT8 {device}\"][\"latency\"]\n", + " print(f\"INT8 speedup on {device}: {int8_speedup:.2f}X\")\n", + "print(Core().get_property(\"CPU\", \"FULL_DEVICE_NAME\"))" + ] + }, + { + "cell_type": "markdown", + "id": "db183795-6dae-4ef6-847d-042223264149", + "metadata": { + "execution": { + "iopub.execute_input": "2022-11-07T21:25:39.912874Z", + "iopub.status.busy": "2022-11-07T21:25:39.912662Z", + "iopub.status.idle": "2022-11-07T21:25:39.916029Z", + "shell.execute_reply": "2022-11-07T21:25:39.915541Z", + "shell.execute_reply.started": "2022-11-07T21:25:39.912859Z" + } + }, + "source": [ + "### Inference Results\n", + "\n", + "To fully understand the quality of a model, it is useful to look beyond metrics like Exact Match and F1 score and examine model predictions directly. This can give a more complete impression of the model's performance and help identify areas for improvement.\n", + "\n", + "In the next cell, we go over a selection of items in the filtered validation set, and display the items where the FP32 prediction score is different from the INT8 prediction score\n", + "\n", + "The table displays the question and the set of correct answers from the dataset, the FP32 prediction and F1 score and the INT8 prediction and F1 score. The results show that for some predictions, the FP32 model is better, and for others, the INT8 model is, and that for the large majority of dataset items both models are equally accurate." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "ab953c89-ed9d-4afa-8953-541c982174ff", + "metadata": {}, + "outputs": [], + "source": [ + "results = []\n", + "int8_better = 0\n", + "num_items = 100\n", + "metric = evaluate.load(\"squad_v2\" if VERSION_2_WITH_NEGATIVE else \"squad\")\n", + "\n", + "for item in filtered_examples.select(range(num_items)):\n", + " id, title, context, question, answers = item.values()\n", + " fp32_answer = hf_qa_pipeline(question, context)[\"answer\"]\n", + " int8_answer = ov_qa_pipeline(question, context)[\"answer\"]\n", + "\n", + " references = [{\"id\": id, \"answers\": answers}]\n", + " fp32_predictions = [{\"id\": id, \"prediction_text\": fp32_answer}]\n", + " int8_predictions = [{\"id\": id, \"prediction_text\": int8_answer}]\n", + "\n", + " fp32_score = round(metric.compute(references=references, predictions=fp32_predictions)[\"f1\"], 2)\n", + " int8_score = round(metric.compute(references=references, predictions=int8_predictions)[\"f1\"], 2)\n", + "\n", + " if int8_score != fp32_score:\n", + " results.append((question, answers[\"text\"], fp32_answer, fp32_score, int8_answer, int8_score))\n", + " if int8_score > fp32_score:\n", + " int8_better += 1" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "37b78ee3-c330-4ef8-8528-47d5a8b73424", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
QuestionAnswerFP32 predictionFP32 F1INT8 predictionINT8 F1
0Who was the male singer who performed as a special guest during Super Bowl 50?[Bruno Mars, Bruno Mars, Bruno Mars,]Beyoncé and Bruno Mars66.67Bruno Mars100.00
1What position does Demaryius Thomas play?[receiver, receiver, Thomas]Veteran receiver66.67receiver100.00
2Which smartphone customers were the only people who could stream the game on their phones?[Verizon Wireless customers, Verizon, Verizon]Verizon Wireless80.00Verizon100.00
3Who stripped the ball from Cam Newton while sacking him on this drive?[Von Miller, Von Miller, Miller]Von Miller100.00linebacker Von Miller80.00
4What were Tesla's mother's special abilities?[making home craft tools, mechanical appliances, and the ability to memorize Serbian epic poems, making home craft tools, mechanical appliances, and the ability to memorize Serbian epic poems, making home craft tools, mechanical appliances, and the ability to memorize Serbian epic poems]memorize Serbian epic poems47.06craft tools, mechanical appliances, and the ability to memorize Serbian epic poems91.67
5What was Tesla's AC system used for in Pittsburgh?[to power the city's streetcars., the city's streetcars, street cars]create an alternating current system to power the city's streetcars66.67helping to create an alternating current system to power the city's streetcars57.14
6Where can Tesla's theories as to what caused the skin damage be found?[In his many notes, In his many notes]Roentgen rays0.00ozone generated in contact with the skin20.00
7How far did he claim the mechanical energy could be transmitted?[over any terrestrial distance, any terrestrial distance, any terrestrial distance]over any terrestrial distance100.00terrestrial distance80.00
8What was the occasion when he claimed he'd made the death ray?[at a luncheon in his honor, a luncheon in his honor, a luncheon in his honor]luncheon40.001937, at a luncheon50.00
9A non-deterministic Turing machine has the ability to capture what facet of useful analysis?[mathematical models, mathematical models, branching]mathematical models we want to analyze50.00mathematical models100.00
10What is the youngest student a teacher might have?[infants, infants, infants]infants to adults50.00infants100.00
11What's the biggest difference in the teaching relationship for primary and secondary school?[the relationship between teachers and children, the relationship between teachers and children., the relationship between teachers and children., the relationship between teachers and children]the relationship between teachers and children100.00teachers and children75.00
12Where are teachers recruited from?[Lehramtstudien (Teaching Education Studies), Lehramtstudien, special university classes]special university classes100.00Germany0.00
\n", + "
" + ], + "text/plain": [ + " Question \\\n", + "0 Who was the male singer who performed as a special guest during Super Bowl 50? \n", + "1 What position does Demaryius Thomas play? \n", + "2 Which smartphone customers were the only people who could stream the game on their phones? \n", + "3 Who stripped the ball from Cam Newton while sacking him on this drive? \n", + "4 What were Tesla's mother's special abilities? \n", + "5 What was Tesla's AC system used for in Pittsburgh? \n", + "6 Where can Tesla's theories as to what caused the skin damage be found? \n", + "7 How far did he claim the mechanical energy could be transmitted? \n", + "8 What was the occasion when he claimed he'd made the death ray? \n", + "9 A non-deterministic Turing machine has the ability to capture what facet of useful analysis? \n", + "10 What is the youngest student a teacher might have? \n", + "11 What's the biggest difference in the teaching relationship for primary and secondary school? \n", + "12 Where are teachers recruited from? \n", + "\n", + " Answer \\\n", + "0 [Bruno Mars, Bruno Mars, Bruno Mars,] \n", + "1 [receiver, receiver, Thomas] \n", + "2 [Verizon Wireless customers, Verizon, Verizon] \n", + "3 [Von Miller, Von Miller, Miller] \n", + "4 [making home craft tools, mechanical appliances, and the ability to memorize Serbian epic poems, making home craft tools, mechanical appliances, and the ability to memorize Serbian epic poems, making home craft tools, mechanical appliances, and the ability to memorize Serbian epic poems] \n", + "5 [to power the city's streetcars., the city's streetcars, street cars] \n", + "6 [In his many notes, In his many notes] \n", + "7 [over any terrestrial distance, any terrestrial distance, any terrestrial distance] \n", + "8 [at a luncheon in his honor, a luncheon in his honor, a luncheon in his honor] \n", + "9 [mathematical models, mathematical models, branching] \n", + "10 [infants, infants, infants] \n", + "11 [the relationship between teachers and children, the relationship between teachers and children., the relationship between teachers and children., the relationship between teachers and children] \n", + "12 [Lehramtstudien (Teaching Education Studies), Lehramtstudien, special university classes] \n", + "\n", + " FP32 prediction \\\n", + "0 Beyoncé and Bruno Mars \n", + "1 Veteran receiver \n", + "2 Verizon Wireless \n", + "3 Von Miller \n", + "4 memorize Serbian epic poems \n", + "5 create an alternating current system to power the city's streetcars \n", + "6 Roentgen rays \n", + "7 over any terrestrial distance \n", + "8 luncheon \n", + "9 mathematical models we want to analyze \n", + "10 infants to adults \n", + "11 the relationship between teachers and children \n", + "12 special university classes \n", + "\n", + " FP32 F1 \\\n", + "0 66.67 \n", + "1 66.67 \n", + "2 80.00 \n", + "3 100.00 \n", + "4 47.06 \n", + "5 66.67 \n", + "6 0.00 \n", + "7 100.00 \n", + "8 40.00 \n", + "9 50.00 \n", + "10 50.00 \n", + "11 100.00 \n", + "12 100.00 \n", + "\n", + " INT8 prediction \\\n", + "0 Bruno Mars \n", + "1 receiver \n", + "2 Verizon \n", + "3 linebacker Von Miller \n", + "4 craft tools, mechanical appliances, and the ability to memorize Serbian epic poems \n", + "5 helping to create an alternating current system to power the city's streetcars \n", + "6 ozone generated in contact with the skin \n", + "7 terrestrial distance \n", + "8 1937, at a luncheon \n", + "9 mathematical models \n", + "10 infants \n", + "11 teachers and children \n", + "12 Germany \n", + "\n", + " 
INT8 F1 \n", + "0 100.00 \n", + "1 100.00 \n", + "2 100.00 \n", + "3 80.00 \n", + "4 91.67 \n", + "5 57.14 \n", + "6 20.00 \n", + "7 80.00 \n", + "8 50.00 \n", + "9 100.00 \n", + "10 100.00 \n", + "11 75.00 \n", + "12 0.00 " + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pd.set_option(\"display.max_colwidth\", None)\n", + "df = pd.DataFrame(\n", + " results,\n", + " columns=[\"Question\", \"Answer\", \"FP32 prediction\", \"FP32 F1\", \"INT8 prediction\", \"INT8 F1\"],\n", + ")\n", + "df" + ] + }, + { + "cell_type": "markdown", + "id": "58df445d-af43-4ba1-8195-7d8f00b8f82f", + "metadata": {}, + "source": [ + "### Model Size\n", + "\n", + "We save the FP32 and INT8 models to a temporary directory and define a function to show the model size for the PyTorch and OpenVINO models." + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "1eeaa81f-7fc5-49ba-80b8-2d95a1310a0c", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "FP32 model size: 435.64 MB\n", + "INT8 model size: 147.57 MB\n", + "INT8 size decrease: 2.95x\n" + ] + } + ], + "source": [ + "def get_model_size(model_folder, framework):\n", + " \"\"\"\n", + " Return OpenVINO or PyTorch model size in Mb.\n", + " Arguments:\n", + " model_folder:\n", + " Directory containing a pytorch_model.bin for a PyTorch model, and an openvino_model.xml/.bin for an OpenVINO model.\n", + " framework:\n", + " Define whether the model is a PyTorch or an OpenVINO model.\n", + " \"\"\"\n", + " if framework.lower() == \"openvino\":\n", + " model_path = Path(model_folder) / \"openvino_model.xml\"\n", + " model_size = model_path.stat().st_size + model_path.with_suffix(\".bin\").stat().st_size\n", + " elif framework.lower() == \"pytorch\":\n", + " model_path = Path(model_folder) / \"pytorch_model.bin\"\n", + " model_size = model_path.stat().st_size\n", + " model_size /= 1000 * 1000\n", + " return model_size\n", + "\n", + "\n", + "with tempfile.TemporaryDirectory() as fp32_model_dir:\n", + " fp32_model.save_pretrained(fp32_model_dir)\n", + " fp32_model_size = get_model_size(fp32_model_dir, \"pytorch\")\n", + "\n", + "with tempfile.TemporaryDirectory() as int8_model_dir:\n", + " int8_model.save_pretrained(int8_model_dir)\n", + " int8_model_size = get_model_size(int8_model_dir, \"openvino\")\n", + "\n", + "print(f\"FP32 model size: {fp32_model_size:.2f} MB\")\n", + "print(f\"INT8 model size: {int8_model_size:.2f} MB\")\n", + "print(f\"INT8 size decrease: {fp32_model_size / int8_model_size:.2f}x\")" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}