Commit

cr
jerryjliu committed Aug 16, 2024
1 parent 4689c86 commit 935d5fc
Showing 1 changed file with 28 additions and 106 deletions.
134 changes: 28 additions & 106 deletions examples/multimodal/multimodal_report_generation_agent.ipynb
@@ -859,7 +859,7 @@
{
"data": {
"text/markdown": [
"## Analysis of LoftQ Experimental Techniques"
"The LoftQ experimental techniques are designed to evaluate the effectiveness of quantization methods on various models and datasets. The experiments are conducted on both Natural Language Understanding (NLU) and Natural Language Generation (NLG) tasks, using models such as DeBERTaV3-base, BART-large, and LLAMA-2 series."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
@@ -871,7 +871,8 @@
{
"data": {
"text/markdown": [
"The LoftQ method has been evaluated through a series of experiments on both Natural Language Understanding (NLU) and Natural Language Generation (NLG) tasks. The experiments were conducted using various models, including DeBERTaV3-base, BART-large, and LLAMA-2 series."
"### Implementation Details\n",
"The implementation of LoftQ is based on the Huggingface Transformers code-base. All experiments are conducted on NVIDIA A100 GPUs. The models are quantized using two methods: Uniform quantization and NF4/NF2 quantization, which are compatible with different quantization functions."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
@@ -883,7 +884,9 @@
{
"data": {
"text/markdown": [
"### Implementation Details"
"### Quantization Methods\n",
"1. **Uniform Quantization**: This method uniformly divides a continuous interval into 2N categories and stores a local maximum absolute value for dequantization.\n",
"2. **NF4 and NF2 Quantization**: These methods assume that high-precision values are drawn from a Gaussian distribution and map these values to discrete slots with equal probability."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
@@ -895,7 +898,8 @@
{
"data": {
"text/markdown": [
"The implementation of LoftQ is based on the Huggingface Transformers code-base. All experiments were conducted on NVIDIA A100 GPUs. The method involves freezing all backbone weight matrices and adding low-rank adapters to the weight matrices in Multi-Head Attention (MHA) and Feed-Forward Network (FFN) layers. The weight matrices attached by low-rank adapters are quantized."
"### Experimental Setup\n",
"The experiments involve quantizing the weight matrices attached by low-rank adapters in the models. The quantized models and adapters are available on Huggingface. The compression ratios achieved are 25-30% for 2-bit quantization and 15-20% for 4-bit quantization."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
@@ -907,7 +911,11 @@
{
"data": {
"text/markdown": [
"### Quantization Methods"
"### Baselines\n",
"The performance of LoftQ is compared with several baseline methods:\n",
"1. **Full Fine-tuning**: All parameters are updated through an SGD-type optimization method.\n",
"2. **Full Precision LoRA (LoRA)**: Stores the backbone using 16-bit numbers and optimizes the low-rank adapters only.\n",
"3. **QLoRA**: Similar to LoRA but the backbone is quantized into a low-bit regime."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
@@ -919,10 +927,8 @@
{
"data": {
"text/markdown": [
"Two quantization methods were applied to demonstrate the compatibility of LoftQ with different quantization functions:\n",
"\n",
"1. **Uniform Quantization**: This method uniformly divides a continuous interval into 2^N categories and stores a local maximum absolute value for dequantization.\n",
"2. **NF4 and NF2**: These methods assume that high-precision values are drawn from a Gaussian distribution and map these values to discrete slots with equal probability."
"### Results\n",
"The results show that LoftQ achieves significant improvements in various tasks. For instance, LoftQ using mixed-precision quantization yields a 4.1% accuracy boost on the GSM8K dataset using LLAMA-2-7b and a 4.7% boost using LLAMA-2-13b. Additionally, LoftQ outperforms state-of-the-art pruning methods on the DeBERTaV3-base model."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
@@ -934,7 +940,8 @@
{
"data": {
"text/markdown": [
"### Experimental Setup"
"### Hyper-parameter Setup\n",
"The hyper-parameters for different tasks and models are meticulously chosen to optimize performance. For example, the learning rate for training DeBERTaV3-base on the GLUE benchmark using NF2 quantization varies from 1 × 10⁻⁴ to 5 × 10⁻⁵ across different tasks."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
@@ -946,71 +953,8 @@
{
"data": {
"text/markdown": [
"#### Natural Language Understanding (NLU)\n",
"\n",
"The experiments on NLU tasks involved quantizing the DeBERTaV3-base model and evaluating it on the GLUE benchmark, SQuADv1.1, and ANLI datasets. The hyper-parameters for training DeBERTaV3-base using NF2 quantization are detailed in the experimental tables. The results showed that LoftQ significantly outperforms state-of-the-art pruning methods and achieves performance close to full fine-tuning with 4-bit quantization."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"#### Natural Language Generation (NLG)\n",
"\n",
"For NLG tasks, the experiments were conducted on WikiText-2 and GSM8K datasets using the LLAMA-2 series models. The results indicated that mixed-precision quantization (equivalent to 3 bits) provided a notable accuracy boost on the GSM8K dataset. The hyper-parameters for training BART-large on CNN/DailyMail and XSum datasets were also provided."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"### Results and Analysis"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"The results of LoftQ using 2-bits uniform quantization were compared with LoSparse on GLUE development sets. The method showed substantial improvements in accuracy and perplexity metrics across various datasets. Additionally, the effectiveness of alternating optimization was verified, showing that even minimal alternating steps yield significant improvements."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"### Conclusion"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"The experimental techniques used to evaluate LoftQ demonstrate its effectiveness in both NLU and NLG tasks. The method's compatibility with different quantization functions and its ability to achieve high performance with low-bit quantization make it a promising approach for model compression and efficient training."
"### Conclusion\n",
"The LoftQ experimental techniques demonstrate the potential of low-bit quantization methods in enhancing model performance while reducing memory usage. The experiments validate the effectiveness of LoftQ across various models and tasks, making it a promising approach for future research in model quantization."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
@@ -1111,31 +1055,9 @@
{
"data": {
"text/markdown": [
"1. **Purpose**: Both LongLoRA and LoftQ aim to improve the efficiency and performance of large language models (LLMs) by optimizing their fine-tuning processes."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"2. **Efficiency**: Both methods focus on reducing computational costs and memory usage during fine-tuning, making them suitable for resource-constrained environments."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"3. **Compatibility**: Both techniques are designed to be compatible with existing LLM architectures and can be integrated with other optimization methods."
"1. **Purpose**: Both LongLoRA and LoftQ aim to improve the efficiency and performance of large language models (LLMs) during fine-tuning and inference.\n",
"2. **Efficiency**: Both methods focus on reducing computational costs and memory usage, making it feasible to work with large models on limited hardware resources.\n",
"3. **Fine-Tuning**: Both approaches involve fine-tuning pre-trained models to adapt them to specific tasks, although they use different techniques to achieve this."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
@@ -1160,8 +1082,8 @@
"data": {
"text/markdown": [
"1. **Techniques**:\n",
" - **LongLoRA**: Utilizes a combination of shifted sparse attention (S²-Attn) and low-rank adaptation (LoRA) to extend the context window of LLMs efficiently. It focuses on fine-tuning with sparse local attention during training and dense global attention during inference.\n",
" - **LoftQ**: Introduces a quantization framework that integrates low-rank approximation with quantization to jointly approximate the original high-precision pre-trained weights. It aims to provide a better initialization for LoRA fine-tuning by mitigating the discrepancy introduced by quantization."
" - **LongLoRA**: Utilizes a combination of shifted sparse attention (S²-Attn) and low-rank adaptation (LoRA) to extend the context window of LLMs efficiently. It focuses on enabling long-context processing by approximating full attention with sparse attention during training.\n",
" - **LoftQ**: Introduces a quantization framework that integrates low-rank approximation with quantization to provide a better initialization for LoRA fine-tuning. It aims to mitigate the discrepancy between quantized and full-precision models, especially in low-bit quantization scenarios."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
@@ -1174,8 +1096,8 @@
"data": {
"text/markdown": [
"2. **Experimental Results**:\n",
" - **LongLoRA**: Demonstrates strong empirical results on various tasks using Llama2 models, extending context lengths significantly while maintaining performance. For example, it extends Llama2 7B from 4k context to 100k on a single 8× A100 machine.\n",
" - **LoftQ**: Shows superior performance in low-bit quantization regimes, particularly in 2-bit and mixed precision scenarios. It consistently outperforms QLoRA across different tasks and models, achieving notable improvements in metrics like Rouge-1 for summarization tasks and accuracy for NLU tasks."
" - **LongLoRA**: Demonstrates strong empirical results on various tasks with Llama2 models, extending context lengths significantly (e.g., Llama2 7B from 4k to 100k context). It shows improvements in training speed and memory efficiency while maintaining performance comparable to full fine-tuning.\n",
" - **LoftQ**: Shows consistent improvements over QLoRA across different precision levels and tasks, including NLU, question answering, summarization, and NLG. It excels particularly in low-bit scenarios, achieving better performance and stability compared to QLoRA."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
@@ -1199,7 +1121,7 @@
{
"data": {
"text/markdown": [
"Both LongLoRA and LoftQ offer innovative solutions for enhancing the efficiency and performance of LLMs. While LongLoRA focuses on extending context windows through sparse attention mechanisms, LoftQ emphasizes improving quantization processes to provide better initializations for fine-tuning. Their complementary approaches highlight the diverse strategies available for optimizing LLMs in various applications."
"Both LongLoRA and LoftQ offer innovative solutions to enhance the efficiency and performance of LLMs. LongLoRA focuses on extending context windows using sparse attention and low-rank adaptation, while LoftQ combines quantization with low-rank approximation to improve fine-tuning in low-bit precision regimes. Their techniques and experimental results highlight their respective strengths in handling large-scale models efficiently."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
@@ -1230,7 +1152,7 @@
"source": [
"from llama_index.utils.workflow import draw_most_recent_execution\n",
"\n",
"draw_most_recent_execution(agent)"
"draw_most_recent_execution(agent, notebook=False)"
]
}
],