address comments
init27 committed Nov 20, 2024
1 parent c9df1be commit 9950b75
Showing 1 changed file with 12 additions and 12 deletions.
24 changes: 12 additions & 12 deletions recipes/quickstart/Multi-Modal-RAG/README.md
@@ -13,10 +13,8 @@ This is a complete workshop on labelling images using the new Llama 3.2-Vision M
Before we start:

1. Please grab your HF CLI Token from [here](https://huggingface.co/settings/tokens)
2. git clone [this dataset](https://huggingface.co/datasets/Sanyam/MM-Demo) inside the Multi-Modal-RAG folder: `git clone https://huggingface.co/datasets/Sanyam/MM-Demo`
3. Launch jupyter notebook inside this folder
4. We will also run two scripts after the notebooks
5. Make sure you grab a together.ai token [here](https://www.together.ai)
2. Git clone [this dataset](https://huggingface.co/datasets/Sanyam/MM-Demo) inside the Multi-Modal-RAG folder: `git clone https://huggingface.co/datasets/Sanyam/MM-Demo`
3. Make sure you grab a together.ai token [here](https://www.together.ai)

## Detailed Outline for running:

@@ -32,6 +30,8 @@ Here's the detailed outline:

### Step 1: Data Prep and Synthetic Labeling:

In this step we start with an unlabelled dataset and use the image captioning capability of the model to write a description of the image and categorise it.

[Notebook for Step 1](./notebooks/Part_1_Data_Preperation.ipynb) and [Script for Step 1](./scripts/label_script.py)

To run the script (remember to set n):
@@ -46,9 +46,9 @@ The dataset consists of 5000 images with some meta-data.

The first half is preparing the dataset for labeling:
- Clean/Remove corrupt images
- EDA to understand existing distribution
- Some exploratory analysis to understand existing distribution
- Merging up categories of clothes to reduce complexity
- Balancing dataset by randomly sampling images
- Balancing dataset by randomly sampling images to have an equal distribution for retrieval
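The balancing step above can be sketched as follows. This is a minimal illustration only, not the notebook's actual code: it assumes each image is represented by a dict with a `category` field, and the helper `balance_by_category`, the field names, and the toy records are all hypothetical. It downsamples every category to the size of the smallest one using random sampling.

```python
import random
from collections import defaultdict


def balance_by_category(records, key="category", seed=0):
    """Downsample every category to the size of the smallest category.

    `records` is a list of dicts; `key` names the (hypothetical) category field.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    if not groups:
        return []
    target = min(len(items) for items in groups.values())
    balanced = []
    for category in sorted(groups):
        # random.sample draws `target` distinct records without replacement
        balanced.extend(rng.sample(groups[category], target))
    return balanced


# Hypothetical toy metadata standing in for the real image metadata.
records = (
    [{"file": f"top_{i}.jpg", "category": "Tops"} for i in range(6)]
    + [{"file": f"shoe_{i}.jpg", "category": "Shoes"} for i in range(3)]
    + [{"file": f"pant_{i}.jpg", "category": "Pants"} for i in range(4)]
)
balanced = balance_by_category(records)
```

Downsampling to the smallest class discards data; sampling to a fixed per-class target is an equally reasonable choice when the smallest class is very small.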

The second half consists of labeling the dataset. The Llama 3.2 11B model can only process one image at a time:
- We load a few images and test captioning
@@ -61,9 +61,9 @@ After running the script on the entire dataset, we have more data cleaning to pe

[Notebook for Step 2](./notebooks/Part_2_Cleaning_Data_and_DB.ipynb)

Even after our lengthy (apart from other things) prompt, the model still hallucinates categories and label, here is how we address this
Even after some fun prompt engineering, the model still hallucinates: the JSON formatting has some issues, and some of the label categories are invented. Here is how we address this:

- Re-balance the dataset by mapping correct categories
- Re-balance the dataset by mapping correct categories. This is useful to make sure we have an equal distribution in our dataset for retrieval
- Fix Descriptions so that we can create a CSV
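A rough sketch of the two cleaning steps above, under stated assumptions: the model's raw reply contains a JSON object somewhere in free text, and labels outside a fixed category set should be mapped back into it. The category names, the alias table, and both helper functions are hypothetical and are not taken from the notebook.

```python
import json
import re

# Hypothetical category set and alias table, for illustration only.
ALLOWED = {"Tops", "Bottoms", "Shoes", "Outerwear"}
ALIASES = {
    "t-shirt": "Tops",
    "jeans": "Bottoms",
    "sneakers": "Shoes",
    "jacket": "Outerwear",
}


def extract_json(raw):
    """Parse the span from the first '{' to the last '}' in raw model output.

    Returns the parsed object, or None if no valid JSON object is found.
    """
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None


def normalise_category(label, default="Other"):
    """Map a possibly hallucinated label onto the allowed category set."""
    cleaned = str(label).strip().lower()
    for allowed in ALLOWED:
        if cleaned == allowed.lower():
            return allowed
    # Unknown labels fall back to an alias, then to the default bucket.
    return ALIASES.get(cleaned, default)


raw = 'Sure! Here is the JSON:\n{"category": "Sneakers", "description": "White low-top shoes"}\nHope this helps'
parsed = extract_json(raw)
category = normalise_category(parsed["category"])
```

Rows where `extract_json` returns `None` can be dropped or sent back for relabeling before the descriptions are written to CSV.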

Now, we are ready to try our vector db pipeline:
@@ -73,13 +73,13 @@ Now, we are ready to try our vector db pipeline:
[Notebook for Step 3](./notebooks/Part_3_RAG_Setup_and_Validation.ipynb) and [Final Demo Script](./scripts/final_demo.py)


With the cleaned descriptions and dataset, we can now store these in a vector-db
With the cleaned descriptions and dataset, we can now store these in a vector DB. Here are the steps:

You will note that we are not using the categorization from our model. This is by design, to show how RAG can simplify a lot of things.

- We create embeddings using the text description of our clothes
- Use 11-B model to describe the uploaded image
- Try to find similar or complimentary images based on the upload
- Ask the model to suggest complementary items to the upload
- Try to find similar or complementary images based on the upload

We try the approach with different retrieval methods.
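To make the retrieval idea concrete, here is a toy sketch of ranking stored descriptions against a query description by cosine similarity. It is only an illustration: the bag-of-words `embed` function stands in for a real embedding model, a plain Python list stands in for the vector DB, and all names and sample descriptions are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding': lowercase token counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query, descriptions, k=2):
    """Return the k stored descriptions most similar to the query text."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in descriptions]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


# Hypothetical stored descriptions; the query plays the role of the
# model-generated description of an uploaded image.
descriptions = [
    "blue denim jacket with silver buttons",
    "white leather sneakers",
    "black leather boots",
]
results = top_k("brown leather boots", descriptions, k=2)
```

A real pipeline would swap `embed` for a learned text embedding and let the vector DB handle the nearest-neighbour search; the ranking logic stays the same in spirit.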

@@ -96,7 +96,7 @@ python scripts/final_demo.py \
--use_existing_table
```

Task: We can further improve the description prompt. You will notice sometimes the description starts with the title of the cloth which causes in retrieval of "similar" clothes instead of "complementary" items
Note: We can further improve the description prompt. Sometimes the description starts with the title of the garment, which causes retrieval of "similar" clothes instead of "complementary" items.

- Upload an image
- 11B model describes the image