[FEATURE] Support for multiple field mappings in a single text-image embedding processor #476

Open
martin-gaievski opened this issue Oct 26, 2023 · 2 comments
Labels
enhancement, Features (Introduces a new unit of functionality that satisfies a requirement), neural-search

Comments

@martin-gaievski
Member

martin-gaievski commented Oct 26, 2023

Is your feature request related to a problem?

Currently the neural-search text_image_embedding processor allows only a single document field to be defined for each of the text and image mappings, and only a single field to store the resulting embedding in OpenSearch. Example of a processor definition:

{
    "description": "An example neural search pipeline",
    "processors": [
        {
            "text_image_embedding": {
                "model_id": "1234567890",
                "embedding": "vector_embedding",
                "field_map": {
                    "text": "caption",
                    "image": "field_with_image"
                }
            }
        }
    ]
}

What solution would you like?

It should be possible to define multiple field pairs (image, text, or image + text) in a single processor. For each pair, it should be possible to define the OpenSearch field that stores the embedding produced by the model. A request might look something like this:

{
  "description": "An example neural search pipeline",
  "processors": [
    {
      "text_image_embedding": {
        "model_id": "some_remote_model",
        "field_map": {
          "multimodal_embedding_1": {
            "text": "caption_1",
            "image": "field_with_image_1"
          },
          "multimodal_embedding_2": {
            "text": "caption_2",
            "image": "field_with_image_2"
          }
        }
      }
    }
  ]
}

What alternatives have you considered?

Today it's possible to define multiple embedding processors as part of a single pipeline, and each processor may have its own mapping and embedding field definition.

{
    "description": "An example neural search pipeline",
    "processors": [
        {
            "text_image_embedding": {
                "model_id": "1234567890",
                "embedding": "vector_embedding_1",
                "field_map": {
                    "text": "caption_1",
                    "image": "field_with_image_1"
                }
            }
        },
        {
            "text_image_embedding": {
                "model_id": "1234567890",
                "embedding": "vector_embedding_2",
                "field_map": {
                    "text": "caption_2",
                    "image": "field_with_image_2"
                }
            }
        }
    ]
}
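
The nested field_map shape requested above and the multi-processor workaround carry the same information, so one can be derived from the other. The following is a minimal Python sketch of that expansion, not part of neural-search: expand_field_map is a hypothetical helper, and it assumes the proposed format keys each text/image pair by the name of the embedding field. It only builds the pipeline body as a dictionary and does not call OpenSearch.

```python
import json


def expand_field_map(model_id, field_map,
                     description="An example neural search pipeline"):
    """Expand the proposed nested field_map into one processor per embedding.

    Hypothetical helper: `field_map` maps an embedding field name to a dict
    holding "text" and/or "image" source field names.
    """
    processors = []
    for embedding_field, sources in field_map.items():
        processors.append({
            "text_image_embedding": {
                "model_id": model_id,
                "embedding": embedding_field,
                # Copy so the caller's dict is not shared with the result.
                "field_map": dict(sources),
            }
        })
    return {"description": description, "processors": processors}


# The proposed mapping from this issue, written as a Python dict.
proposed = {
    "multimodal_embedding_1": {"text": "caption_1", "image": "field_with_image_1"},
    "multimodal_embedding_2": {"text": "caption_2", "image": "field_with_image_2"},
}

pipeline = expand_field_map("some_remote_model", proposed)
print(json.dumps(pipeline, indent=4))
```

With a helper like this the model ID lives in one place on the client side, but the pipeline that reaches the cluster still contains one processor per embedding field.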

Do you have any additional context?

@martin-gaievski martin-gaievski added Features Introduces a new unit of functionality that satisfies a requirement untriaged enhancement labels Oct 26, 2023
@navneet1v
Collaborator

@martin-gaievski It would be great if you could provide some details on why the alternative approach described above is not feasible.

@vamshin vamshin removed the untriaged label Oct 30, 2023
@martin-gaievski
Member Author

The alternative with multiple processors is harder to maintain than a single processor entry. For instance, if one needs to update the model ID, it has to be done in X places instead of one. Another concern is performance: having multiple processors adds some overhead that may be critical in systems with a low-latency SLA for search response.
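
To make the maintenance point concrete, here is a rough Python sketch of what updating the model ID involves with the multi-processor layout. set_model_id is a hypothetical helper, not an existing neural-search or OpenSearch API; it works on the pipeline body as a plain dictionary and returns how many processor entries had to be touched.

```python
import copy


def set_model_id(pipeline, new_model_id, processor_type="text_image_embedding"):
    """Return a copy of `pipeline` with model_id replaced in every matching
    processor, plus the number of processors that were changed."""
    updated = copy.deepcopy(pipeline)  # leave the caller's definition intact
    changed = 0
    for processor in updated.get("processors", []):
        config = processor.get(processor_type)
        if config is not None:
            config["model_id"] = new_model_id
            changed += 1
    return updated, changed


# The two-processor workaround from the issue description.
current = {
    "description": "An example neural search pipeline",
    "processors": [
        {"text_image_embedding": {
            "model_id": "1234567890",
            "embedding": "vector_embedding_1",
            "field_map": {"text": "caption_1", "image": "field_with_image_1"},
        }},
        {"text_image_embedding": {
            "model_id": "1234567890",
            "embedding": "vector_embedding_2",
            "field_map": {"text": "caption_2", "image": "field_with_image_2"},
        }},
    ],
}

updated, changed = set_model_id(current, "new_model_id")
print(f"processors updated: {changed}")
```

The count grows with the number of embedding fields, whereas a single processor with multiple mappings would need exactly one change.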

@martin-gaievski martin-gaievski changed the title [FEATURE] Multiple embeddings in one data ingestion request [FEATURE] Support for multiple field mappings in a single text-image embedding processor Nov 19, 2024
@heemin32 heemin32 moved this to Backlog in Neural Search RoadMap Jan 10, 2025
Projects
Status: Backlog
Development

No branches or pull requests

4 participants