[Bug]: Azure openai embedding #17582

Open
krupalsmart97 opened this issue Jan 22, 2025 · 6 comments
Labels
bug Something isn't working triage Issue needs to be triaged/prioritized

Comments

@krupalsmart97

Bug Description

I am trying to index documents into Azure AI Search using LlamaIndex, with an Azure OpenAI embedding model:

self.openai_emb = AzureOpenAIEmbedding(
    model="text-embedding-3-large",
    deployment_name="text-embedding-3-large",
    api_key=settings.openai_emb_api_key,
    azure_endpoint=settings.openai_emb_base_url,
    api_version=settings.openai_emb_api_version,
)

index = VectorStoreIndex(
    nodes=text_nodes,
    storage_context=storage_context,
    embed_model=embed_model,
    llm=llm,
    show_progress=True,
)

but I get this error:

llama_index.embeddings.openai.base.OpenAIEmbedding._get_text_embeddings.<locals>._retryable_get_embeddings in 1.0 seconds as it raised APIConnectionError: Connection error..

Version

0.12.11

Steps to Reproduce

https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/e2e-demos/azure-ai-search-llamaindex-multimodal-rag-demo.ipynb

Relevant Logs/Tracebacks

@krupalsmart97 krupalsmart97 added bug Something isn't working triage Issue needs to be triaged/prioritized labels Jan 22, 2025

dosubot bot commented Jan 22, 2025

I found some similar threads that might help you with the connection error you're experiencing:

  1. Connection error on embedding - This discussion is unsolved but suggests implementing a robust retry mechanism with exponential backoff to handle transient connection errors. You can find an example of how to do this in the discussion.

  2. Issue #15238 - This open issue suggests enabling detailed logging to gain more insights into the cause of the intermittent connection errors. It also provides a code snippet for setting up logging in your application.

  3. Issue #11802 - This open issue suggests ensuring that your Azure OpenAI API connection settings are correct, including environment variables for the API version, endpoint, and API key. It also mentions verifying the model version of the deployment and using the correct API version.

These threads might provide some insights or solutions to your problem.
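For reference, the retry-with-exponential-backoff idea mentioned in the first thread can be sketched with the standard library alone. This is a minimal illustration, not LlamaIndex's own retry logic (the traceback later in this issue shows LlamaIndex already retrying through tenacity). The function name `retry_with_backoff` and all of its parameters are hypothetical.

```python
import random
import time


def retry_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=20.0,
                       retry_on=(ConnectionError,), sleep=time.sleep):
    """Call func() until it succeeds or max_attempts is reached.

    Hypothetical helper for illustration; not part of LlamaIndex or openai.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error unchanged
            # Exponential backoff (1s, 2s, 4s, ...) capped at max_delay,
            # with random jitter so parallel workers do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

To retry on the error reported here, one would presumably pass `retry_on=(openai.APIConnectionError,)`, since that exception does not derive from the built-in `ConnectionError`. Retrying only helps with transient failures; if the endpoint or embed model is misconfigured, every attempt fails the same way.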


@krupalsmart97
Author

@dosu the above links are not solving my issue


dosubot bot commented Jan 22, 2025

To resolve the APIConnectionError when using Azure OpenAI embedding with LlamaIndex, ensure the following configurations are correct:

  1. Azure Endpoint: Verify that the azure_endpoint is correctly set in the format https://YOUR_RESOURCE_NAME.openai.azure.com/. This can be set using the environment variable AZURE_OPENAI_ENDPOINT.

  2. API Version: Ensure the api_version is specified correctly. You can set this using the environment variable OPENAI_API_VERSION.

  3. API Key: Confirm that the api_key is provided either directly or through the environment variable AZURE_OPENAI_API_KEY.

  4. Azure Deployment: Check that the azure_deployment is set correctly. This can be resolved from aliases or set directly.

  5. Azure AD Token: If using Microsoft Entra ID (formerly Azure AD) for token authentication, ensure the azure_ad_token_provider is correctly configured and that use_azure_ad is set to True.

Here's a code snippet for initializing the AzureOpenAIEmbedding class:

from llama_index.embeddings.azure_openai.base import AzureOpenAIEmbedding

embedding = AzureOpenAIEmbedding(
    azure_endpoint="https://YOUR_RESOURCE_NAME.openai.azure.com/",
    api_version="YOUR_API_VERSION",
    api_key="YOUR_API_KEY",
    azure_deployment="YOUR_AZURE_DEPLOYMENT",
    use_azure_ad=False,  # Set to True if using Azure AD
    # other parameters as needed
)

Replace YOUR_RESOURCE_NAME, YOUR_API_VERSION, YOUR_API_KEY, and YOUR_AZURE_DEPLOYMENT with your actual Azure resource name, API version, API key, and deployment name, respectively. If using Azure AD, ensure the token provider is correctly set up [1].
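Before constructing the embedding object, the checklist above can be turned into a quick sanity check on the configured values. The sketch below is only a heuristic: `check_azure_settings` is a hypothetical helper, the host check follows the `https://YOUR_RESOURCE_NAME.openai.azure.com/` format quoted above (other hostnames may be valid for some resources), and the date-style API version pattern is an assumption based on versions such as `2024-02-01`.

```python
import re
from urllib.parse import urlparse

# Assumed shape of an Azure OpenAI API version, e.g. "2024-02-01" or
# "2024-02-15-preview". Treat a mismatch as a hint, not a verdict.
API_VERSION_RE = re.compile(r"^\d{4}-\d{2}-\d{2}(-preview)?$")


def check_azure_settings(endpoint, api_key, api_version, deployment):
    """Return a list of likely problems; an empty list means nothing obvious."""
    problems = []
    parsed = urlparse(endpoint or "")
    if parsed.scheme != "https":
        problems.append("endpoint: expected an https:// URL")
    if not (parsed.hostname or "").endswith(".openai.azure.com"):
        problems.append("endpoint: host is not <resource>.openai.azure.com")
    if parsed.path not in ("", "/"):
        problems.append("endpoint: unexpected path after the host")
    if not api_key:
        problems.append("api_key: empty")
    if not API_VERSION_RE.match(api_version or ""):
        problems.append("api_version: expected a date such as 2024-02-01")
    if not deployment:
        problems.append("deployment: empty")
    return problems
```

In the reporter's setup this could be called with `settings.openai_emb_base_url`, `settings.openai_emb_api_key`, `settings.openai_emb_api_version`, and the deployment name, printing the returned list. An empty or malformed endpoint loaded from settings is one plausible way to end up with a connection error rather than an authentication error.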


@krupalsmart97
Author

@dosu I tried that and checked the settings, but I get the same issue.
Some logs are below:

File "/app/app/apis/components/datastore/azure_datastore.py", line 65, in add_data_to_brain
docai_celery_worker | index.insert_nodes(text_nodes)
docai_celery_worker | │ │ └ [TextNode(id_='f9c76897-e1a6-4b68-83f7-439bc64a187f', embedding=None, metadata={'pdf_name': 'RSA_OnDemand_PIN_Reset.pdf', 'pa...
docai_celery_worker | │ └ <function VectorStoreIndex.insert_nodes at 0x7f4d5cd509a0>
docai_celery_worker | └ <llama_index.core.indices.vector_store.base.VectorStoreIndex object at 0x7f4d5d10bc90>
docai_celery_worker |
docai_celery_worker | File "/usr/local/lib/python3.11/site-packages/llama_index/core/indices/vector_store/base.py", line 333, in insert_nodes
docai_celery_worker | self._insert(nodes, **insert_kwargs)
docai_celery_worker | │ │ │ └ {}
docai_celery_worker | │ │ └ [TextNode(id_='f9c76897-e1a6-4b68-83f7-439bc64a187f', embedding=None, metadata={'pdf_name': 'RSA_OnDemand_PIN_Reset.pdf', 'pa...
docai_celery_worker | │ └ <function VectorStoreIndex._insert at 0x7f4d5cd50900>
docai_celery_worker | └ <llama_index.core.indices.vector_store.base.VectorStoreIndex object at 0x7f4d5d10bc90>
docai_celery_worker | File "/usr/local/lib/python3.11/site-packages/llama_index/core/indices/vector_store/base.py", line 314, in _insert
docai_celery_worker | self._add_nodes_to_index(self.index_struct, nodes, **insert_kwargs)
docai_celery_worker | │ │ │ │ │ └ {}
docai_celery_worker | │ │ │ │ └ [TextNode(id_='f9c76897-e1a6-4b68-83f7-439bc64a187f', embedding=None, metadata={'pdf_name': 'RSA_OnDemand_PIN_Reset.pdf', 'pa...
docai_celery_worker | │ │ │ └ IndexDict(index_id='3d056469-8ed4-4007-bb29-7ac254baefbc', summary=None, nodes_dict={}, doc_id_dict={}, embeddings_dict={})
docai_celery_worker | │ │ └ <llama_index.core.indices.vector_store.base.VectorStoreIndex object at 0x7f4d5d10bc90>
docai_celery_worker | │ └ <function VectorStoreIndex._add_nodes_to_index at 0x7f4d5cd50720>
docai_celery_worker | └ <llama_index.core.indices.vector_store.base.VectorStoreIndex object at 0x7f4d5d10bc90>
docai_celery_worker | File "/usr/local/lib/python3.11/site-packages/llama_index/core/indices/vector_store/base.py", line 232, in _add_nodes_to_index
docai_celery_worker | nodes_batch = self._get_node_with_embedding(nodes_batch, show_progress)
docai_celery_worker | │ │ │ └ False
docai_celery_worker | │ │ └ [TextNode(id_='f9c76897-e1a6-4b68-83f7-439bc64a187f', embedding=None, metadata={'pdf_name': 'RSA_OnDemand_PIN_Reset.pdf', 'pa...
docai_celery_worker | │ └ <function VectorStoreIndex._get_node_with_embedding at 0x7f4d5cd50540>
docai_celery_worker | └ <llama_index.core.indices.vector_store.base.VectorStoreIndex object at 0x7f4d5d10bc90>
docai_celery_worker | File "/usr/local/lib/python3.11/site-packages/llama_index/core/indices/vector_store/base.py", line 139, in _get_node_with_embedding
docai_celery_worker | id_to_embed_map = embed_nodes(
docai_celery_worker | └ <function embed_nodes at 0x7f4d5d0d63e0>
docai_celery_worker | File "/usr/local/lib/python3.11/site-packages/llama_index/core/indices/utils.py", line 160, in embed_nodes
docai_celery_worker | new_embeddings = embed_model.get_text_embedding_batch(
docai_celery_worker | │ └ <FunctionWrapper at 0x7f4d5d806960 for function at 0x7f4d5d594f40>
docai_celery_worker | └ OpenAIEmbedding(model_name='text-embedding-ada-002', embed_batch_size=100, callback_manager=<llama_index.core.callbacks.base....
docai_celery_worker | File "/usr/local/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py", line 321, in wrapper
docai_celery_worker | result = func(*args, **kwargs)
docai_celery_worker | │ │ └ {'show_progress': False}
docai_celery_worker | │ └ (['pdf_name: RSA_OnDemand_PIN_Reset.pdf\npage_no: 0\nimage_path: /app/Documents/d69f1745-7f43-45eb-8441-71e5c6ad5f59/images/R...
docai_celery_worker | └ <bound method BaseEmbedding.get_text_embedding_batch of OpenAIEmbedding(model_name='text-embedding-ada-002', embed_batch_size...
docai_celery_worker | File "/usr/local/lib/python3.11/site-packages/llama_index/core/base/embeddings/base.py", line 335, in get_text_embedding_batch
docai_celery_worker | embeddings = self._get_text_embeddings(cur_batch)
docai_celery_worker | │ │ └ ['pdf_name: RSA_OnDemand_PIN_Reset.pdf\npage_no: 0\nimage_path: /app/Documents/d69f1745-7f43-45eb-8441-71e5c6ad5f59/images/RS...
docai_celery_worker | │ └ <function OpenAIEmbedding._get_text_embeddings at 0x7f4d5d7032e0>
docai_celery_worker | └ OpenAIEmbedding(model_name='text-embedding-ada-002', embed_batch_size=100, callback_manager=<llama_index.core.callbacks.base....
docai_celery_worker | File "/usr/local/lib/python3.11/site-packages/llama_index/embeddings/openai/base.py", line 465, in _get_text_embeddings
docai_celery_worker | return _retryable_get_embeddings()
docai_celery_worker | └ <function OpenAIEmbedding._get_text_embeddings.<locals>._retryable_get_embeddings at 0x7f4d869b9800>
docai_celery_worker | File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 336, in wrapped_f
docai_celery_worker | return copy(f, *args, **kw)
docai_celery_worker | │ │ │ └ {}
docai_celery_worker | │ │ └ ()
docai_celery_worker | │ └ <function OpenAIEmbedding._get_text_embeddings.<locals>._retryable_get_embeddings at 0x7f4d869b96c0>
docai_celery_worker | └ <Retrying object at 0x7f4d86c23a90 (stop=<tenacity.stop.stop_any object at 0x7f4d86c20810>, wait=<tenacity.wait.wait_random_e...
docai_celery_worker | File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 475, in __call__
docai_celery_worker | do = self.iter(retry_state=retry_state)
docai_celery_worker | │ │ └ <RetryCallState 139970950086608: attempt #2; slept for 1.0; last result: failed (APIConnectionError Connection error.)>
docai_celery_worker | │ └ <function BaseRetrying.iter at 0x7f4d5c38b920>
docai_celery_worker | └ <Retrying object at 0x7f4d86c23a90 (stop=<tenacity.stop.stop_any object at 0x7f4d86c20810>, wait=<tenacity.wait.wait_random_e...
docai_celery_worker | File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 376, in iter


dosubot bot commented Jan 22, 2025

The APIConnectionError you're encountering suggests a network-related issue when trying to connect to the Azure OpenAI service. Here are some steps to help diagnose and potentially resolve the issue:

  1. Network Configuration: Ensure that your network allows outbound connections to the Azure OpenAI endpoint. Check for any firewall or proxy settings that might be blocking the connection.

  2. Endpoint and API Key: Double-check that the azure_endpoint and api_key are correctly configured. The endpoint should be in the format https://YOUR_RESOURCE_NAME.openai.azure.com/.

  3. Retry Mechanism: LlamaIndex uses the tenacity library for retries. Ensure that the retry settings are appropriate for your use case. You might want to increase the number of retries or adjust the backoff strategy to handle transient network issues more effectively.

  4. Logging: Enable detailed logging to capture more information about the connection attempts. This can help identify if there are specific patterns or errors occurring during the connection process.

  5. Environment Variables: Verify that all necessary environment variables are set correctly, especially if you're using them to configure the API key and endpoint.

  6. Azure Service Status: Check the Azure status page to ensure there are no ongoing service disruptions that might be affecting connectivity.

If these steps do not resolve the issue, consider testing the connection from a different network or environment to rule out local network issues.
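The network and logging steps above can be combined into a small standalone check. The sketch below turns on debug logging and tests whether a plain TCP connection to the endpoint's host can be opened at all. The helper names `endpoint_host_port` and `can_connect` are hypothetical, and a successful TCP connection only rules out basic DNS, firewall, and routing problems; it says nothing about TLS, proxies, or credentials.

```python
import logging
import socket
from urllib.parse import urlparse

# DEBUG level may also surface request-level output from HTTP client libraries.
logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("connectivity")


def endpoint_host_port(endpoint):
    """Split an endpoint URL into (host, port), defaulting to 443 for https."""
    parsed = urlparse(endpoint)
    default_port = 443 if parsed.scheme == "https" else 80
    return parsed.hostname, parsed.port or default_port


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:  # covers DNS failures, refusals, and timeouts
        log.warning("cannot reach %s:%s: %s", host, port, exc)
        return False
```

Since the traceback was produced inside a `docai_celery_worker` process, it is probably worth running `can_connect(*endpoint_host_port(settings.openai_emb_base_url))` from that same container rather than from the host machine, because the two can have different network and proxy settings.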


@logan-markewich
Collaborator

@krupalsmart97 it seems like either your Azure setup is not correct, or you are not passing the correct embed model to your index.

I see self.openai_emb = AzureOpenAIEmbedding(...), but when you create your index, you pass embed_model=embed_model rather than self.openai_emb?

You can also test your embed model directly to ensure it works:
print(self.openai_emb.get_text_embedding("hello world")[:10])
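The traceback earlier in the thread shows `OpenAIEmbedding(model_name='text-embedding-ada-002', ...)` doing the embedding, which looks like the default OpenAI embedding rather than the Azure one configured in the report. A small check like the following can confirm which object is actually handed to the index. Both helpers are hypothetical, and they rely only on the class name and an optional `model_name` attribute, so they work on any object.

```python
def describe_embed_model(embed_model):
    """Return a short 'ClassName(model_name=...)' label for any embed model."""
    cls_name = type(embed_model).__name__
    model_name = getattr(embed_model, "model_name", "<unknown>")
    return f"{cls_name}(model_name={model_name!r})"


def is_azure_embedding(embed_model):
    """True if the object's class (or a parent) is named AzureOpenAIEmbedding."""
    return any(
        cls.__name__ == "AzureOpenAIEmbedding"
        for cls in type(embed_model).__mro__
    )
```

Calling `print(describe_embed_model(embed_model))` just before `VectorStoreIndex(...)` would show whether the variable passed as `embed_model` is the Azure instance. If it is not, passing `embed_model=self.openai_emb` to the index is the change suggested in the comment above.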
