Feat: remove username/password (can be part of URI) #43

Merged: 7 commits, Sep 26, 2024
32 changes: 10 additions & 22 deletions actors/milvus/.actor/input_schema.json
@@ -3,16 +3,17 @@
"type": "object",
"schemaVersion": 1,
"properties": {
"milvusUrl": {
"title": "Milvus URL",
"milvusUri": {
"title": "Milvus URI",
"type": "string",
"description": "REST URL of the Milvus instance to connect to",
"description": "The URI of the Milvus instance to connect to. You can include the username and password in the URI, for example: `https://username:password@****.serverless.gcp-us-west1.cloud.zilliz.com`.",
"editor": "textfield",
"sectionCaption": "Milvus settings"
"sectionCaption": "Milvus settings",
"isSecret": true
},
"milvusApiKey": {
"title": "Milvus API KEY",
"description": "Milvus API KEY",
"milvusToken": {
"title": "Milvus Token",
"description": "Milvus Token",
"type": "string",
"editor": "textfield",
"isSecret": true
@@ -23,19 +24,6 @@
"description": "Name of the Milvus collection where the data will be stored",
"editor": "textfield"
},
"milvusUser": {
"title": "Milvus user name",
"type": "string",
"description": "User name for the Milvus cluster",
"editor": "textfield"
},
"milvusPassword": {
"title": "Milvus user password",
"type": "string",
"description": "Password for the Milvus cluster user",
"editor": "textfield",
"isSecret": true
},
"embeddingsProvider": {
"title": "Embeddings provider (as defined in the langchain API)",
"description": "Choose the embeddings provider to use for generating embeddings",
@@ -152,8 +140,8 @@
}
},
"required": [
"milvusUrl",
"milvusApiKey",
"milvusUri",
"milvusToken",
"milvusCollectionName",
"embeddingsProvider",
"embeddingsApiKey",
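Existing Actor inputs written against the old schema will no longer validate after this change. The following is a minimal sketch of a pre-flight check for such inputs. The helper name `check_milvus_input` and the constants are hypothetical (they are not part of this PR), and only the three Milvus-specific required keys visible in the diff are checked.

```python
# Field changes taken from the input_schema.json diff above.
RENAMED_KEYS = {"milvusUrl": "milvusUri", "milvusApiKey": "milvusToken"}
REMOVED_KEYS = ("milvusUser", "milvusPassword")
# Only the Milvus-specific subset of the schema's "required" list.
REQUIRED_KEYS = ("milvusUri", "milvusToken", "milvusCollectionName")


def check_milvus_input(actor_input: dict) -> list[str]:
    """Return human-readable problems found in an Actor input (hypothetical helper)."""
    problems = []
    for old, new in RENAMED_KEYS.items():
        if old in actor_input:
            problems.append(f"'{old}' was renamed to '{new}'")
    for key in REMOVED_KEYS:
        if key in actor_input:
            problems.append(f"'{key}' was removed; put the credentials in 'milvusUri' instead")
    for key in REQUIRED_KEYS:
        if not actor_input.get(key):
            problems.append(f"missing required field '{key}'")
    return problems


legacy_input = {
    "milvusUrl": "YOUR-MILVUS-URL",
    "milvusApiKey": "YOUR-MILVUS-API-KEY",
    "milvusCollectionName": "YOUR-MILVUS-COLLECTION-NAME",
    "milvusUser": "YOUR-MILVUS-USER",
}
for problem in check_milvus_input(legacy_input):
    print(problem)
```

An input that already uses `milvusUri`, `milvusToken`, and `milvusCollectionName` yields an empty list.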
20 changes: 11 additions & 9 deletions actors/milvus/README.md
@@ -33,7 +33,7 @@ It uses [LangChain](https://www.langchain.com/) to compute embeddings and intera

To utilize this integration, ensure you have:

- Created or existing `Milvus` database. You need to know `milvusUrl`, `milvusApiKey`, and `milvusCollectionName`.
- Created or existing `Milvus` database. You need to know `milvusUri`, `milvusToken`, and `milvusCollectionName`.
- If the collection does not exist, it will be created automatically.
- An account to compute embeddings using one of the providers, e.g., [OpenAI](https://platform.openai.com/docs/guides/embeddings) or [Cohere](https://docs.cohere.com/docs/cohere-embed).

@@ -53,11 +53,13 @@ For detailed input information refer to the [Input page](https://apify.com/apify
#### Database: Milvus
```json
{
"milvusUrl": "YOUR-MILVUS-URL",
"milvusApiKey": "YOUR-MILVUS-API-KEY",
"milvusUri": "YOUR-MILVUS-URI",
"milvusToken": "YOUR-MILVUS-TOKEN",
"milvusCollectionName": "YOUR-MILVUS-COLLECTION-NAME"
}
```
If you're using a username and password for authentication, you can include them in the `milvusUri` as follows:
`"milvusUri": "https://username:password@YOUR-MILVUS-URI"`.
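Credentials that contain reserved characters such as `@`, `:` or `/` would make a hand-written URI ambiguous. Below is a minimal sketch of building the value with Python's standard library. The helper name `uri_with_credentials` and the host are illustrative assumptions, and whether the Milvus client decodes percent-encoded credentials is not confirmed by this PR, so verify it against your deployment.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def uri_with_credentials(uri: str, username: str, password: str) -> str:
    """Return `uri` with `username:password@` inserted before the host (hypothetical helper)."""
    parts = urlsplit(uri)
    # Drop any userinfo that is already present, keeping only host[:port].
    host = parts.netloc.rsplit("@", 1)[-1]
    # Percent-encode reserved characters so the URI stays parseable.
    userinfo = f"{quote(username, safe='')}:{quote(password, safe='')}"
    return urlunsplit(parts._replace(netloc=f"{userinfo}@{host}"))


# The host below is a placeholder, not a real endpoint.
print(uri_with_credentials("https://example.zillizcloud.com:19530", "db_admin", "p@ss:word/1"))
# prints: https://db_admin:p%40ss%3Aword%2F1@example.zillizcloud.com:19530
```

Because the resulting URI carries a password, it should be treated as a secret, which matches the `isSecret` flag added to `milvusUri` in the input schema.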

#### Embeddings provider: OpenAI
```json
@@ -172,8 +174,8 @@ This integration will save the selected fields from your Actor to Milvus and sto

```json
{
"milvusUrl": "YOUR-MILVUS-URL",
"milvusApiKey": "YOUR-MILVUS-API-KEY",
"milvusUri": "YOUR-MILVUS-URI",
"milvusToken": "YOUR-MILVUS-TOKEN",
"milvusCollectionName": "YOUR-MILVUS-COLLECTION-NAME",
"embeddingsApiKey": "YOUR-OPENAI-API-KEY",
"embeddingsConfig": {
@@ -195,17 +197,17 @@ This integration will save the selected fields from your Actor to Milvus and sto
#### Milvus
```json
{
"milvusUrl": "YOUR-MILVUS-URL",
"milvusApiKey": "YOUR-MILVUS-API-KEY",
"milvusUri": "YOUR-MILVUS-URI",
"milvusToken": "YOUR-MILVUS-TOKEN",
"milvusCollectionName": "YOUR-MILVUS-COLLECTION-NAME"
}
```

#### Managed Milvus service at [Zilliz](https://zilliz.com/)
```json
{
"milvusUrl": "https://in03-***********.api.gcp-us-west1.zillizcloud.com",
"milvusApiKey": "d46**********b4b",
"milvusUri": "https://in03-***********.api.gcp-us-west1.zillizcloud.com",
"milvusToken": "d46**********b4b",
"milvusCollectionName": "YOUR-MILVUS-COLLECTION-NAME"
}
```
1,758 changes: 920 additions & 838 deletions code/poetry.lock

Large diffs are not rendered by default.

18 changes: 9 additions & 9 deletions code/pyproject.toml
@@ -4,7 +4,7 @@ authors = ["[email protected]"]
description = ""
name = "store-vector-db"
readme = "README.md"
version = "0.0.1"
version = "0.1.4"
package-mode = false

[tool.poetry.dependencies]
@@ -13,10 +13,10 @@ apify-client = "^1.6.4"
openai = "^1.17.0"
python = ">=3.11,<3.12"
python-dotenv = "^1.0.1"
langchain-openai = "^0.1.6"
langchain-cohere = "^0.1.4"
langchain-community = "^0.2.0"
langchain-core = "0.2.10"
langchain-openai = "^0.2.0"
langchain-cohere = "^0.3.0"
langchain-community = "^0.3.0"
langchain-core = "0.3.5"

[tool.poetry.group.dev.dependencies]
coverage = "^7.5.4"
@@ -35,7 +35,7 @@ ruff = "^0.3.5"
optional = true

[tool.poetry.group.pinecone.dependencies]
langchain-pinecone = "^0.1.3"
langchain-pinecone = "^0.2.0"

[tool.poetry.group.chroma]
optional = true
@@ -54,21 +54,21 @@ langchain-qdrant = "^0.1.3"
optional = true

[tool.poetry.group.pgvector.dependencies]
langchain-postgres = "^0.0.8"
langchain-postgres = "^0.0.12"
psycopg = {extras = ["binary", "pool"], version = "^3.1.19"}
psycopg2-binary = "^2.9.9"

[tool.poetry.group.weaviate]
optional = true

[tool.poetry.group.weaviate.dependencies]
langchain-weaviate = "^0.0.2"
langchain-weaviate = "^0.0.3"

[tool.poetry.group.milvus]
optional = true

[tool.poetry.group.milvus.dependencies]
langchain-milvus = "^0.1.1"
langchain-milvus = "^0.1.5"

[tool.ruff]
line-length = 150
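The constraints above had to be edited rather than just re-locked because Poetry's caret operator stops at the next change of the leftmost non-zero version component. For example, `^0.1.6` allows `0.1.x` only and excludes `0.2.0`. The sketch below illustrates that rule for three-part versions; `caret_bounds` is a hypothetical helper written for this note, not a Poetry API.

```python
def caret_bounds(constraint: str) -> tuple[str, str]:
    """Expand a three-part caret constraint such as "^0.1.5" into (lower, upper) bounds."""
    version = constraint.removeprefix("^")
    major, minor, patch = (int(part) for part in version.split("."))
    # The upper bound bumps the leftmost non-zero component.
    if major > 0:
        upper = (major + 1, 0, 0)
    elif minor > 0:
        upper = (0, minor + 1, 0)
    else:
        upper = (0, 0, patch + 1)
    return f">={version}", "<" + ".".join(str(n) for n in upper)


# Old and new constraints taken from the pyproject.toml diff above.
for name, old, new in [
    ("langchain-openai", "^0.1.6", "^0.2.0"),
    ("langchain-postgres", "^0.0.8", "^0.0.12"),
    ("langchain-milvus", "^0.1.1", "^0.1.5"),
]:
    print(name, caret_bounds(old), "->", caret_bounds(new))
```

Note that `langchain-core` is pinned exactly (`0.3.5`) with no caret, so it never floats.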
4 changes: 2 additions & 2 deletions code/src/examples/2024-07-08-milvus.py
@@ -35,8 +35,8 @@

db = MilvusDatabase(
actor_input=MilvusIntegration(
milvusUrl=os.getenv("MILVUS_URL"),
milvusApiKey=os.getenv("MILVUS_API_KEY"),
milvusUri=os.getenv("MILVUS_URI"),
milvusToken=os.getenv("MILVUS_TOKEN"),
milvusCollectionName=MILVUS_COLLECTION_NAME,
embeddingsProvider=EmbeddingsProvider.OpenAI.value,
embeddingsApiKey=os.getenv("OPENAI_API_KEY"),
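`os.getenv` returns `None` when a variable is unset, and since the environment variables were renamed (`MILVUS_URL` to `MILVUS_URI`, `MILVUS_API_KEY` to `MILVUS_TOKEN`), a stale `.env` file would pass `None` into the required `str` fields `milvusUri` and `milvusToken`. A minimal sketch of failing early with a clearer message follows; `require_env` is a hypothetical helper, not part of this repository.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Usage sketch (commented out so the block runs without real credentials):
# milvus_uri = require_env("MILVUS_URI")
# milvus_token = require_env("MILVUS_TOKEN")
```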
15 changes: 5 additions & 10 deletions code/src/examples/2024-09-09-milvus-actor.py
@@ -30,11 +30,8 @@
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") or "YOUR-OPENAI-API-KEY"

MILVUS_COLLECTION_NAME = "apify"
MILVUS_URL = os.getenv("MILVUS_URL") or "YOUR-MILVUS-URL"
MILVUS_API_KEY = os.getenv("MILVUS_API_KEY") or "YOUR-MILVUS-API-KEY"
MILVUS_USER = os.getenv("MILVUS_USER") or "YOUR-MILVUS-USER"
MILVUS_PASSWORD = os.getenv("MILVUS_PASSWORD") or "YOUR-MILVUS-PASSWORD"

MILVUS_URI = os.getenv("MILVUS_URI") or "YOUR-MILVUS-URI"
MILVUS_TOKEN = os.getenv("MILVUS_TOKEN") or "YOUR-MILVUS-TOKEN"

client = ApifyClient(APIFY_API_TOKEN)

@@ -51,11 +48,9 @@
print(actor_call)

milvus_integration_inputs = {
"milvusUrl": MILVUS_URL,
"milvusApiKey": MILVUS_API_KEY,
"milvusUri": MILVUS_URI,
"milvusToken": MILVUS_TOKEN,
"milvusCollectionName": MILVUS_COLLECTION_NAME,
"milvusUser": MILVUS_USER,
"milvusPassword": MILVUS_PASSWORD,
"datasetFields": ["text"],
"datasetId": actor_call["defaultDatasetId"],
"deltaUpdatesPrimaryDatasetFields": ["url"],
@@ -71,7 +66,7 @@

print("Question answering using Milvus/Zilliz database")
vectorstore = Milvus(
connection_args={"uri": MILVUS_URL, "token": MILVUS_API_KEY, "user": MILVUS_USER, "password": MILVUS_PASSWORD},
connection_args={"uri": MILVUS_URI, "token": MILVUS_TOKEN},
embedding_function=embeddings,
collection_name=MILVUS_COLLECTION_NAME,
)
22 changes: 7 additions & 15 deletions code/src/models/milvus_input_model.py
@@ -1,6 +1,6 @@
# generated by datamodel-codegen:
# filename: input_schema.json
# timestamp: 2024-07-23T09:53:48+00:00
# timestamp: 2024-09-26T09:34:48+00:00

from __future__ import annotations

@@ -16,25 +16,17 @@ class EmbeddingsProvider(Enum):


class MilvusIntegration(BaseModel):
milvusUrl: str = Field(
milvusUri: str = Field(
...,
description='REST URL of the Milvus instance to connect to',
title='Milvus URL',
description='The URI of the Milvus instance to connect to. You can include the username and password in the URI, for example: `https://username:password@****.serverless.gcp-us-west1.cloud.zilliz.com`.',
title='Milvus URI',
)
milvusApiKey: str = Field(..., description='Milvus API KEY', title='Milvus API KEY')
milvusToken: str = Field(..., description='Milvus Token', title='Milvus Token')
milvusCollectionName: str = Field(
...,
description='Name of the Milvus collection where the data will be stored',
title='Milvus collection name',
)
milvusUser: Optional[str] = Field(
None, description='User name for the Milvus cluster', title='Milvus user name'
)
milvusPassword: Optional[str] = Field(
None,
description='Password for the Milvus cluster user',
title='Milvus user password',
)
embeddingsProvider: EmbeddingsProvider = Field(
...,
description='Choose the embeddings provider to use for generating embeddings',
@@ -92,12 +84,12 @@ class MilvusIntegration(BaseModel):
title='Delete expired objects from the database after a specified number of days',
)
performChunking: Optional[bool] = Field(
False,
True,
description='When set to true, the text will be divided into smaller chunks based on the settings provided below. Proper chunking helps optimize retrieval and ensures accurate and efficient responses.',
title='Enable text chunking',
)
chunkSize: Optional[int] = Field(
1000,
2000,
description='Defines the maximum number of characters in each text chunk. Choosing the right size balances between detailed context and system performance. Optimal sizes ensure high relevancy and minimal response time.',
ge=1,
title='Maximum chunk size',
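With `performChunking` now defaulting to `True` and `chunkSize` to `2000`, inputs that omit both fields produce chunks of at most 2000 characters. The sketch below only illustrates what "maximum number of characters in each text chunk" means. It is a deliberately naive fixed-width splitter with no overlap, assumed for illustration, and is not the splitter the Actor actually uses.

```python
def split_text(text: str, chunk_size: int = 2000) -> list[str]:
    """Split `text` into consecutive pieces of at most `chunk_size` characters."""
    if chunk_size < 1:
        # Mirrors the `ge=1` constraint on `chunkSize` in the model above.
        raise ValueError("chunk_size must be at least 1")
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]


chunks = split_text("x" * 4500)
print([len(chunk) for chunk in chunks])  # prints: [2000, 2000, 500]
```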
2 changes: 1 addition & 1 deletion code/src/utils.py
@@ -141,7 +141,7 @@ def add_item_checksum(items: list[Document], dataset_fields_to_item_id: list[str
def add_chunk_id(chunks: list[Document]) -> list[Document]:
"""For every chunk (document stored in vector db) add chunk_id to metadata.

The chunk_id is a unique identifier for each chunk and is not required but it is better to keep it in metadata.
The chunk_id is a unique identifier for each chunk and is not required, but it is better to keep it in metadata.
"""
for d in chunks:
d.metadata["chunk_id"] = d.metadata.get("chunk_id", str(uuid4()))
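The behavior of `add_chunk_id` can be sketched in isolation. The `Document` dataclass below is a stand-in for LangChain's `Document` so that the example runs without dependencies, and the trailing `return chunks` is assumed from the function's return annotation because the hunk ends before it.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class Document:
    """Stand-in for LangChain's Document, just enough for this sketch."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def add_chunk_id(chunks: list[Document]) -> list[Document]:
    """Add a chunk_id to each chunk's metadata unless one is already present."""
    for d in chunks:
        d.metadata["chunk_id"] = d.metadata.get("chunk_id", str(uuid4()))
    return chunks  # assumed from the return annotation


docs = add_chunk_id([
    Document("first chunk", {"chunk_id": "existing-id"}),
    Document("second chunk"),
])
print(docs[0].metadata["chunk_id"])  # the existing ID is kept
print(docs[1].metadata["chunk_id"])  # a freshly generated UUID string
```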
10 changes: 3 additions & 7 deletions code/src/vector_stores/milvus.py
@@ -20,20 +20,16 @@ class MilvusDatabase(Milvus, VectorDbBase):
def __init__(self, actor_input: MilvusIntegration, embeddings: Embeddings) -> None:
self.collection_name = actor_input.milvusCollectionName

connection_args = {"uri": actor_input.milvusUrl, "token": actor_input.milvusApiKey}

if actor_input.milvusUser and actor_input.milvusPassword:
connection_args |= {"user": actor_input.milvusUser, "password": actor_input.milvusPassword}

connection_args = {"uri": actor_input.milvusUri, "token": actor_input.milvusToken}
super().__init__(connection_args=connection_args, embedding_function=embeddings, collection_name=self.collection_name)
self.client = MilvusClient(**connection_args)
self._dummy_vector: list[float] = []

@property
def dummy_vector(self) -> list[float]:
if not self._dummy_vector and self.embeddings:
self._dummy_vector = self.embeddings.embed_query("dummy")
return self._dummy_vector
self._dummy_vector = self.embeddings.embed_query("dummy") # type: ignore
return self._dummy_vector # type: ignore

async def is_connected(self) -> bool:
raise NotImplementedError
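The `dummy_vector` property computes the embedding on first access and reuses it afterwards. A minimal, dependency-free sketch of that lazy-caching pattern is shown here; `FakeEmbeddings` and `LazyDummyVector` are hypothetical stand-ins for the real embeddings object and `MilvusDatabase`.

```python
class FakeEmbeddings:
    """Stand-in embeddings object that counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def embed_query(self, text: str) -> list[float]:
        self.calls += 1
        return [0.0, 0.0, 0.0]


class LazyDummyVector:
    """Mirrors the caching logic of the dummy_vector property above."""

    def __init__(self, embeddings: FakeEmbeddings) -> None:
        self.embeddings = embeddings
        self._dummy_vector: list[float] = []

    @property
    def dummy_vector(self) -> list[float]:
        # Embed only on first access; later accesses reuse the cached list.
        if not self._dummy_vector and self.embeddings:
            self._dummy_vector = self.embeddings.embed_query("dummy")
        return self._dummy_vector


store = LazyDummyVector(FakeEmbeddings())
store.dummy_vector
store.dummy_vector
print(store.embeddings.calls)  # prints: 1
```

One consequence of using an empty list as the "not yet computed" marker is that an embeddings backend returning an empty vector would be queried again on every access.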