Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Neural sparse tool, do refactor using AbstractRetrievalTool #1686

Conversation

zhichao-aws
Copy link
Member

Description

Add NeuralSparseTool implementation. Implement an abstract class AbstractRetrievalTool, making NeuralSparseTool and VectorDBTool extend this class.

Issues Resolved

#1675

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link

codecov bot commented Nov 24, 2023

Codecov Report

Attention: 32 lines in your changes are missing coverage. Please review.

Comparison is base (145155e) 66.31% compared to head (6909355) 66.67%.

Files Patch % Lines
...nsearch/ml/engine/tools/AbstractRetrieverTool.java 74.19% 14 Missing and 2 partials ⚠️
...a/org/opensearch/ml/engine/tools/VectorDBTool.java 0.00% 11 Missing ⚠️
...g/opensearch/ml/engine/tools/NeuralSparseTool.java 86.48% 3 Missing and 2 partials ⚠️
Additional details and impacted files
@@                        Coverage Diff                        @@
##             feature/agent_framework_dev    #1686      +/-   ##
=================================================================
+ Coverage                          66.31%   66.67%   +0.36%     
- Complexity                          2551     2570      +19     
=================================================================
  Files                                239      241       +2     
  Lines                              12731    12781      +50     
  Branches                            1279     1283       +4     
=================================================================
+ Hits                                8442     8522      +80     
+ Misses                              3730     3694      -36     
- Partials                             559      565       +6     
Flag Coverage Δ
ml-commons 66.67% <71.42%> (+0.36%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@zhichao-aws
Copy link
Member Author

The failed IT on windows 11 seems already exists https://github.com/opensearch-project/ml-commons/actions/runs/6954022968/job/18920109040. Not introduced by this PR.

Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
Copy link
Contributor

@arjunkumargiri arjunkumargiri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please create a example agent using Neural sparse tool and attach sample request/response details to PR.

@Log4j2
@Getter
@Setter
public abstract class AbstractRetrieverTool implements Tool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing the refactoring.

Signed-off-by: zhichao-aws <[email protected]>
@zhichao-aws
Copy link
Member Author

Please create a example agent using Neural sparse tool and attach sample request/response details to PR.

Sure, will post one soon

@zhichao-aws
Copy link
Member Author

  • prerequisites: finsh the steps in the doc, get an sparse encoding index and neural sparse model id; a llm model id

demo flow agent

register agent:

# request
POST /_plugins/_ml/agents/_register
{
  "name": "Test_Neural_Sparse_Agent_For_RAG",
  "type": "flow",
  "description": "this is a test agent",
  "tools": [
    {
      "type": "NeuralSparseTool",
      "parameters": {
        "model_id": "65fy8IsBzZuijRBEwRBX",
        "index": "sparse-index",
        "embedding_field": "sparse_embedding",
        "source_field": ["passage_text"],
        "input": "${parameters.question}"
      }
    },
    {
      "type": "MLModelTool",
      "description": "A general tool to answer any question",
      "parameters": {
        "model_id": "Gpct8YsBzZuijRBEkBIC",
        "prompt": "\n\nHuman:You are a professional data analysist. You will always answer question based on the given context first. If the answer is not directly shown in the context, you will analyze the data and find the answer. If you don't know the answer, just say don't know. \n\n Context:\n${parameters.NeuralSparseTool.output}\n\nHuman:${parameters.question}\n\nAssistant:"
      }
    }
  ]
}

# response
{
  "agent_id": "GQ4WH4wBIIa8ZwL88QuK"
}

execute agent:

# request
POST /_plugins/_ml/agents/GQ4WH4wBIIa8ZwL88QuK/_execute
{
    "parameters": {
        "question": "How many employee does company ABCD_TEST have?"
    }
}
# response
{
  "inference_results": [
    {
      "output": [
        {
          "result": " Based on the given context, company ABC_TEST has over 1000 employees."
        }
      ]
    }
  ]
}

demo ReACT agent

register

# request
POST /_plugins/_ml/agents/_register
{
  "name": "Test_Agent_For_ReAct",
  "type": "conversational",
  "description": "this is a test agent",
  "llm": {
    "model_id": "Gpct8YsBzZuijRBEkBIC",
    "parameters": {
      "max_iteration": 5,
      "stop_when_no_tool_found": true,
      "response_filter": "$.completion"
    }
  },
  "memory": {
    "type": "conversation_index"
  },
  "tools": [
    {
      "type": "NeuralSparseTool",
      "name": "NeuralSparseSearchTool",
      "description": "A tool to search the knowledge base, you should always try to search data with this tool when encountering questions you don't know. Action Input: <natrual language question>",
      "parameters": {
        "model_id": "65fy8IsBzZuijRBEwRBX",
        "index": "hybrid-index",
        "embedding_field": "sparse_embedding",
        "source_field": [ "passage_text" ],
        "input": "${parameters.question}"
      }
    },
    {
      "type": "CatIndexTool",
      "name": "RetrieveIndexMetaTool",
      "description": "Use this tool to get OpenSearch index information: (health, status, index, uuid, primary count, replica count, docs.count, docs.deleted, store.size, primary.store.size)."
    }
  ],
  "app_type": "my app"
}
# response
{
  "agent_id": "Mw4bH4wBIIa8ZwL8ewvQ"
}

execute

# request
POST /_plugins/_ml/agents/Mw4bH4wBIIa8ZwL8ewvQ/_execute
{
  "parameters": {
    "question": "How many employee does company ABCD_TEST have?",
    "verbose": true
  }
}
# response
{
  "inference_results": [
    {
      "output": [
        {
          "name": "memory_id",
          "result": "Nw4bH4wBIIa8ZwL8zwtw"
        },
        {
          "name": "response",
          "result": "Thought: I do not have enough information to directly answer how many employees company ABCD_TEST has. I should use the NeuralSparseSearchTool to search for this information."
        },
        {
          "name": "response",
          "result": "\nObservation: \"{\\\"_index\\\":\\\"hybrid-index\\\",\\\"_source\\\":{\\\"passage_text\\\":\\\"The history of company ABC_TEST is over 100 years. The company has over 1000 employees.\\\"},\\\"_id\\\":\\\"IA4YH4wBIIa8ZwL8ywsd\\\",\\\"_score\\\":130.33943}\\n{\\\"_index\\\":\\\"hybrid-index\\\",\\\"_source\\\":{\\\"passage_text\\\":\\\"Company xyz have a history of 100 years.\\\"},\\\"_id\\\":\\\"3\\\",\\\"_score\\\":19.313019}\\n\""
        },
        {
          "name": "response",
          "result": "After searching through the knowledge base, I have found the answer."
        },
        {
          "name": "response",
          "result": "Company ABCD_TEST has over 1000 employees."
        }
      ]
    }
  ]
}

@zhichao-aws
Copy link
Member Author

Hi @arjunkumargiri , I've resolved the comments. Please help review again

@zhichao-aws
Copy link
Member Author

@zane-neo @dbwiddis @arjunkumargiri Could you please help merge this PR? I don't have write access to this repo.

@zane-neo zane-neo merged commit d3f222d into opensearch-project:feature/agent_framework_dev Dec 4, 2023
4 of 7 checks passed
mingshl pushed a commit to mingshl/ml-commons that referenced this pull request Dec 20, 2023
…search-project#1686)

* add abstract retriever class

Signed-off-by: zhichao-aws <[email protected]>

* extends the abstract class, add neural sparse tool

Signed-off-by: zhichao-aws <[email protected]>

* add register logic

Signed-off-by: zhichao-aws <[email protected]>

* tidy

Signed-off-by: zhichao-aws <[email protected]>

* add test class

Signed-off-by: zhichao-aws <[email protected]>

* add test,spotless Apply

Signed-off-by: zhichao-aws <[email protected]>

* fix wrong ut name

Signed-off-by: zhichao-aws <[email protected]>

* add description

Signed-off-by: zhichao-aws <[email protected]>

* tidy

Signed-off-by: zhichao-aws <[email protected]>

* add _index and _id to retriever tool result; modify ut

Signed-off-by: zhichao-aws <[email protected]>

* tidy

Signed-off-by: zhichao-aws <[email protected]>

* remove set description from tool factory

Signed-off-by: zhichao-aws <[email protected]>

---------

Signed-off-by: zhichao-aws <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants