GraphMinds: Leveraging Large Language Models and Knowledge Graphs for Transparent and Efficient AI Systems
"I am convinced that the crux of the problem of learning is recognising relationships and being able to use them."
— Christopher Strachey in a letter to Alan Turing, 1954
GraphMinds addresses the challenges of processing unstructured data and leveraging indirect relationships within sensitive domains. Traditional AI systems, particularly those reliant on cloud infrastructures, present security risks, making local, secure analysis increasingly critical.
Initially developed as a Local Retrieval-Augmented Generation (RAG) system, GraphMinds evolved to tackle the limitations of handling unstructured data by utilising knowledge graphs (KGs) to structure information. This enables the system to infer and represent indirect relationships, enhancing the capabilities of Large Language Models (LLMs) in processing fragmented and complex datasets.
GraphMinds integrates advanced graph-based techniques with LLMs, facilitating the capture of indirect relationships within a knowledge graph. This unique approach improves the system's ability to analyse unstructured data, offering deeper insights and enhanced security for knowledge-intensive tasks.
Evaluations demonstrate GraphMinds' strengths in analysing large, unstructured datasets, particularly in fields requiring comprehensive analysis, such as criminal investigations. These results underscore its potential as a powerful tool for secure and transparent data analysis.
- Graph-Based Relationship Mapping: Extracts direct and indirect relationships between entities from unstructured data and represents them in a knowledge graph.
- Secure Local AI Processing: Designed to operate securely in local environments, ensuring data confidentiality without reliance on cloud infrastructure.
- Embeddings and Similarity Matching: Uses sentence embeddings to compute similarities between user queries and document relationships.
- LLM Integration for Comprehensive Analysis: Integrates with advanced LLMs to generate human-readable answers from relationships and contextual data.
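The "Graph-Based Relationship Mapping" feature above can be illustrated with a minimal sketch: once relationships are extracted as triples, indirect links between entities fall out of a simple path search over the graph. The triples, entity names, and helper functions below are hypothetical, not the project's actual implementation:

```python
from collections import deque

# Hypothetical extracted triples: (subject, relation, object)
triples = [
    ("Alice", "works_at", "Acme Corp"),
    ("Acme Corp", "located_in", "Berlin"),
    ("Bob", "knows", "Alice"),
]

def build_adjacency(triples):
    """Index triples as an undirected adjacency map for path search."""
    adj = {}
    for subj, rel, obj in triples:
        adj.setdefault(subj, []).append((obj, rel))
        adj.setdefault(obj, []).append((subj, rel))
    return adj

def indirect_path(adj, start, goal):
    """Breadth-first search for a chain of entities linking start to goal."""
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for neighbour, _rel in adj.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [neighbour]))
    return None

adj = build_adjacency(triples)
print(indirect_path(adj, "Bob", "Berlin"))  # ['Bob', 'Alice', 'Acme Corp', 'Berlin']
```

Here the link between Bob and Berlin is never stated directly; it emerges only by chaining three separate relationships, which is the kind of inference the knowledge graph enables.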
- Sentence Transformers: For generating sentence embeddings.
- NetworkX and PyVis: For graph representation and visualization.
- SciPy: For calculating cosine similarity between embeddings.
- Ollama Client: For interacting with the LLM.
- Pandas: For data manipulation and handling relationships.
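SciPy's `scipy.spatial.distance.cosine` returns a cosine *distance*, so similarity is `1 - distance`. The dependency-free sketch below computes the same similarity directly in plain Python (standing in for the SciPy call) to show what the matching step measures:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))            # 0.0 (orthogonal)
```

In the project this comparison runs over sentence-embedding vectors produced by Sentence Transformers, not the toy vectors above.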
```shell
git clone https://github.com/tirth8205/GraphMinds.git
cd GraphMinds
conda env create -f environment.yml
conda activate graphminds
conda list
```
The Ollama API provides the necessary interface for interacting with the LLMs (e.g., `mistral-openorca` and `zephyr:7b`). Zephyr is a series of fine-tuned versions of the Mistral and Mixtral models that are trained to act as helpful assistants. Zephyr is a 7B-parameter model, distributed under the Apache license, available in both instruction-following and text-completion variants.
To use the Ollama API:

- Download and install the Ollama 5.1.4 API from the official Ollama website.
- After installing the API, verify the installation by checking the version:

  ```shell
  ollama --version
  ```

- Once installed, download the required models (`mistral-openorca` and `zephyr:7b`) using Ollama's built-in commands:

  ```shell
  ollama pull mistral-openorca
  ollama pull zephyr:7b
  ```

- Ensure that the models are correctly installed and ready for interaction:

  ```shell
  ollama list
  ```

  This will list all the available models that are ready to use with the project.
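Once the models are pulled, they can be called from Python. The snippet below sketches how a prompt might be assembled from retrieved relationship triples and sent via the `ollama` Python client; the `build_prompt` helper and the relationship data are hypothetical, and the network call is commented out so the example stands alone without a running server:

```python
def build_prompt(relationships, question):
    """Assemble an LLM prompt from retrieved relationship triples (hypothetical format)."""
    context = "\n".join(f"- {s} {r} {o}" for s, r, o in relationships)
    return (
        "Answer the question using only the relationships below.\n"
        f"Relationships:\n{context}\n"
        f"Question: {question}"
    )

relationships = [("Alice", "works_at", "Acme Corp")]  # hypothetical retrieved triples
prompt = build_prompt(relationships, "Where does Alice work?")

# With a local Ollama server running, the prompt could be sent like this:
# import ollama
# reply = ollama.chat(model="zephyr:7b",
#                     messages=[{"role": "user", "content": prompt}])
# print(reply["message"]["content"])
print(prompt)
```

Keeping the model call local is what preserves the confidentiality guarantees described earlier: the prompt, including any sensitive relationship data, never leaves the machine.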
If you're planning to work in JupyterLab, you can start it with:

```shell
jupyter lab
```

Once you're done, you can deactivate the Conda environment by running:

```shell
conda deactivate
```
- If you need to install additional packages, you can do so within the activated environment using `conda install` or `pip install`.
- The Python version is not pinned in the environment file, so the latest compatible version of Python will be installed when the environment is created.
- Set Up the Environment: After setting up the environment, open the `extract_graph.ipynb` Jupyter notebook and ensure the kernel is set to Knowledge Graph (the environment you just created).
- Prepare Data:
  - Place your PDF file in the `input/` folder (PDFs only at the moment).
  - Open the notebook and update the file name in the script:

    ```python
    # Load the PDF document
    loader = PyPDFLoader("input/#FileName")  # Replace #FileName with the name of your PDF file
    documents = loader.load()  # Load the content of the PDF
    ```

- Run the Notebook: Execute the cells to process the PDF and generate the knowledge graph. Once the processing is done, the system will generate an interactive HTML file for querying the relationships within the document.
- Example Query Script: You can also run a predefined script to query the document:

  ```python
  # Example query
  query = "#ENTER YOUR QUERY HERE"  # Replace with your query
  response = answer_query_with_all_relationships(query, contentreplacedforchunk_dfg, df)
  # Print the response
  print(response)
  ```

- Alternatively, Start a Chat: You can initiate an interactive chat as defined in the last cell of the notebook:

  ```python
  while True:
      # Get the user's query
      query = input("Ask your question: ").strip()
      # Check if the user wants to exit
      if query.lower() in ['exit', 'quit']:
          print("Ending the interaction. Goodbye!")
          break
      # Answer the query using the extracted relationships
      response = answer_query_with_all_relationships(query, contentreplacedforchunk_dfg, df)
      print(response)
  ```

  The script will continuously prompt for questions until you type `exit` or `quit`.
- Embedding Generation: Generates sentence embeddings for each relationship in the dataset by combining node, edge, and context data.
- Cosine Similarity Matching: Matches user queries with the relationships in the dataset using cosine similarity.
- Graph-Based Inference: Extracts direct and indirect relationships from the knowledge graph to provide context-rich answers.
- LLM-Powered Answer Generation: Uses an LLM to create natural language answers based on the relationships and context.
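The retrieval side of the pipeline above (embed, then rank by cosine similarity) can be sketched end to end. A trivial bag-of-words vector stands in for a sentence embedding here, and the relationship sentences are hypothetical; the real system uses Sentence Transformers for embeddings and an LLM for the final answer:

```python
import math

def embed(text, vocab):
    """Toy bag-of-words 'embedding' standing in for a sentence-transformer vector."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    """Cosine similarity, guarding against zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

relationships = [  # hypothetical node-edge-node sentences from the knowledge graph
    "Alice works at Acme Corp",
    "Acme Corp is located in Berlin",
]
vocab = sorted({w for r in relationships for w in r.lower().split()})

def best_match(query):
    """Embed the query and every relationship, then return the closest relationship."""
    q = embed(query, vocab)
    return max(relationships, key=lambda r: cosine(q, embed(r, vocab)))

print(best_match("Where does Alice work?"))  # Alice works at Acme Corp
```

The top-ranked relationships (plus their graph neighbours, for indirect context) would then be passed to the LLM to generate the natural-language answer.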
The core of GraphMinds lies in its ability to leverage knowledge graphs and integrate indirect relationships into its analysis. By combining LLMs with structured data from knowledge graphs, GraphMinds enhances the accuracy of insights generated from unstructured and fragmented information.
This project is licensed under the MIT License. See the LICENSE file for details.
This project is developed by Tirth Kanani under the supervision of Prof. Christopher Baber as part of the MSc program in Human-Computer Interaction at the University of Birmingham. Special thanks to the developers of tools such as Sentence Transformers, NetworkX, and PyVis for their invaluable contributions.
For more detailed insights and background on this project, you can access the full project report here.