Commit
Deploying to gh-pages from @ a35cb12 🚀
dworthen committed Jan 3, 2025
1 parent 8ed4ac1 commit 5a809a4
Showing 6 changed files with 18 additions and 74 deletions.
2 changes: 1 addition & 1 deletion examples_notebooks/global_search/index.html
@@ -2248,7 +2248,7 @@ <h3 id="load-community-reports-as-context-for-global-search">Load community repo
<pre>
<span class="ansi-red-fg">---------------------------------------------------------------------------</span>
<span class="ansi-red-fg">AttributeError</span> Traceback (most recent call last)
<span class="ansi-green-fg">/tmp/ipykernel_2116/1512985616.py</span> in <span class="ansi-cyan-fg">?</span><span class="ansi-blue-fg">()</span>
<span class="ansi-green-fg">/tmp/ipykernel_2110/1512985616.py</span> in <span class="ansi-cyan-fg">?</span><span class="ansi-blue-fg">()</span>
<span class="ansi-green-intense-fg ansi-bold"> 2</span> entity_df <span class="ansi-blue-fg">=</span> pd<span class="ansi-blue-fg">.</span>read_parquet<span class="ansi-blue-fg">(</span><span class="ansi-blue-fg">f"{INPUT_DIR}/{ENTITY_TABLE}.parquet"</span><span class="ansi-blue-fg">)</span>
<span class="ansi-green-intense-fg ansi-bold"> 3</span> report_df <span class="ansi-blue-fg">=</span> pd<span class="ansi-blue-fg">.</span>read_parquet<span class="ansi-blue-fg">(</span><span class="ansi-blue-fg">f"{INPUT_DIR}/{COMMUNITY_REPORT_TABLE}.parquet"</span><span class="ansi-blue-fg">)</span>
<span class="ansi-green-intense-fg ansi-bold"> 4</span> entity_embedding_df <span class="ansi-blue-fg">=</span> pd<span class="ansi-blue-fg">.</span>read_parquet<span class="ansi-blue-fg">(</span><span class="ansi-blue-fg">f"{INPUT_DIR}/{ENTITY_EMBEDDING_TABLE}.parquet"</span><span class="ansi-blue-fg">)</span>
@@ -2156,7 +2156,7 @@ <h3 id="load-community-reports-as-context-for-global-search">Load community repo
<pre>
<span class="ansi-red-fg">---------------------------------------------------------------------------</span>
<span class="ansi-red-fg">AttributeError</span> Traceback (most recent call last)
<span class="ansi-green-fg">/tmp/ipykernel_2146/2760368953.py</span> in <span class="ansi-cyan-fg">?</span><span class="ansi-blue-fg">()</span>
<span class="ansi-green-fg">/tmp/ipykernel_2143/2760368953.py</span> in <span class="ansi-cyan-fg">?</span><span class="ansi-blue-fg">()</span>
<span class="ansi-green-intense-fg ansi-bold"> 2</span> entity_df <span class="ansi-blue-fg">=</span> pd<span class="ansi-blue-fg">.</span>read_parquet<span class="ansi-blue-fg">(</span><span class="ansi-blue-fg">f"{INPUT_DIR}/{ENTITY_TABLE}.parquet"</span><span class="ansi-blue-fg">)</span>
<span class="ansi-green-intense-fg ansi-bold"> 3</span> report_df <span class="ansi-blue-fg">=</span> pd<span class="ansi-blue-fg">.</span>read_parquet<span class="ansi-blue-fg">(</span><span class="ansi-blue-fg">f"{INPUT_DIR}/{COMMUNITY_REPORT_TABLE}.parquet"</span><span class="ansi-blue-fg">)</span>
<span class="ansi-green-intense-fg ansi-bold"> 4</span> entity_embedding_df <span class="ansi-blue-fg">=</span> pd<span class="ansi-blue-fg">.</span>read_parquet<span class="ansi-blue-fg">(</span><span class="ansi-blue-fg">f"{INPUT_DIR}/{ENTITY_EMBEDDING_TABLE}.parquet"</span><span class="ansi-blue-fg">)</span>
20 changes: 8 additions & 12 deletions examples_notebooks/index_migration/index.html
@@ -2324,9 +2324,8 @@ <h2 id="index-migration">Index Migration<a class="anchor-link" href="#index-migr
</div>
</clipboard-copy>
</div>
<div class="highlight-ipynb hl-python"><pre><span></span><span class="kn">from</span> <span class="nn">datashaper</span> <span class="kn">import</span> <span class="n">NoopVerbCallbacks</span>

<span class="kn">from</span> <span class="nn">graphrag.cache.factory</span> <span class="kn">import</span> <span class="n">create_cache</span>
<div class="highlight-ipynb hl-python"><pre><span></span><span class="kn">from</span> <span class="nn">graphrag.cache.factory</span> <span class="kn">import</span> <span class="n">create_cache</span>
<span class="kn">from</span> <span class="nn">graphrag.callbacks.noop_verb_callbacks</span> <span class="kn">import</span> <span class="n">NoopVerbCallbacks</span>
<span class="kn">from</span> <span class="nn">graphrag.index.flows.generate_text_embeddings</span> <span class="kn">import</span> <span class="n">generate_text_embeddings</span>

<span class="c1"># We only need to re-run the embeddings workflow, to ensure that embeddings for all required search fields are in place</span>
@@ -2355,9 +2354,8 @@ <h2 id="index-migration">Index Migration<a class="anchor-link" href="#index-migr
<span class="n">snapshot_embeddings_enabled</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
<span class="p">)</span>
</pre></div>
<div class="clipboard-copy-txt" id="cell-7">from datashaper import NoopVerbCallbacks

from graphrag.cache.factory import create_cache
<div class="clipboard-copy-txt" id="cell-7">from graphrag.cache.factory import create_cache
from graphrag.callbacks.noop_verb_callbacks import NoopVerbCallbacks
from graphrag.index.flows.generate_text_embeddings import generate_text_embeddings

# We only need to re-run the embeddings workflow, to ensure that embeddings for all required search fields are in place
@@ -2399,12 +2397,10 @@ <h2 id="index-migration">Index Migration<a class="anchor-link" href="#index-migr
<pre>
<span class="ansi-red-fg">---------------------------------------------------------------------------</span>
<span class="ansi-red-fg">ImportError</span> Traceback (most recent call last)
Cell <span class="ansi-green-fg">In[7], line 3</span>
<span class="ansi-green-intense-fg ansi-bold"> 1</span> <span class="ansi-bold" style="color: rgb(0,135,0)">from</span> <span class="ansi-bold" style="color: rgb(0,0,255)">datashaper</span> <span class="ansi-bold" style="color: rgb(0,135,0)">import</span> NoopVerbCallbacks
<span class="ansi-green-fg">----&gt; 3</span> <span class="ansi-bold" style="color: rgb(0,135,0)">from</span> <span class="ansi-bold" style="color: rgb(0,0,255)">graphrag</span><span class="ansi-bold" style="color: rgb(0,0,255)">.</span><span class="ansi-bold" style="color: rgb(0,0,255)">cache</span><span class="ansi-bold" style="color: rgb(0,0,255)">.</span><span class="ansi-bold" style="color: rgb(0,0,255)">factory</span> <span class="ansi-bold" style="color: rgb(0,135,0)">import</span> create_cache
<span class="ansi-green-intense-fg ansi-bold"> 4</span> <span class="ansi-bold" style="color: rgb(0,135,0)">from</span> <span class="ansi-bold" style="color: rgb(0,0,255)">graphrag</span><span class="ansi-bold" style="color: rgb(0,0,255)">.</span><span class="ansi-bold" style="color: rgb(0,0,255)">index</span><span class="ansi-bold" style="color: rgb(0,0,255)">.</span><span class="ansi-bold" style="color: rgb(0,0,255)">flows</span><span class="ansi-bold" style="color: rgb(0,0,255)">.</span><span class="ansi-bold" style="color: rgb(0,0,255)">generate_text_embeddings</span> <span class="ansi-bold" style="color: rgb(0,135,0)">import</span> generate_text_embeddings
<span class="ansi-green-intense-fg ansi-bold"> 6</span> <span style="color: rgb(95,135,135)"># We only need to re-run the embeddings workflow, to ensure that embeddings for all required search fields are in place</span>
<span class="ansi-green-intense-fg ansi-bold"> 7</span> <span style="color: rgb(95,135,135)"># We'll construct the context and run this function flow directly to avoid everything else</span>
Cell <span class="ansi-green-fg">In[7], line 1</span>
<span class="ansi-green-fg">----&gt; 1</span> <span class="ansi-bold" style="color: rgb(0,135,0)">from</span> <span class="ansi-bold" style="color: rgb(0,0,255)">graphrag</span><span class="ansi-bold" style="color: rgb(0,0,255)">.</span><span class="ansi-bold" style="color: rgb(0,0,255)">cache</span><span class="ansi-bold" style="color: rgb(0,0,255)">.</span><span class="ansi-bold" style="color: rgb(0,0,255)">factory</span> <span class="ansi-bold" style="color: rgb(0,135,0)">import</span> create_cache
<span class="ansi-green-intense-fg ansi-bold"> 2</span> <span class="ansi-bold" style="color: rgb(0,135,0)">from</span> <span class="ansi-bold" style="color: rgb(0,0,255)">graphrag</span><span class="ansi-bold" style="color: rgb(0,0,255)">.</span><span class="ansi-bold" style="color: rgb(0,0,255)">callbacks</span><span class="ansi-bold" style="color: rgb(0,0,255)">.</span><span class="ansi-bold" style="color: rgb(0,0,255)">noop_verb_callbacks</span> <span class="ansi-bold" style="color: rgb(0,135,0)">import</span> NoopVerbCallbacks
<span class="ansi-green-intense-fg ansi-bold"> 3</span> <span class="ansi-bold" style="color: rgb(0,135,0)">from</span> <span class="ansi-bold" style="color: rgb(0,0,255)">graphrag</span><span class="ansi-bold" style="color: rgb(0,0,255)">.</span><span class="ansi-bold" style="color: rgb(0,0,255)">index</span><span class="ansi-bold" style="color: rgb(0,0,255)">.</span><span class="ansi-bold" style="color: rgb(0,0,255)">flows</span><span class="ansi-bold" style="color: rgb(0,0,255)">.</span><span class="ansi-bold" style="color: rgb(0,0,255)">generate_text_embeddings</span> <span class="ansi-bold" style="color: rgb(0,135,0)">import</span> generate_text_embeddings

<span class="ansi-red-fg">ImportError</span>: cannot import name 'create_cache' from 'graphrag.cache.factory' (/home/runner/work/graphrag/graphrag/graphrag/cache/factory.py)</pre>
</div>
66 changes: 7 additions & 59 deletions index/architecture/index.html
@@ -628,27 +628,9 @@
</li>

<li class="md-nav__item">
<a href="#datashaper-workflows" class="md-nav__link">
<a href="#workflows" class="md-nav__link">
<span class="md-ellipsis">
DataShaper Workflows
</span>
</a>

</li>

<li class="md-nav__item">
<a href="#llm-based-workflow-steps" class="md-nav__link">
<span class="md-ellipsis">
LLM-based Workflow Steps
</span>
</a>

</li>

<li class="md-nav__item">
<a href="#workflow-graphs" class="md-nav__link">
<span class="md-ellipsis">
Workflow Graphs
Workflows
</span>
</a>

@@ -1493,27 +1475,9 @@
</li>

<li class="md-nav__item">
<a href="#datashaper-workflows" class="md-nav__link">
<span class="md-ellipsis">
DataShaper Workflows
</span>
</a>

</li>

<li class="md-nav__item">
<a href="#llm-based-workflow-steps" class="md-nav__link">
<a href="#workflows" class="md-nav__link">
<span class="md-ellipsis">
LLM-based Workflow Steps
</span>
</a>

</li>

<li class="md-nav__item">
<a href="#workflow-graphs" class="md-nav__link">
<span class="md-ellipsis">
Workflow Graphs
Workflows
</span>
</a>

@@ -1566,24 +1530,8 @@ <h3 id="knowledge-model">Knowledge Model</h3>
<p>In order to support the GraphRAG system, the outputs of the indexing engine (in the Default Configuration Mode) are aligned to a knowledge model we call the <em>GraphRAG Knowledge Model</em>.
This model is designed to be an abstraction over the underlying data storage technology, and to provide a common interface for the GraphRAG system to interact with.
In normal use cases, the outputs of the GraphRAG Indexer would be loaded into a database system, and GraphRAG's Query Engine would interact with the database using the knowledge model data-store types.</p>
<h3 id="datashaper-workflows">DataShaper Workflows</h3>
<p>GraphRAG's Indexing Pipeline is built on top of our open-source library, <a href="https://github.com/microsoft/datashaper">DataShaper</a>.
DataShaper is a data processing library that allows users to declaratively express data pipelines, schemas, and related assets using well-defined schemas.
DataShaper has implementations in JavaScript and Python, and is designed to be extensible to other languages.</p>
<p>One of the core resource types within DataShaper is a <a href="https://github.com/microsoft/datashaper/blob/main/javascript/schema/src/workflow/WorkflowSchema.ts">Workflow</a>.
Workflows are expressed as sequences of steps, which we call <a href="https://github.com/microsoft/datashaper/blob/main/javascript/schema/src/workflow/verbs.ts">verbs</a>.
Each step has a verb name and a configuration object.
In DataShaper, these verbs model relational concepts such as SELECT, DROP, JOIN, etc.. Each verb transforms an input data table, and that table is passed down the pipeline.</p>
<pre class="mermaid"><code>---
title: Sample Workflow
---
flowchart LR
input[Input Table] --&gt; select[SELECT] --&gt; join[JOIN] --&gt; binarize[BINARIZE] --&gt; output[Output Table]</code></pre>
<h3 id="llm-based-workflow-steps">LLM-based Workflow Steps</h3>
<p>GraphRAG's Indexing Pipeline implements a handful of custom verbs on top of the standard, relational verbs that our DataShaper library provides. These verbs give us the ability to augment text documents with rich, structured data using the power of LLMs such as GPT-4. We utilize these verbs in our standard workflow to extract entities, relationships, claims, community structures, and community reports and summaries. This behavior is customizable and can be extended to support many kinds of AI-based data enrichment and extraction tasks.</p>
<h3 id="workflow-graphs">Workflow Graphs</h3>
<p>Because of the complexity of our data indexing tasks, we needed to be able to express our data pipeline as series of multiple, interdependent workflows.
In the GraphRAG Indexing Pipeline, each workflow may define dependencies on other workflows, effectively forming a directed acyclic graph (DAG) of workflows, which is then used to schedule processing.</p>
<h3 id="workflows">Workflows</h3>
<p>Because of the complexity of our data indexing tasks, we needed to be able to express our data pipeline as a series of multiple, interdependent workflows.</p>
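One way to picture interdependent workflows is as a dependency map that gets resolved into a valid run order. The sketch below is illustrative only: the workflow names and the `schedule` helper are hypothetical and are not taken from GraphRAG's code. It uses Python's standard-library `graphlib` to order workflows so that each one runs after the workflows it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow names (not GraphRAG's actual workflows).
# Each key maps to the set of workflows that must finish before it runs.
dependencies = {
    "prepare_documents": set(),
    "chunk_documents": {"prepare_documents"},
    "extract_graph": {"chunk_documents"},
    "embed_chunks": {"chunk_documents"},
    "generate_reports": {"extract_graph"},
}


def schedule(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(deps).static_order())


order = schedule(dependencies)
print(order)
```

Several orders can be valid for the same graph (here `extract_graph` and `embed_chunks` are independent of each other), so only the relative position of a workflow and its dependencies is guaranteed.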
<pre class="mermaid"><code>---
title: Sample Workflow DAG
---
@@ -1599,7 +1547,7 @@ <h3 id="dataframe-message-format">Dataframe Message Format</h3>
<p>The primary unit of communication between workflows, and between workflow steps is an instance of <code>pandas.DataFrame</code>.
Although side-effects are possible, our goal is to be <em>data-centric</em> and <em>table-centric</em> in our approach to data processing.
This allows us to easily reason about our data, and to leverage the power of dataframe-based ecosystems.
Our underlying dataframe technology may change over time, but our primary goal is to support the DataShaper workflow schema while retaining single-machine ease of use and developer ergonomics.</p>
Our underlying dataframe technology may change over time, but our primary goal is to support the workflow schema while retaining single-machine ease of use and developer ergonomics.</p>
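The table-centric idea above can be sketched with plain pandas. This is a minimal illustration under stated assumptions, not GraphRAG's implementation: the step functions, the `run_steps` helper, and the column names are all hypothetical. Each step takes a `pandas.DataFrame` and returns a new one, so the table is the only thing passed between steps.

```python
import pandas as pd


def clean_text(table: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical step: normalize the text column without mutating the input."""
    return table.assign(text=table["text"].str.strip().str.lower())


def count_words(table: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical step: add a whitespace-delimited word count per row."""
    return table.assign(n_words=table["text"].str.split().str.len())


def run_steps(table: pd.DataFrame, steps) -> pd.DataFrame:
    """Pass the table through each step in order; the DataFrame is the message."""
    for step in steps:
        table = step(table)
    return table


documents = pd.DataFrame({"id": [1, 2], "text": ["  Graphs Help  ", "One Two Three"]})
result = run_steps(documents, [clean_text, count_words])
print(result)
```

Because every step returns a new frame via `assign`, the input table is left untouched, which keeps side effects out of the steps and makes each one easy to test in isolation.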
<h3 id="llm-caching">LLM Caching</h3>
<p>The GraphRAG library was designed with LLM interactions in mind, and a common setback when working with LLM APIs is errors caused by network latency, throttling, and similar issues.
Because of these potential error cases, we've added a cache layer around LLM interactions.
2 changes: 1 addition & 1 deletion search/search_index.json

Large diffs are not rendered by default.

Binary file modified sitemap.xml.gz
Binary file not shown.
