Readmev2 (aws#21)
* Gremlin and SPARQL images

* Updated README

new instructions for Blazegraph, Gremlin Server, and Neptune
new images showing graph-notebook features

* Update README.md

* Update README.md

* Update README.md

* Update README.md
joywa authored Nov 19, 2020
1 parent 97324b0 commit 23775ba
Showing 3 changed files with 90 additions and 43 deletions.
133 changes: 90 additions & 43 deletions README.md
@@ -1,20 +1,65 @@
## graph-notebook
## Graph Notebook: easily query and visualize graphs

Python package integrating Jupyter notebooks with various graph-stores including
[Apache TinkerPop](https://tinkerpop.apache.org/) and [RDF SPARQL](https://www.w3.org/TR/rdf-sparql-query/).
The graph notebook provides an easy way to interact with graph databases using Jupyter notebooks. Using this open-source Python package, you can connect to any graph database that supports the [Apache TinkerPop](https://tinkerpop.apache.org/) or the [RDF SPARQL](https://www.w3.org/TR/rdf-sparql-query/) graph model. These databases could be running locally on your desktop or in the cloud. Graph databases can be used to explore a variety of use cases including [knowledge graphs](https://aws.amazon.com/neptune/knowledge-graphs-on-aws/) and [identity graphs](https://aws.amazon.com/neptune/identity-graphs-on-aws/).

## Requirements
### Visualizing Gremlin queries:

- Python 3.6.1 or higher, Python 3.7
- Jupyter Notebook
![Gremlin query and graph](./images/GremlinQueryGraph.png)

## Introduction
The graph-notebook provides a way to interact using a Jupyter notebook with any graph database that follows the Gremlin Server or RDF HTTP protocols. These databases could be running locally on your laptop, in a private data center or in the cloud. This project was initially created as a way to work with Amazon Neptune but is not limited to that database engine. For example you can connect to a Gremlin Server running on your laptop using this solution. The instructions below describe the process for connecting to Amazon Neptune. We encourage others to contribute configurations they find useful. There is an [`additional-databases`](additional-databases) folder where such information can be found.
### Visualizing SPARQL queries:

![SPARQL query and graph](./images/SPARQLQueryGraph.png)

Instructions for connecting to the following graph databases:

| Endpoint | Graph model | Query language |
| :-----------------------------: | :---------------------: | :-----------------: |
|[Gremlin Server](#gremlin-server)| property graph | Gremlin |
| [Blazegraph](#blazegraph) | RDF | SPARQL |
|[Amazon Neptune](#amazon-neptune)| property graph or RDF | Gremlin or SPARQL |

We encourage others to contribute configurations they find useful. There is an [`additional-databases`](https://github.com/aws/graph-notebook/blob/main/additional-databases) folder where more information can be found.

## Features

#### Notebook cell 'magic' extensions in the IPython 3 kernel
`%%sparql` - Executes a SPARQL query against your configured database endpoint.

`%%gremlin` - Executes a Gremlin query against your database using web sockets. The results are similar to what the Gremlin console would return.

**TIP** :point_right: There is syntax highlighting for both `%%sparql` and `%%gremlin` queries to help you structure your queries more easily.

#### Notebook line 'magic' extensions in the IPython 3 kernel
`%graph_notebook_config` - Returns a JSON object that contains connection information for your host.

`%query_mode` - Lets you set the query mode for your queries to one of:

* `query` (the default) : Executes the query against the normal SPARQL or Gremlin endpoint.
* `explain` : Returns an explanation of the query plan instead of the query's results (valid for both SPARQL and Gremlin).
* `profile` : Returns a profile of the query's operation, but does not actually execute the query (valid only for Gremlin).

`%seed` - Provides a form to add data to your graph without the use of a bulk loader. Both SPARQL and Gremlin have an airport routes dataset.

**TIP** :point_right: You can list all the magics installed in the Python 3 kernel using the `%lsmagic` command.
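The `%query_mode` rules listed above can be summarized as a small lookup table. The sketch below is illustrative only: the names `QUERY_MODE_SUPPORT` and `mode_supported` are hypothetical and are not part of the graph-notebook package; the table simply restates which query language each mode is described as valid for.

```python
# Which query languages each %query_mode value applies to, as described in
# the Features list. Illustration only; not graph-notebook's internal code.
QUERY_MODE_SUPPORT = {
    "query": {"sparql", "gremlin"},
    "explain": {"sparql", "gremlin"},
    "profile": {"gremlin"},  # listed as valid only for Gremlin
}


def mode_supported(mode: str, language: str) -> bool:
    """Return True if `mode` is listed as valid for `language`."""
    return language.lower() in QUERY_MODE_SUPPORT.get(mode.lower(), set())


if __name__ == "__main__":
    for mode in QUERY_MODE_SUPPORT:
        for language in ("sparql", "gremlin"):
            print(mode, language, mode_supported(mode, language))
```

A check like this can be useful when generating notebooks programmatically, so that a `profile` request is not issued for a SPARQL cell.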


## Prerequisites

You will need:

* [Python](https://www.python.org/downloads/) 3.6.1-3.6.12
* [Jupyter Notebook](https://jupyter.org/install) 5.7.10
* [Tornado](https://pypi.org/project/tornado/) 4.5.3
* A graph database that provides a SPARQL 1.1 Endpoint or a Gremlin Server
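Because the supported Python range is narrow, it may be worth checking the interpreter before installing. The following is a minimal sketch, assuming the 3.6.1 to 3.6.12 range stated above is inclusive at both ends; the helper name `python_version_supported` is hypothetical and not part of graph-notebook.

```python
import sys

# Inclusive bounds taken from the prerequisites list above.
MIN_VERSION = (3, 6, 1)
MAX_VERSION = (3, 6, 12)


def python_version_supported(version=None):
    """Return True if `version` (default: the running interpreter) is in range.

    `version` is a (major, minor, micro) tuple; extra fields are ignored.
    """
    major_minor_micro = tuple((version or sys.version_info)[:3])
    return MIN_VERSION <= major_minor_micro <= MAX_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if python_version_supported() else "outside the listed range"
    print("Python " + current + " is " + status)
```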


## Installation

```
# pin specific versions of Jupyter and Tornado dependency
pip install notebook==5.7.10
pip install tornado==4.5.3
# install the package
pip install graph-notebook
@@ -27,70 +72,72 @@ python -m graph_notebook.static_resources.install
python -m graph_notebook.nbextensions.install
# copy premade starter notebooks
python -m graph_notebook.notebooks.install --destination /notebook/destination/dir
python -m graph_notebook.notebooks.install --destination ~/notebook/destination/dir
# start jupyter
jupyter notebook /notebook/destination/dir
jupyter notebook ~/notebook/destination/dir
```

## Configuration
## Connecting to a graph database

In order to connect to your graph database, you have three configuration options.
### Gremlin Server

1. Change the host setting in your opened Jupyter notebook by running the following in a notebook cell:
In a new cell in the Jupyter notebook, change the configuration using `%%graph_notebook_config` and modify the fields for `host`, `port`, and `ssl`. For a local Gremlin server (HTTP or WebSockets), you can use the following command:

```
%graph_notebook_host you-endpoint-here
%%graph_notebook_config
{
"host": "localhost",
"port": 8182,
"auth_mode": "DEFAULT",
"iam_credentials_provider_type": "ROLE",
"load_from_s3_arn": "",
"ssl": false,
"aws_region": "us-east-1"
}
```

2. Change your configuration entirely grabbing the current configuration, making edits, and saving it to your notebook by running the following cells:
To set up a new local Gremlin Server for use with the graph notebook, check out [`additional-databases/gremlin-server`](additional-databases/gremlin-server).
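Before running the configuration cell, it can help to confirm that something is listening on the configured `host` and `port`. The sketch below uses only the Python standard library; the helper name `can_connect` is hypothetical and not part of graph-notebook. It checks only that a TCP connection can be opened, not that the listener is actually a Gremlin Server.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the host and connects, or raises OSError.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Values match the local Gremlin Server configuration shown above.
    print("localhost:8182 reachable:", can_connect("localhost", 8182))
```

If this prints `False` for a local server, the server is probably not running or is bound to a different port.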

```
# 1. print your configuration
%graph_notebook_config
### Blazegraph

# default config will be printed if nothing else is set:
{
"host": "change-me",
"port": 8182,
"auth_mode": "DEFAULT",
"iam_credentials_provider_type": "ROLE",
"load_from_s3_arn": "",
"ssl": true,
"aws_region": "us-east-1"
}
Change the configuration using `%%graph_notebook_config` and modify the fields for `host`, `port`, and `ssl`. For a local Blazegraph database, you can use the following command:

# 2. in a new cell, change the configuration by using %%graph_notebook_config (note the two leading %% instead of one)
```
%%graph_notebook_config
{
"host": "changed-my-endpoint",
"port": 8182,
"host": "localhost",
"port": 9999,
"auth_mode": "DEFAULT",
"iam_credentials_provider_type": "ENV",
"iam_credentials_provider_type": "ROLE",
"load_from_s3_arn": "",
"ssl": true,
"ssl": false,
"aws_region": "us-east-1"
}
```
To set up a new local Blazegraph database for use with the graph notebook, check out the [Quick Start](https://github.com/blazegraph/database/wiki/Quick_Start) from Blazegraph.

### Amazon Neptune

Change the configuration using `%%graph_notebook_config` and modify the defaults as they apply to your Neptune cluster:

3. Store a configuration under ~/graph_notebook_config.json
```
echo "{
"host": "changed-my-endpoint",
%%graph_notebook_config
{
"host": "your-neptune-endpoint",
"port": 8182,
"auth_mode": "DEFAULT",
"iam_credentials_provider_type": "ENV",
"load_from_s3_arn": "",
"ssl": true,
"aws_region": "us-east-1"
}" >> ~/graph_notebook_config.json
"aws_region": "your-neptune-region"
}
```
To set up a new Amazon Neptune cluster, check out the [AWS documentation](https://docs.aws.amazon.com/neptune/latest/userguide/manage-console-launch.html).

### Connecting to a local graph store
As mentioned in the introduction, it is possible to connect [`graph-notebook`](src/graph_notebook) to a graph database running on your local machine, an example being Gremlin Server. There are additional instructions regarding the use of local servers in the [`additional-databases`](additional-databases) folder.

When connecting the graph notebook to Neptune, make sure your network is set up to communicate with the VPC that Neptune runs in. If it is not, you can follow [this guide](https://github.com/aws/graph-notebook/tree/main/additional-databases/neptune).

## Authentication
## Authentication (Amazon Neptune)

If you are running a SigV4 authenticated endpoint, ensure that the config field `iam_credentials_provider_type` is set
to `ENV` and that you have set the following environment variables:
@@ -101,7 +148,7 @@ to `ENV` and that you have set the following environment variables:
- AWS_SESSION_TOKEN (OPTIONAL. Use if you are using temporary credentials)


## Security
## Contributing Guidelines

See [CONTRIBUTING](https://github.com/aws/graph-notebook/blob/main/CONTRIBUTING.md) for more information.

Binary file added images/GremlinQueryGraph.png
Binary file added images/SPARQLQueryGraph.png