openCypher Query and Visualization Support (aws#153)
* copy from github v2.0.3

* port dbechbe@ working version, add syntax highlighting, add %%oc which calls into %%opencypher

* opencypher iam auth

* bolt support

* rebase from github, refactor opencypher

* get tests working for opencypher endpoint

* Updated data processing code to make edge ids static so that the load is idempotent

* rebase from 2.1.2

* - port dbechbe@ working version, add syntax highlighting, add %%oc which calls into %%opencypher
- get tests working for opencypher endpoint
- pull in client and add OC methods

* Initial versions of notebooks updated for Neptune GA

* Updated Neptune ML notebooks, utils, and pretrained models config

* add support for modeltransform commands in %neptune_ml

* Updated OC widget to handle new JSON format

* Updated ML notebooks with feedback from Annupriya

* Added back in missing init file for the Gremlin Network

* Added support for openCypher syntax highlighting

* Added missing init files and updated files that incorrectly referenced SPARQL instead of OC

* WIP - Adding visualization to OC

* WIP - Initial rough visualization of OC results

* WIP - updated to handle group vars passed in as json

* Rebase on v2.1.3 and changes due for v2.1.4

* Resolve remaining merge conflicts from v2.1.2 rebase

* Added comments and cleaned up code for initial OC visualization

* Revert unintended changes to Gremlin tests

* WIP - Adding visualization to OC

* Cleaned up merge conflicts after merge from akline/OC

* Fixed additional merge conflicts

* Finally fixed merge conflicts from akline/OC branch

* Copied code to set label display and label length

* Fix Sparql tab widgets being displayed incorrectly, some PEP8 fixes

* Changed the seed command to use 'Property Graph/RDF' as the data models instead of 'Gremlin/SPARQL' in order to support OC release

* Removed tmp file used for building

* Added opencypher support for bulk load

* Cleaned up last few merge conflicts

* PEP8 fixes

* More PEP8 fixes

* Update notebooks unit test with the new notebook paths

* Fixed issue with seed command as well as default grouping not working correctly

* Fixed issue where parsed lists of dictionaries were not remaining ordered

* Updated Notebooks to refer to new seed command

* Add '-de' param to Gremlin magic for specifying edge labels

* Fix bug in adding dict type edges to graph, rearrange recent tests

* Initial upload of new notebooks

* Additional cleanup/tweaks

* Add variable injection decorator to OC magics

* Fixed casing on seed command labels for consistency

* Introduce new features via text and tweak examples

* Add --edge-display-property to OC magic for specifying edge labels

* Update OC notebooks hints sections with -de param

* Additional improvements to intro section

* Additional examples and prose

* Initial updates for the README - more needed

* Additional README updates - more needed

* Update URL for openCypher

* Initial upload of sample OC images

* Add link to OC sample image

* Add another example using the -d hint

* Update Gremlin sample image to show color

* Tweak examples to use more color

* Add a colorful graph image to the README

* Additional pointers to notebooks

* Additional updates

* Fix bug where Gremlin node tooltips were not being changed when using the -d option

* Additional examples that showcase new features

* Rename some variables

* More variable renaming

* Additional small improvements

* Add an example showing how to sample airports

* Minor tweak to random sample example

* Making use of verbs consistent

* Verb consistency and clean up graphics reset

* Fix incorrect option

* Verb consistency

* Improved a couple of examples

* Add visualization support for elementMap() Gremlin step

* Remove Direction.BOTH check

* Remove merged redundancies

* Updated ML notebooks based on feedback from Ankit

* Additional discussion of elementMap usage

* Update Visualization-Grouping-Coloring-Gremlin notebook with elementMap

* [lakelvin@] Refactor %load form display to fix some descriptions being cut off

* Rename Gremlin Grouping-Coloring sample notebook

* Minor changes and rename files

* Add examples of -d and -de without a map

* Fix typo

* Fix -d option not working in OC queries for string format values

* Clean up debug statement

* Fix OC metadata results count metric

* Update ChangeLog for OC Release

* Add ML updates to ChangeLog

* Remove identity graph seed files

* Remove extra chars from notebooks

* Pin neo4j version

* Styling fixes

* More styling fixes

* Update notebook directory validation unit test

Co-authored-by: Austin Kline <[email protected]>
Co-authored-by: Dave Bechberger <[email protected]>
Co-authored-by: Michael Chin <[email protected]>
Co-authored-by: Kelvin Lawrence <[email protected]>
5 people authored Jul 28, 2021
1 parent 0a9d24b commit 3731e50
Showing 83 changed files with 8,565 additions and 983 deletions.
5 changes: 4 additions & 1 deletion .gitignore
@@ -25,4 +25,7 @@ src/graph_notebook/widgets/lib/
# npm
node_modules/
node_modules/.package-lock.json
src/graph_notebook/widgets/package-lock.json
src/graph_notebook/widgets/package-lock.json
blazegraph.jnl
rules.log
*.env
69 changes: 45 additions & 24 deletions ChangeLog.md
@@ -3,47 +3,68 @@
Starting with v1.31.6, this file will contain a record of major features and updates made in each release of graph-notebook.

## Upcoming
- Add visualization support for elementMap Gremlin step ([Link to PR](https://github.com/aws/graph-notebook/pull/140))
- Support additional customization of edge node labels in Gremlin ([Link to PR](https://github.com/aws/graph-notebook/pull/132))
- Include index operations metrics in metadata results tab for Gremlin Profile queries([Link to PR](https://github.com/aws/graph-notebook/pull/150))
- Update SPARQL EPL seed dataset file ([Link to PR](https://github.com/aws/graph-notebook/pull/134))
- Update documentation on using `%%graph_notebook_config` with an IAM enabled Neptune cluster ([Link to PR](https://github.com/aws/graph-notebook/pull/136))
- Fix improper handling of Blazegraph status response ([Link to PR](https://github.com/aws/graph-notebook/pull/137))
- Fix Gremlin node tooltips being displayed incorrectly ([Link to PR](https://github.com/aws/graph-notebook/pull/139))
- Fix bug in using Gremlin explain/profile with large result sets ([Link to PR](https://github.com/aws/graph-notebook/pull/141))
- Pin RDFLib version ([Link to PR](https://github.com/aws/graph-notebook/pull/151))

**openCypher Support**:

With the release of support for the openCypher query language in Amazon Neptune's lab mode, graph-notebook can now be used to execute and visualize openCypher queries with any compatible graph database.

Two new magic commands have been added:
- `%%oc`/`%%opencypher`
- `%%oc_status`/`%%opencypher_status`

These openCypher magic commands inherit the majority of the query and visualization customization features that are already available in the Gremlin and SPARQL magics.

For more detailed information and examples of how you can execute and visualize openCypher queries through graph-notebook, please refer to the new `Air-Routes-openCypher` and `EPL-openCypher` sample notebooks.

**Other major updates**:
- Added visualization support for elementMap Gremlin step ([Link to PR](https://github.com/aws/graph-notebook/pull/140))
- Added support for additional customization of edge node labels in Gremlin ([Link to PR](https://github.com/aws/graph-notebook/pull/132))
- Refactored %load form display code for flexibility; fixes some descriptions being cut off
- Updated Neptune ML notebooks, utils, and pretrained models config
- Added support for `modeltransform` commands in `%neptune_ml`

**Minor updates**:
- Included index operations metrics in metadata results tab for Gremlin Profile queries([Link to PR](https://github.com/aws/graph-notebook/pull/150))
- Updated SPARQL EPL seed dataset file ([Link to PR](https://github.com/aws/graph-notebook/pull/134))
- Updated documentation on using `%%graph_notebook_config` with an IAM enabled Neptune cluster ([Link to PR](https://github.com/aws/graph-notebook/pull/136))

**Bugfixes**:
- Fixed improper handling of Blazegraph status response ([Link to PR](https://github.com/aws/graph-notebook/pull/137))
- Fixed Gremlin node tooltips being displayed incorrectly ([Link to PR](https://github.com/aws/graph-notebook/pull/139))
- Fixed bug in using Gremlin explain/profile with large result sets ([Link to PR](https://github.com/aws/graph-notebook/pull/141))
- Pinned RDFLib version ([Link to PR](https://github.com/aws/graph-notebook/pull/151))

## Release 2.1.4 (June 27, 2021)
- Support for additional customization of graph node labels in Gremlin ([Link to PR](https://github.com/aws/graph-notebook/pull/127))
- Added support for additional customization of graph node labels in Gremlin ([Link to PR](https://github.com/aws/graph-notebook/pull/127))

## Release 2.1.3 (June 18, 2021)
- Support dictionary value access in variable injection([Link to PR](https://github.com/aws/graph-notebook/pull/126))
- Added support for dictionary value access in variable injection([Link to PR](https://github.com/aws/graph-notebook/pull/126))

## Release 2.1.2 (May 10, 2021)

- Pin gremlinpython to `<3.5.*` ([Link to PR](https://github.com/aws/graph-notebook/pull/123))
- Add support for notebook variables in Sparql/Gremlin magic queries ([Link to PR](https://github.com/aws/graph-notebook/pull/113))
- Add support for grouping by different properties per label in Gremlin ([Link to PR](https://github.com/aws/graph-notebook/pull/115))
- Fix missing Boto3 dependency in setup.py ([Link to PR](https://github.com/aws/graph-notebook/pull/118))
- Update %load execution time to HH:MM:SS format if over a minute ([Link to PR](https://github.com/aws/graph-notebook/pull/121))
- Pinned gremlinpython to `<3.5.*` ([Link to PR](https://github.com/aws/graph-notebook/pull/123))
- Added support for notebook variables in Sparql/Gremlin magic queries ([Link to PR](https://github.com/aws/graph-notebook/pull/113))
- Added support for grouping by different properties per label in Gremlin ([Link to PR](https://github.com/aws/graph-notebook/pull/115))
- Fixed missing Boto3 dependency in setup.py ([Link to PR](https://github.com/aws/graph-notebook/pull/118))
- Updated %load execution time to HH:MM:SS format if over a minute ([Link to PR](https://github.com/aws/graph-notebook/pull/121))

## Release 2.1.1 (April 22, 2021)

- Fix bug in `%neptune_ml export ...` logic where the iam setting for the exporter endpoint wasn't getting picked up properly
- Fixed bug in `%neptune_ml export ...` logic where the iam setting for the exporter endpoint wasn't getting picked up properly

## Release 2.1.0 (April 15, 2021)

- Add support for Mode, queueRequest, and Dependencies parameters when running %load command ([Link to PR](https://github.com/aws/graph-notebook/pull/91))
- Add support for list and dict as map keys in Python Gremlin ([Link to PR](https://github.com/aws/graph-notebook/pull/100))
- Refactor modules that call to Neptune or other SPARQL/Gremlin endpoints to use a unified client object ([Link to PR](https://github.com/aws/graph-notebook/pull/104))
- Added support for Mode, queueRequest, and Dependencies parameters when running %load command ([Link to PR](https://github.com/aws/graph-notebook/pull/91))
- Added support for list and dict as map keys in Python Gremlin ([Link to PR](https://github.com/aws/graph-notebook/pull/100))
- Refactored modules that call to Neptune or other SPARQL/Gremlin endpoints to use a unified client object ([Link to PR](https://github.com/aws/graph-notebook/pull/104))
- Added an additional notebook under [02-Visualization](src/graph_notebook/notebooks/02-Visualization) demonstrating how to use the visualization grouping and coloring options in Gremlin. ([Link to PR](https://github.com/aws/graph-notebook/pull/107))
- Add metadata output tab for magic queries ([Link to PR](https://github.com/aws/graph-notebook/pull/108))
- Added metadata output tab for magic queries ([Link to PR](https://github.com/aws/graph-notebook/pull/108))

## Release 2.0.12 (Mar 25, 2021)

- Add default parameters for `get_load_status` ([Link to PR](https://github.com/aws/graph-notebook/pull/96))
- Add ipython as a dependency in `setup.py` ([Link to PR](https://github.com/aws/graph-notebook/pull/95))
- Add parameters in `load_status` for `details`, `errors`, `page`, and `errorsPerPage` ([Link to PR](https://github.com/aws/graph-notebook/pull/88))
- Added default parameters for `get_load_status` ([Link to PR](https://github.com/aws/graph-notebook/pull/96))
- Added ipython as a dependency in `setup.py` ([Link to PR](https://github.com/aws/graph-notebook/pull/95))
- Added parameters in `load_status` for `details`, `errors`, `page`, and `errorsPerPage` ([Link to PR](https://github.com/aws/graph-notebook/pull/88))

## Release 2.0.10 (Mar 18, 2021)

37 changes: 27 additions & 10 deletions README.md
@@ -1,19 +1,26 @@
## Graph Notebook: easily query and visualize graphs
## Graph Notebook: easily query and visualize graphs

The graph notebook provides an easy way to interact with graph databases using Jupyter notebooks. Using this open-source Python package, you can connect to any graph database that supports the [Apache TinkerPop](https://tinkerpop.apache.org/), [openCypher](https://github.com/opencypher/openCypher) or the [RDF SPARQL](https://www.w3.org/TR/rdf-sparql-query/) graph models. These databases could be running locally on your desktop or in the cloud. Graph databases can be used to explore a variety of use cases including [knowledge graphs](https://aws.amazon.com/neptune/knowledge-graphs-on-aws/) and [identity graphs](https://aws.amazon.com/neptune/identity-graphs-on-aws/).

![A colorful graph picture](./images/ColorfulGraph.png)

The graph notebook provides an easy way to interact with graph databases using Jupyter notebooks. Using this open-source Python package, you can connect to any graph database that supports the [Apache TinkerPop](https://tinkerpop.apache.org/) or the [RDF SPARQL](https://www.w3.org/TR/rdf-sparql-query/) graph model. These databases could be running locally on your desktop or in the cloud. Graph databases can be used to explore a variety of use cases including [knowledge graphs](https://aws.amazon.com/neptune/knowledge-graphs-on-aws/) and [identity graphs](https://aws.amazon.com/neptune/identity-graphs-on-aws/).

### Visualizing Gremlin queries:

![Gremlin query and graph](./images/GremlinQueryGraph.png)

### Visualizing openCypher queries

![openCypher query and graph](./images/OCQueryGraph.png)

### Visualizing SPARQL queries:

![SPARQL query and graph](./images/SPARQLQueryGraph.png)

Instructions for connecting to the following graph databases:

| Endpoint | Graph model | Query language |
| :-----------------------------: | :---------------------: | :-----------------: |
| :-----------------------------: | :---------------------: | :-----------------: |
|[Gremlin Server](#gremlin-server)| property graph | Gremlin |
| [Blazegraph](#blazegraph) | RDF | SPARQL |
|[Amazon Neptune](#amazon-neptune)| property graph or RDF | Gremlin or SPARQL |
@@ -25,7 +32,9 @@ We encourage others to contribute configurations they find useful. There is an [
#### Notebook cell 'magic' extensions in the IPython 3 kernel
`%%sparql` - Executes a SPARQL query against your configured database endpoint.

`%%gremlin` - Executes a Gremlin query against your database using web sockets. The results are similar to what the Gremlin console would return.
`%%gremlin` - Executes a Gremlin query against your database using web sockets. The results are similar to those a Gremlin console would return.

`%%opencypher` or `%%oc` - Executes an openCypher query against your database.

`%%graph_notebook_config` - Sets the executing notebook's database configuration to the JSON payload provided in the cell body.

@@ -41,18 +50,20 @@ We encourage others to contribute configurations they find useful. There is an [

`%sparql_status` - Obtain the status of SPARQL queries. [Documentation](https://docs.aws.amazon.com/neptune/latest/userguide/sparql-api-status.html)

`%opencypher_status` or `%oc_status` - Obtain the status of openCypher queries.

`%load` - Generate a form to submit a bulk loader job. [Documentation](https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load.html)

`%load_ids` - Get ids of bulk load jobs. [Documentation](https://docs.aws.amazon.com/neptune/latest/userguide/load-api-reference-status-examples.html)

`%load_status` - Get the status of a provided `load_id`. [Documentation](https://docs.aws.amazon.com/neptune/latest/userguide/load-api-reference-status-examples.html)

`%neptune_ml` - Set of commands to integrate with NeptuneML functionality. You can find a set of tutorial notebooks [here](https://github.com/aws/graph-notebook/tree/main/src/graph_notebook/notebooks/04-Machine-Learning).
`%neptune_ml` - Set of commands to integrate with NeptuneML functionality. You can find a set of tutorial notebooks [here](https://github.com/aws/graph-notebook/tree/main/src/graph_notebook/notebooks/04-Machine-Learning).
[Documentation](https://aws.amazon.com/neptune/machine-learning/)

`%status` - Check the Health Status of the configured host endpoint. [Documentation](https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-status.html)

`%seed` - Provides a form to add data to your graph without the use of a bulk loader. both SPARQL and Gremlin have an airport routes dataset.
`%seed` - Provides a form to add data to your graph without the use of a bulk loader. Supports both RDF and Property Graph data models.

`%graph_notebook_config` - Returns a JSON payload that contains connection information for your host.

@@ -64,6 +75,13 @@ We encourage others to contribute configurations they find useful. There is an [

**TIP** :point_right: You can list all the magics installed in the Python 3 kernel using the `%lsmagic` command.

**TIP** :point_right: Many of the magic commands support a `--help` option in order to provide additional information.

## Example notebooks
This project includes many example Jupyter notebooks. It is recommended to explore them. All of the commands and features supported by `graph-notebook` are explained in detail with examples within the sample notebooks. You can find them [here](./src/graph_notebook/notebooks/). As this project has evolved, many new features have been added. If you are already familiar with graph-notebook but want a quick summary of new features added, a good place to start is the Air-Routes notebooks in the [02-Visualization](./src/graph_notebook/notebooks/02-Visualization) folder.

## Keeping track of new features
It is recommended to check the [ChangeLog.md](ChangeLog.md) file periodically to keep up to date as new features are added.

## Prerequisites

@@ -74,7 +92,6 @@ You will need:
* [Tornado](https://pypi.org/project/tornado/) 4.5.3
* A graph database that provides a SPARQL 1.1 Endpoint or a Gremlin Server


## Installation

```
@@ -102,7 +119,7 @@ jupyter notebook ~/notebook/destination/dir

## Connecting to a graph database

### Gremlin Server
### Gremlin Server

In a new cell in the Jupyter notebook, change the configuration using `%%graph_notebook_config` and modify the fields for `host`, `port`, and `ssl`. For a local Gremlin server (HTTP or WebSockets), you can use the following command:

@@ -154,7 +171,7 @@ You can also make use of namespaces for Blazegraph by specifying the path `graph
}
```

This will result in the url `localhost:9999/blazegraph/namespace/foo/sparql` being used when executing any `%%sparql` magic commands.
This will result in the url `localhost:9999/blazegraph/namespace/foo/sparql` being used when executing any `%%sparql` magic commands.
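
As a rough illustration of how that endpoint is assembled, here is a minimal Python sketch. The helper name `sparql_endpoint` and its parameters are hypothetical and are not part of graph-notebook; the package builds the URL from its own configuration fields, which are collapsed in this diff view. Only the URL shape is taken from the sentence above.

```python
def sparql_endpoint(host: str, port: int, namespace: str) -> str:
    """Compose a Blazegraph namespace SPARQL endpoint (hypothetical helper).

    Mirrors the URL shape quoted in the README text:
    <host>:<port>/blazegraph/namespace/<namespace>/sparql
    """
    return f"{host}:{port}/blazegraph/namespace/{namespace}/sparql"


# Reproduce the README example for the namespace "foo" on a local Blazegraph.
endpoint = sparql_endpoint("localhost", 9999, "foo")
print(endpoint)  # prints localhost:9999/blazegraph/namespace/foo/sparql
```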

To set up a new local Blazegraph database for use with the graph notebook, check out the [Quick Start](https://github.com/blazegraph/database/wiki/Quick_Start) from Blazegraph.

@@ -175,7 +192,7 @@ Change the configuration using `%%graph_notebook_config` and modify the defaults
```
To set up a new Amazon Neptune cluster, check out the [AWS documentation](https://docs.aws.amazon.com/neptune/latest/userguide/manage-console-launch.html).

When connecting the graph notebook to Neptune, make sure you have a network setup to communicate to the VPC that Neptune runs on. If not, you can follow [this guide](https://github.com/aws/graph-notebook/tree/main/additional-databases/neptune).
When connecting the graph notebook to Neptune, make sure you have a network setup to communicate to the VPC that Neptune runs on. If not, you can follow [this guide](https://github.com/aws/graph-notebook/tree/main/additional-databases/neptune).

## Authentication (Amazon Neptune)

18 changes: 18 additions & 0 deletions additional-databases/blazegraph/README.md
@@ -0,0 +1,18 @@
## Connecting graph notebook to Blazegraph SPARQL Endpoint

The official SPARQL endpoint for DBPedia is available from https://dbpedia.org/sparql and is based on a Virtuoso engine.

It is possible to connect to this endpoint using the following configuration:

```
%%graph_notebook_config
{
"host": "dbpedia.org",
"port": 443,
"auth_mode": "DEFAULT",
"iam_credentials_provider_type": "ROLE",
"load_from_s3_arn": "",
"ssl": true,
"aws_region": ""
}
```
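
If you prefer to generate the cell text rather than type it, the configuration above can be sketched in plain Python. This is only an illustration: the helper `render_config_cell` is hypothetical and not part of graph-notebook, while the field names and values are copied from the configuration cell above.

```python
import json

# Field names and values are copied from the configuration cell above.
dbpedia_config = {
    "host": "dbpedia.org",
    "port": 443,
    "auth_mode": "DEFAULT",
    "iam_credentials_provider_type": "ROLE",
    "load_from_s3_arn": "",
    "ssl": True,
    "aws_region": "",
}


def render_config_cell(config: dict) -> str:
    """Return the text of a %%graph_notebook_config cell (hypothetical helper)."""
    # json.dumps turns Python's True into JSON's true, as the magic expects JSON.
    return "%%graph_notebook_config\n" + json.dumps(config, indent=2)


cell_text = render_config_cell(dbpedia_config)
print(cell_text)
```

Pasting the printed text into a new notebook cell should give the same configuration as the hand-written version.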
Binary file added images/ColorfulGraph.png
Binary file modified images/GremlinQueryGraph.png
Binary file added images/OCQueryGraph.png
10 changes: 7 additions & 3 deletions requirements.txt
@@ -7,11 +7,15 @@ notebook==5.7.10
ipywidgets==7.5.1
jupyter-contrib-nbextensions
widgetsnbextension
gremlinpython
gremlinpython<=3.4.*
requests==2.24.0
ipython==7.16.1
neo4j==4.2.1
rdflib~=5.0.0
traitlets~=4.3.3
setuptools~=40.6.2

# requirements for testing
boto3==1.15.15
botocore==1.18.18
botocore~=1.18.18
boto3~=1.15.15
pytest==6.2.2
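
Several pins in this hunk move from `==` to `~=`. As a reading aid, the sketch below models what a three-part `~=` pin admits under PEP 440 (at least the pinned version, with all but the last component held fixed). It is a simplified illustration that handles purely numeric dotted versions only; the function names are hypothetical and pip's real resolver handles many more cases.

```python
def parse(version: str) -> tuple:
    """Split a purely numeric dotted version into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies_compatible_release(candidate: str, pinned: str) -> bool:
    """Simplified check of `candidate` against the specifier `~=pinned`."""
    base = parse(pinned)
    if len(base) < 2:
        raise ValueError("~= needs at least two version components")
    cand = parse(candidate)
    # Pad short candidates with zeros so "5.0" compares like "5.0.0".
    cand = cand + (0,) * (len(base) - len(cand))
    # Must be >= the pin and share every component except the last one.
    return cand >= base and cand[: len(base) - 1] == base[:-1]


# botocore~=1.18.18 accepts later 1.18.x releases but not 1.19.x or older ones.
print(satisfies_compatible_release("1.18.20", "1.18.18"))
print(satisfies_compatible_release("1.19.0", "1.18.18"))
print(satisfies_compatible_release("1.18.17", "1.18.18"))
```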
1 change: 1 addition & 0 deletions setup.py
@@ -81,6 +81,7 @@ def get_version():
'botocore>=1.19.37',
'boto3>=1.17.58',
'ipython>=7.16.1',
'neo4j==4.3.2',
'rdflib==5.0.0'
],
package_data={
3 changes: 2 additions & 1 deletion src/graph_notebook/magics/completers/graph_completer.py
@@ -14,7 +14,8 @@
'GRAPH',
'FILTER',
'ASK',
'DESCRIBE']
'DESCRIBE',
'UNLOAD']
GREMLIN_OPTIONS = [
'.toString',
'.tx',