Readme update ssh (aws#8)
Description of changes:

- Added instructions on how to create a new EC2 instance and set it up in the same VPC as Neptune to be used as a bastion server
- Added instructions for setting up an SSH tunnel on Windows

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
joywa authored Nov 10, 2020
1 parent 00a0258 commit e93f281
Showing 3 changed files with 63 additions and 6 deletions.
13 changes: 7 additions & 6 deletions README.md
@@ -1,14 +1,14 @@
## graph-notebook

-Python package integrating jupyter notebooks with various graph-stores including
+Python package integrating Jupyter notebooks with various graph-stores including
[Apache TinkerPop](https://tinkerpop.apache.org/) and [RDF SPARQL](https://www.w3.org/TR/rdf-sparql-query/).

## Requirements
-- Python3.6
-- Jupyter Notebooks
+- Python 3.6
+- Jupyter Notebook

## Introduction
-The graph-notebook provides a way to interact using a Jupyter notebook with any graph database that follows the Gremlin Server or RDF HTTP protocols. These databases could be running locally on your laptop, in a private data center or in the cloud. This project was initially created as a way to work with Amazon Neptune but is not limited to that database engine. For example you can connect to a Gremlin Server running on your laptop using this solution. The instructions below describe the process for connecting to Amazon Neptune. We encourage others to contribute configurations they find useful. There is an `additional-databases` folder where such information can be found. We have already provided instructions for establishing the Gremlin Server connection.
+The graph-notebook provides a way to interact using a Jupyter notebook with any graph database that follows the Gremlin Server or RDF HTTP protocols. These databases could be running locally on your laptop, in a private data center or in the cloud. This project was initially created as a way to work with Amazon Neptune but is not limited to that database engine. For example you can connect to a Gremlin Server running on your laptop using this solution. The instructions below describe the process for connecting to Amazon Neptune. We encourage others to contribute configurations they find useful. There is an [`additional-databases`](additional-databases) folder where such information can be found.

## Installation

@@ -35,7 +35,7 @@ jupyter notebook /notebook/destination/dir

In order to connect to your graph database, you have three configuration options.

-1. Change the host setting in your opened jupyter notebook by running the following in a notebook cell:
+1. Change the host setting in your opened Jupyter notebook by running the following in a notebook cell:

```
%graph_notebook_host you-endpoint-here
@@ -85,7 +85,8 @@ echo "{
```

### Connecting to a local graph store
-As mentioned in the introduction, it is possible to connect `graph-notebook` to a graph database running on your local machine. An example being Gremlin Server. There are additional instructions regarding the use of local servers in the `additional-databases` folder.
+As mentioned in the introduction, it is possible to connect [`graph-notebook`](src/graph_notebook) to a graph database running on your local machine, an example being Gremlin Server. There are additional instructions regarding the use of local servers in the [`additional-databases`](additional-databases) folder.


## Authentication

56 changes: 56 additions & 0 deletions additional-databases/neptune/README.md
@@ -0,0 +1,56 @@
## Connecting a local graph-notebook to Amazon Neptune (first-time setup)
When using graph-notebook locally to connect to an Amazon Neptune database for the first time, there are a couple of additional steps. This section assumes that you've already installed and configured [graph-notebook](https://github.com/aws/graph-notebook#installation) locally. Please note that this wiki is not an official recommendation on network setups, as there are many ways to connect to Amazon Neptune from outside of the VPC, such as setting up a load balancer or VPC peering.

Amazon Neptune DB clusters can only be created in an Amazon Virtual Private Cloud (VPC). One way to connect to Amazon Neptune from outside of the VPC is to set up an Amazon EC2 instance as a proxy server within the same VPC. With this approach, you will also want to set up an SSH tunnel to securely forward traffic to the VPC.

### Part 1: Set up an EC2 proxy server.
Launch an [Amazon EC2](https://aws.amazon.com/ec2/) instance located in the same region as your Neptune cluster. In terms of configuration, a standard Amazon Linux AMI can be used. Since this is a proxy server, you can choose the lowest resource settings.

Make sure the EC2 instance is in the same VPC as your Neptune cluster. To find the VPC for your Neptune cluster, check the console under [Neptune](https://console.aws.amazon.com/neptune/home) > Subnet groups. The instance's security group needs to allow traffic on port 22 for SSH and port 8182 for Neptune. See below for an example security group setup.

![Sample EC2 Inbound Rules](/././images/sample-ec2rules.png)

Lastly, make sure you save the key-pair file (.pem) and note the directory for use in the next step.
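
Before moving on, it can help to confirm that the security group's port 22 rule is in effect from your own machine. The following is a minimal sketch using only the Python standard library; the helper name `port_open` is hypothetical and not part of graph-notebook. It only checks that a TCP connection can be opened, not that SSH authentication will succeed.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `port_open("yourec2instanceendpoint", 22)` should return `True` once the inbound SSH rule is in place. Port 8182 on the Neptune endpoint is not expected to be reachable directly from outside the VPC, so this check applies only to the EC2 instance.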

### Part 2: Set up an SSH tunnel.
This step varies depending on whether you are running Windows or macOS.

<b>Windows</b>

First, edit your hosts file (C:\Windows\System32\drivers\etc\hosts) as an Administrator to map your Neptune endpoint to the local address 127.0.0.1:

127.0.0.1 localhost your-Neptune-endpoint-here
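
If you want to check the hosts file before starting the tunnel, the mapping can be verified with a small script. This is a sketch under the assumption that the entry has the same shape as the line above; `hosts_line` and `maps_to_loopback` are hypothetical helper names, and the script only reads text, it does not modify the hosts file.

```python
def hosts_line(endpoint: str, ip: str = "127.0.0.1") -> str:
    """Build a hosts-file line in the same shape as the entry above."""
    return f"{ip} localhost {endpoint}"


def maps_to_loopback(hosts_text: str, endpoint: str, ip: str = "127.0.0.1") -> bool:
    """Return True if a non-comment line in hosts_text maps endpoint to ip."""
    for raw in hosts_text.splitlines():
        # Drop anything after '#', then split the line into address and names.
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == ip:
            # Host names are case-insensitive, so compare in lower case.
            if endpoint.lower() in (name.lower() for name in fields[1:]):
                return True
    return False
```

To use it, read the hosts file into a string (for example with `open(path).read()`) and pass that string together with your Neptune endpoint to `maps_to_loopback`.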

Next, open Command Prompt as an Administrator and navigate to the directory where you saved the EC2 key-pair file. Run the following command:

`ssh -i keypairfilename.pem ec2-user@yourec2instanceendpoint -N -L 8182:yourneptuneendpoint:8182`

The -N flag tells SSH not to run a remote command, so the session only forwards ports and no remote shell prompt appears. The -L flag forwards local port 8182 through the EC2 instance to port 8182 on your Neptune endpoint. The first time you connect, SSH asks whether you want to continue connecting; type yes and press Enter.
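
The same command can be assembled from its parts in Python, which makes the role of each argument explicit. This is an illustrative sketch only: `build_tunnel_command` is a hypothetical helper, and the file and host names in the usage note are the placeholders from the command above.

```python
from typing import List


def build_tunnel_command(key_file: str, ec2_host: str, neptune_endpoint: str,
                         port: int = 8182, user: str = "ec2-user") -> List[str]:
    """Return the ssh argument list for a local port forward through EC2."""
    return [
        "ssh",
        "-i", key_file,                 # private key (.pem) saved in Part 1
        f"{user}@{ec2_host}",           # login user and EC2 instance address
        "-N",                           # do not run a remote command
        "-L", f"{port}:{neptune_endpoint}:{port}",  # local port -> Neptune via EC2
    ]
```

Calling `build_tunnel_command("keypairfilename.pem", "yourec2instanceendpoint", "yourneptuneendpoint")` yields the same arguments as the command shown above; the list could then be passed to `subprocess.run` to start the tunnel from a script.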

To test the success of your local graph-notebook connection to Amazon Neptune, open a browser and navigate to:

`https://yourneptuneendpoint:8182/status`

You should see a report, similar to the one below, indicating the status and details of your specific cluster:

```
{
"status": "healthy",
"startTime": "Wed Nov 04 23:24:44 UTC 2020",
"dbEngineVersion": "1.0.3.0.R1",
"role": "writer",
"gremlin": {
"version": "tinkerpop-3.4.3"
},
"sparql": {
"version": "sparql-1.1"
},
"labMode": {
"ObjectIndex": "disabled",
"DFEQueryEngine": "disabled",
"ReadWriteConflictDetection": "enabled"
}
}
```
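
The report can also be checked from a script instead of a browser. The sketch below only parses report text that has already been fetched; the helper names `is_healthy` and `summarize` are hypothetical, and the field names are taken from the sample report above rather than from a documented schema.

```python
import json

# Abbreviated copy of the sample report above, used here only for illustration.
SAMPLE_REPORT = """
{
  "status": "healthy",
  "dbEngineVersion": "1.0.3.0.R1",
  "role": "writer",
  "gremlin": {"version": "tinkerpop-3.4.3"}
}
"""


def is_healthy(report_text: str) -> bool:
    """Return True if the status field of the report is 'healthy'."""
    return json.loads(report_text).get("status") == "healthy"


def summarize(report_text: str) -> str:
    """Return a one-line summary: engine version, role, Gremlin version."""
    report = json.loads(report_text)
    gremlin = report.get("gremlin", {}).get("version", "unknown")
    return "engine {} ({}), {}".format(
        report.get("dbEngineVersion", "unknown"),
        report.get("role", "unknown"),
        gremlin,
    )
```

With the tunnel running, the text passed to these functions could come from fetching the status URL above, for example with `urllib.request.urlopen`.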

Now you should be able to run queries from your local graph-notebook against your Neptune cluster. When you're ready to close the connection, press Ctrl+C in the Command Prompt window running the SSH command.
Binary file added images/sample-ec2rules.png