Initial Commit
braedon committed Jul 22, 2020
0 parents commit 240f1a0
Showing 14 changed files with 1,115 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .dockerignore
@@ -0,0 +1,2 @@
.git
.gitignore
104 changes: 104 additions & 0 deletions .gitignore
@@ -0,0 +1,104 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
9 changes: 9 additions & 0 deletions .travis.yml
@@ -0,0 +1,9 @@
language: python
os: linux
dist: bionic
python:
- "3.6"
- "3.7"
- "3.8"
script:
- python -m unittest
21 changes: 21 additions & 0 deletions Dockerfile
@@ -0,0 +1,21 @@
FROM python:3.8-slim

WORKDIR /app

RUN apt-get update \
    && apt-get install -y git \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt /app/
RUN pip install --no-cache-dir --upgrade pip \
    && pip install --no-cache-dir -r requirements.txt

COPY *.py /app/
COPY utils/*.py /app/utils/
COPY kong_log_bridge/*.py /app/kong_log_bridge/
COPY README.md /app/

EXPOSE 8080

ENTRYPOINT ["python", "-u", "main.py"]
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
The MIT License (MIT)

Copyright (c) 2020 Braedon Vickers

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
118 changes: 118 additions & 0 deletions README.md
@@ -0,0 +1,118 @@
Kong Request Log Bridge
====
Transform Kong request logs and forward them to Elasticsearch. Redact request logs for improved privacy and security, and index them directly into Elasticsearch, without the need for complex and heavyweight tools like Logstash.

[Source Code](https://github.com/braedon/kong-log-bridge) | [Docker Image](https://hub.docker.com/r/braedon/kong-log-bridge)

# Usage
The service is distributed as a docker image. Released versions can be found on Docker Hub (note that no `latest` version is provided):

```bash
> sudo docker pull braedon/kong-log-bridge:<version>
```

The docker image exposes a REST API on port `8080`. It is configured by passing options after the image name:
```bash
> sudo docker run --rm --name kong-log-bridge \
-p <host port>:8080 \
braedon/kong-log-bridge:<version> \
-e <elasticsearch node> \
--convert-ts \
--hash-ip \
--hash-auth \
--hash-cookie
```
Run with the `-h` flag to see details on all the available options.

Note that all options can be set via environment variables. The environment variable names are prefixed with `KONG_LOG_BRIDGE_OPT`, e.g. `KONG_LOG_BRIDGE_OPT_CONVERT_TS=true` is equivalent to `--convert-ts`. CLI options take precedence over environment variables.
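
As a rough illustration of the naming scheme, the mapping from a long option name to its environment variable might look like the sketch below. Only the `--convert-ts` example comes from this README; the general rule (strip leading dashes, replace `-` with `_`, upper-case) and the helper name `option_to_env_var` are assumptions, so check the `-h` output if in doubt.

```python
def option_to_env_var(option):
    """Map a long CLI option to its assumed environment variable name.

    Assumption: names follow the pattern of the single documented example,
    `--convert-ts` -> `KONG_LOG_BRIDGE_OPT_CONVERT_TS`.
    """
    name = option.lstrip('-').replace('-', '_').upper()
    return 'KONG_LOG_BRIDGE_OPT_' + name


print(option_to_env_var('--convert-ts'))
```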

## Input
Kong JSON request logs can be `POST`ed to the `/logs` endpoint. This is designed for logs to be sent by the [Kong HTTP Log plugin](https://docs.konghq.com/hub/kong-inc/http-log/). See the Kong documentation for details on how to enable and configure the plugin.

This is currently the only supported input method, but more may be added in the future.
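
For manual testing without Kong, a log can be sent to the endpoint by hand. The sketch below uses only the Python standard library; the helper names, the bridge address and the sample log are hypothetical. The endpoint expects a JSON object with the `Content-Type: application/json` header and responds with `204` on success.

```python
import json
import urllib.request


def build_log_request(base_url, log):
    """Build a POST request carrying one request log for the /logs endpoint."""
    body = json.dumps(log).encode('utf-8')
    return urllib.request.Request(
        url=base_url.rstrip('/') + '/logs',
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


def send_log(base_url, log, timeout=10):
    """Send the log and return the HTTP status code (204 on success)."""
    request = build_log_request(base_url, log)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Hypothetical usage, assuming the bridge is published on localhost:8080:
#   send_log('http://localhost:8080', {'client_ip': '203.0.113.7'})
```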

## Transformation
Request logs are passed through unchanged by default, but you probably want to enable at least one transformation.

### Timestamp Conversion `--convert-ts`
Kong request logs include a number of UNIX timestamps (some in milliseconds rather than seconds). These are not human readable, and require explicit mappings to be used in Elasticsearch. Enabling this option will convert these timestamps to [RFC3339 date-time strings](https://www.ietf.org/rfc/rfc3339.txt) for readability and automatic Elasticsearch mapping.

Fields converted:
```
- service.created_at
- service.updated_at
- route.created_at
- route.updated_at
- started_at
- tries[].balancer_start
```
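
The conversion itself can be sketched as follows. This is a minimal illustration rather than the service's actual code: the function name `to_rfc3339` is hypothetical, the caller is assumed to know which fields are in milliseconds, and the exact output format of the service (e.g. `Z` versus `+00:00`) may differ.

```python
from datetime import datetime, timezone


def to_rfc3339(timestamp, in_milliseconds=False):
    """Convert a UNIX timestamp to an RFC3339 date-time string in UTC."""
    seconds = timestamp / 1000 if in_milliseconds else timestamp
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


print(to_rfc3339(1595376000))                           # seconds
print(to_rfc3339(1595376000500, in_milliseconds=True))  # milliseconds
```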

### Client IP Hashing `--hash-ip`
This option enables hashing the `client_ip` field to avoid storing sensitive user IP addresses.

### Authorization Hashing `--hash-auth`
This option enables hashing the `credentials` part of the [`Authorization` request header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Authorization) (`request.headers.authorization` field) to avoid storing credentials/tokens.

```
Authorization: Bearer some_secret_token -> Bearer 7ftgstREEBqhHrQNgj6MVA
```
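
A minimal sketch of the idea is shown below. The hash function is an assumption: this README doesn't state which algorithm is used, so the sketch uses SHA-256 truncated to 16 bytes and encoded as unpadded URL-safe base64, which only matches the *shape* of the example above, not necessarily its value. The names `hash_value` and `hash_authorization` are hypothetical.

```python
import base64
import hashlib


def hash_value(value):
    """Assumed hash: SHA-256, first 16 bytes, unpadded URL-safe base64."""
    digest = hashlib.sha256(value.encode('utf-8')).digest()[:16]
    return base64.urlsafe_b64encode(digest).rstrip(b'=').decode('ascii')


def hash_authorization(header_value):
    """Hash the credentials part of an Authorization header, keeping the scheme."""
    scheme, sep, credentials = header_value.partition(' ')
    if not sep:
        # No "<scheme> <credentials>" split found; hash the whole value.
        return hash_value(header_value)
    return scheme + ' ' + hash_value(credentials)


print(hash_authorization('Bearer some_secret_token'))
```

Keeping the scheme visible while hashing the credentials means logs can still be grouped by token without the token itself being stored.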

### Cookie Hashing `--hash-cookie`
This option enables hashing the `value` part of the [`Cookie` request header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cookie) (`request.headers.cookie` field) and [`Set-Cookie` response header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie) (`response.headers.set-cookie` field) to avoid storing sensitive cookies.

```
Cookie: some_cookie=some_session -> some_cookie=q1EXmTUdD0Bvm8_jHrQizw
Set-Cookie: some_cookie=some_session; Secure; HttpOnly; SameSite=Lax -> some_cookie=q1EXmTUdD0Bvm8_jHrQizw; Secure; HttpOnly; SameSite=Lax
```
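
The header handling can be sketched as below. As with the other hashing options, the hash itself is an assumption (SHA-256 truncated to 16 bytes, unpadded URL-safe base64), and all function names are hypothetical. The sketch hashes every value in a `Cookie` header, but only the leading `name=value` pair of a `Set-Cookie` header, leaving its attributes untouched.

```python
import base64
import hashlib


def hash_value(value):
    """Assumed hash: SHA-256, first 16 bytes, unpadded URL-safe base64."""
    digest = hashlib.sha256(value.encode('utf-8')).digest()[:16]
    return base64.urlsafe_b64encode(digest).rstrip(b'=').decode('ascii')


def hash_cookie_header(header_value):
    """Hash each value in a Cookie request header, e.g. 'a=1; b=2'."""
    hashed = []
    for pair in header_value.split(';'):
        name, sep, value = pair.strip().partition('=')
        hashed.append(name + '=' + hash_value(value) if sep else name)
    return '; '.join(hashed)


def hash_set_cookie_header(header_value):
    """Hash only the leading name=value pair, keeping the attributes as-is."""
    first, sep, attributes = header_value.partition(';')
    name, eq, value = first.strip().partition('=')
    hashed = name + '=' + hash_value(value) if eq else first
    return hashed + sep + attributes


print(hash_cookie_header('some_cookie=some_session'))
print(hash_set_cookie_header('some_cookie=some_session; Secure; HttpOnly; SameSite=Lax'))
```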

### Field Hashing and Nulling `--hash-path`/`--null-path`
Arbitrary request log fields can be hashed or converted to null by specifying their path with these options. Provide the desired option multiple times to specify multiple paths.

Paths describe how to traverse the JSON structure of the request logs to find a field. They consist of a hierarchy of object fields to traverse from the root JSON object, separated by periods (`.`). The `[]` suffix on a field indicates its value is an array, and should be iterated.

e.g. `--hash-path tries[].ip` will hash the `ip` of every upstream "try" in the `tries` array.

Paths don't need to end at a specific value - they can specify an entire object or array.

e.g. `--null-path request.headers` will convert the entire `request.headers` object to null, effectively removing it from the log.

If a path doesn't match any field in a given request log it will be ignored.
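
The path rules above can be sketched as a small recursive traversal. This is an illustration under stated assumptions, not the service's implementation: the names `apply_at_path` and `_apply` are hypothetical, and how the service hashes a whole object or array is not specified here, so the sketch simply hands the matched value to a caller-supplied function.

```python
def apply_at_path(log, path, func):
    """Replace every value matched by `path` in `log` with func(value)."""
    _apply(log, path.split('.'), func)


def _apply(node, parts, func):
    if not isinstance(node, dict):
        return  # path expects an object here; ignore the mismatch
    key, rest = parts[0], parts[1:]
    is_array = key.endswith('[]')
    if is_array:
        key = key[:-2]
    if key not in node:
        return  # paths that match nothing are ignored
    if is_array:
        items = node[key]
        if not isinstance(items, list):
            return
        if rest:
            for item in items:
                _apply(item, rest, func)
        else:
            node[key] = [func(item) for item in items]
    elif rest:
        _apply(node[key], rest, func)
    else:
        node[key] = func(node[key])


log = {
    'tries': [{'ip': '10.0.0.1', 'port': 80}, {'ip': '10.0.0.2', 'port': 80}],
    'request': {'headers': {'host': 'example.com'}},
}
apply_at_path(log, 'tries[].ip', lambda value: '<hashed>')  # like --hash-path
apply_at_path(log, 'request.headers', lambda value: None)   # like --null-path
apply_at_path(log, 'no.such.field', lambda value: None)     # ignored
print(log)
```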

## Output
Transformed logs are indexed in Elasticsearch.

This is currently the only supported output method, but more may be added in the future.

### Elasticsearch Nodes `-e`/`--es-node` (required)
The address of at least one Elasticsearch node must be provided via this option. Include the port if it differs from the standard `9200`. Provide the option multiple times to specify multiple nodes in a cluster.

### Elasticsearch Index `--es-index`
The Elasticsearch index to send logs to. [Elasticsearch index date math](https://www.elastic.co/guide/en/elasticsearch/reference/current/date-math-index-names.html) can be used. Defaults to `<kong-requests-{now/d}>`.
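
To illustrate what the default resolves to: Elasticsearch evaluates `{now/d}` as the current UTC day and formats it as `yyyy.MM.dd` unless another format is given in the expression. The sketch below mimics that resolution on the client side purely for illustration (the helper name `resolve_daily_index` is hypothetical; the real resolution is done by Elasticsearch).

```python
from datetime import date


def resolve_daily_index(prefix, day):
    """Mimic how `<prefix-{now/d}>` resolves for a given UTC day."""
    return '{}-{}'.format(prefix, day.strftime('%Y.%m.%d'))


print(resolve_daily_index('kong-requests', date(2020, 7, 22)))
```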

### Elasticsearch Security
A number of options exist to support Elasticsearch server and client SSL, and basic authentication. See the `-h` output for details.

# Development
To run directly from the git repo, run the following in the root project directory:
```bash
> pip3 install -r requirements.txt
> python3 main.py [OPTIONS]
```
To run tests (as usual, from the root project directory), use:
```bash
> python3 -m unittest
```
Note that these tests currently only cover the log transformation functionality - there are no automated system tests yet.

To build a docker image directly from the git repo, run the following in the root project directory:
```bash
> sudo docker build -t <your repository name and tag> .
```

To develop in a docker container, first build the image, and then run the following in the root project directory:
```bash
> sudo docker run --rm -it --name kong-log-bridge --entrypoint bash -v $(pwd):/app <your repository name and tag>
```
This will mount the project directory inside the container, so edits to tests or application code are synced live. You can run the tests with `python -m unittest`.

Send me a PR if you have a change you want to contribute!
47 changes: 47 additions & 0 deletions kong_log_bridge/__init__.py
@@ -0,0 +1,47 @@
import json
import simplejson

from bottle import Bottle, abort, request, response

from .transform import transform_log


def json_default_error_handler(http_error):
    response.content_type = 'application/json'
    return json.dumps({'error': http_error.body}, separators=(',', ':'))


def construct_app(es_client, es_index, **kwargs):
    app = Bottle()
    app.default_error_handler = json_default_error_handler

    @app.get('/status')
    def status():
        return 'OK'

    @app.post('/logs')
    def logs():
        if request.headers.get('Content-Type') != 'application/json':
            abort(415, 'Require "Content-Type: application/json"')

        try:
            log = request.json
        except simplejson.JSONDecodeError:
            abort(400, 'POST data is not valid JSON')

        if not isinstance(log, dict):
            abort(400, 'POST body must be a JSON object')

        log = transform_log(log,
                            do_convert_ts=kwargs['convert_ts'],
                            do_hash_ip=kwargs['hash_ip'],
                            do_hash_auth=kwargs['hash_auth'],
                            do_hash_cookie=kwargs['hash_cookie'],
                            hash_paths=kwargs['hash_path'],
                            null_paths=kwargs['null_path'])

        es_client.index(index=es_index, body=log, request_timeout=30)

        response.status = 204

    return app