Data Highway

Start using

Maven Central GitHub license Build Coverage Status

Overview

What is Data Highway?

The Data Highway is a service that allows data to be easily produced and consumed via JSON messages over HTTPS/WSS. Data is first defined using a schema and a "road" is created which will accept messages that conform to this schema. Producers of data sets thus only need to define the structure of their data and are then able to send their data to a REST endpoint and not be concerned with what happens next. Data Highway will ensure that this data is made available for streaming consumption and also stored reliably in a "data lake" in the cloud for access by end users.

Architecture

Data Highway Architecture

Paver

Paver is Data Highway's administration endpoint. It provides the following features:

  • Road (synonymous with Kafka topic) creation.
  • Schema registration and (soft) deletion.
  • Data-at-rest to Hive/S3 configuration.
  • Road-level producer and consumer authorisation.

Onramp

Onramp is Data Highway's producer endpoint. It allows users to submit messages to roads in JSON format over HTTPS.

Offramp

Offramp is Data Highway's consumer endpoint. It allows users to consume messages from roads in JSON format over WSS.

Tollbooth

Tollbooth is the core of Data Highway. It provides the mechanism by which mutations to a road's model are persisted. Mutations can come from users (via Paver) or from internal agents. Anything wishing to make a mutation submits a JSON Patch to a deltas Kafka topic. Tollbooth consumes this topic, continuously applying the patches to models and persisting the results back onto the main Model (compacted) topic.
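
To make the patch mechanism concrete, here is a minimal Python sketch of applying JSON Patch operations to a road model. It is an illustration only and not Tollbooth's implementation: the function name apply_patch, the model contents and the supported subset of operations (add, replace and remove on object members, with non-empty paths) are assumptions made for this example.

```python
import copy


def apply_patch(model, patch):
    """Apply a list of JSON Patch operations and return a new model.

    Only "add", "replace" and "remove" on object members are handled.
    Array indices, the root path and the other RFC 6902 operations are
    left out of this sketch.
    """
    result = copy.deepcopy(model)
    for op in patch:
        # A JSON Pointer such as "/a/b" names the member "b" inside "a".
        # "~1" and "~0" are the escapes for "/" and "~".
        tokens = [
            t.replace("~1", "/").replace("~0", "~")
            for t in op["path"].split("/")[1:]
        ]
        parent = result
        for token in tokens[:-1]:
            parent = parent[token]
        key = tokens[-1]
        if op["op"] == "add":
            parent[key] = op["value"]
        elif op["op"] == "replace":
            if key not in parent:
                raise KeyError(f"cannot replace missing member {op['path']}")
            parent[key] = op["value"]
        elif op["op"] == "remove":
            del parent[key]
        else:
            raise ValueError(f"unsupported operation: {op['op']}")
    return result


# Hypothetical road model and patch, loosely based on the fields used in
# the "Create a road" example.
model = {"name": "my_road", "enabled": False, "description": "My Road"}
patch = [
    {"op": "replace", "path": "/enabled", "value": True},
    {"op": "add", "path": "/teamName", "value": "TEAM"},
    {"op": "remove", "path": "/description"},
]

updated = apply_patch(model, patch)
print(updated)
```

The sketch returns a patched copy and leaves the input untouched, which mirrors the idea of reading a model, applying a delta and writing the new version back.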

Traffic Control

Traffic Control is the Kafka Agent. It is primarily responsible for managing Kafka topics in response to changes in models.

Loading Bay / Truck Park

Loading Bay is responsible for orchestrating the landing of data to S3 on a configured interval and for managing Hive tables: creation, schema mutation and the addition of partitions.

Try it out

Try Test Drive, an in-memory version of Data Highway that exposes all the public-facing endpoints in a single Spring Boot application or Docker container.

docker run -p 8080:8080 hotelsdotcom/road-test-drive:<tag>

Examples

Using a local instance of Test Drive, try creating a road, registering a schema, and producing and consuming messages using the built-in user account user:pass.

Note: For the examples below, cURL will prompt for a password, which is pass.

Create a road

curl -sk \
  -u user \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
  "name": "my_road", 
  "description": "My Road",
  "teamName": "TEAM", 
  "contactEmail": "[email protected]",
  "partitionPath": "$.foo",
  "enabled": true,
  "authorisation": {
    "onramp": {
      "cidrBlocks": ["0.0.0.0/0"],
      "authorities": ["*"]
    },
    "offramp": {
      "authorities": {
        "*": ["PUBLIC"]
      }
    }
  }
}' https://localhost:8080/paver/v1/roads

Register a schema

curl -sk \
  -u user \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
  "type" : "record",
  "name" : "my_record",
  "fields" : [
    {"name":"foo","type":"string"},
    {"name":"bar","type":"string"}
  ]
}' https://localhost:8080/paver/v1/roads/my_road/schemas

Produce messages

curl -sk \
  -u user \
  -H "Content-Type: application/json" \
  -d '[{"foo":"foo1","bar":"bar1"}]' \
  https://localhost:8080/onramp/v1/roads/my_road/messages
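
The message above matches the fields of the schema registered for my_road. As a rough illustration of what conforming means for that example schema, the following Python sketch filters a batch on the client before sending. The helper conforms is hypothetical and handles only flat records whose fields are all of type string; it makes no claim about how Onramp itself validates messages.

```python
import json

# The schema from the "Register a schema" example.
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "my_record",
  "fields": [
    {"name": "foo", "type": "string"},
    {"name": "bar", "type": "string"}
  ]
}
""")


def conforms(message, schema):
    """Return True if message has exactly the schema's fields, all strings.

    Simplification: only flat records with "string" fields are supported.
    """
    if not isinstance(message, dict):
        return False
    expected = {field["name"] for field in schema["fields"]}
    if set(message) != expected:
        return False
    return all(isinstance(message[name], str) for name in expected)


# The second message lacks "bar", so it is filtered out.
batch = [{"foo": "foo1", "bar": "bar1"}, {"foo": "foo2"}]
valid = [m for m in batch if conforms(m, SCHEMA)]
print(json.dumps(valid))
```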

Consume messages

echo '{"type":"REQUEST","count":1}' |\
  websocat -nk 'wss://localhost:8080/offramp/v2/roads/my_road/streams/my_stream/messages?defaultOffset=EARLIEST'

See: websocat
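
For clients that build the connection programmatically, the Python sketch below assembles the same stream URL and request frame as the websocat command above. The function names stream_url and request_frame are hypothetical, and the URL layout and frame fields are copied from that command rather than from an API reference.

```python
import json
from urllib.parse import quote, urlencode


def stream_url(host, road, stream, default_offset="EARLIEST"):
    """Build the Offramp v2 stream URL in the shape used by the example."""
    path = (
        f"/offramp/v2/roads/{quote(road, safe='')}"
        f"/streams/{quote(stream, safe='')}/messages"
    )
    query = urlencode({"defaultOffset": default_offset})
    return f"wss://{host}{path}?{query}"


def request_frame(count):
    """Build the REQUEST text frame with the given count."""
    return json.dumps({"type": "REQUEST", "count": count}, separators=(",", ":"))


url = stream_url("localhost:8080", "my_road", "my_stream")
frame = request_frame(1)
print(url)
print(frame)
```

Any WebSocket client can then open url and send frame as its first text message, as websocat does with the piped input.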

Building

Build and load docker images to the local docker daemon:

mvn clean package -Djib.goal=dockerBuild

Build without docker images:

mvn clean package -Djib.skip

Build and push docker images to a repo:

mvn clean package -Ddocker.repo=my.docker.repo

Contributors

Special thanks to the following for making data-highway possible!

  • Dave Maughan: 💻 🎨 👀 📖
  • James Grant: 💻 🎨 👀 📖 📢
  • Elliot West: 💻 🎨 👀 📖 📢
  • Adrian Woodhead: 💻 🎨 👀 📖
  • Konrad Dowgird: 💻 🎨 👀 📖
  • Riccardo Freixo: 💻 🎨 👀 📖 🚇
  • Monica Nicoara: 🤔 📋
  • Teiva Harsanyi: 💻
  • Kryiakos Sideris: 💻
  • Sandeep Solanki: 💻

This project follows the all-contributors specification.

Legal

This project is available under the Apache 2.0 License.

Copyright 2019 Expedia Inc.