diff --git a/README.md b/README.md index d33cf06..21ea086 100644 --- a/README.md +++ b/README.md @@ -1,47 +1,54 @@ # edm: Edge DNSTAP Minimiser ## About +`edm` reads DNSTAP and depending on configuration can output some different +data based on the observed messages: +* DNS queries for names considered well-known will be summarised into +histograms which are saved as parquet files. These files will then be submitted +to Core. +* DNS queries for names not considered well-known are collected into other +parquet files for further local analysis and here the complete message content +is saved but the client and server IP-addresses are pseudonymised via +[Crypto-PAn](https://en.wikipedia.org/wiki/Crypto-PAn). +* DNS queries that are not considered well-known and have never been seen +before by a given instance of `edm` will result in notifications beingsent to +Core via MQTT messages. -Tool for reading dnstap data, pseudonymising IP addresses and outputting minimised output data. - -Currently expects to read dnstap from a unix socket and writes out parquet -files for the collected information. - -Requires a DAWG file for keeping track of well-known domains. Such a file can -be created using the tool available in - - +## Usage +Running `edm` requires the creation of a TOML config file for holding the +crypto-PAn secret used for pseudonymisation as well as a +`well-known-domains.dawg` file which can be created using + + +### Steps for a basic local-only setup +A basic setup where `edm` will listen on a unix socket for DNSTAP data and +output files to a directory structure under `/tmp/edm` but not send anything to +Core can be created like this: ```text -Usage: - edm [command] - -Available Commands: - completion Generate the autocompletion script for the specified shell - help Help about any command - run Run edm in dnstap capture mode - -Flags: - --config string config file for sensitive information (default is $HOME/.edm.yaml) - -h, --help help for edm - -Use "edm [command] --help" for more information about a command. +echo 'cryptopan-key = "mysecret"' > edm.toml +curl -O https://www.domcop.com/files/top/top10milliondomains.csv.zip +unzip top10milliondomains.csv.zip +tapir-cli dawg --standalone compile --format csv --src top10milliondomains.csv --dawg well-known-domains.dawg +edm run --input-unix /tmp/edm/input.sock --data-dir /tmp/edm/data --config edm.toml --well-known-domains well-known-domains.dawg --disable-mqtt --disable-histogram-sender ``` -## Usage - -Using the tool requires the creation of a TOML config file for holding the -crypto-PAn secret (by default the config is read from the current working - directory) as well as a `well-known-domains.dawg` file which can be created -using - -Basic usage, writing output files to a directory structure under `/var/lib/edm` - +Since all communication with Core is disabled this is helpful for creating some +local parquet files to look around in. For inspecting the content you can use +e.g. [DuckDB](https://duckdb.org) like so: +### For summarised histogram data ```text -echo 'cryptopan-key = "mysecret"' > edm.toml -edm-dawg-maker -edm run --input-unix /opt/unbound/dnstap.sock +duckdb -c 'select * from "/tmp/edm/data/parquet/histograms/outbox/dns_histogram-2024-09-26T18-14-00Z_2024-09-26T18-15-00Z.parquet"' +``` +### For pseudonymised session (full message) data +```text +duckdb -c 'select * from "/tmp/edm/data/parquet/sessions/dns_session_block-2024-09-26T18-18-00Z_2024-09-26T18-19-00Z.parquet"' ``` +Next to the parquet directory you will also see a directory called "pebble". +This is where `edm` keeps its key-value store which is used to tell if a +query name has been seen before or not. The key-value store being used is +[pebble](https://github.com/cockroachdb/pebble) + ## Development ### Formatting and linting @@ -54,4 +61,4 @@ run at the top level directory prior to commiting: * `staticcheck ./...` (see [staticcheck](https://staticcheck.io)) * `gosec ./...` (see [gosec](https://github.com/securego/gosec)) * `golangci-lint run` (see [golangci-lint](https://golangci-lint.run)) -* `go test ./...` +* `go test -race ./...`