diff --git a/README.md b/README.md
new file mode 100644
index 00000000..a47339b5
--- /dev/null
+++ b/README.md
@@ -0,0 +1,79 @@
+# Geneve
+
+Geneve is a data generation tool; its name stands for GENerate EVEnts.
+
+To better understand its basics, consider Elastic Security's
+[detection engine](https://www.elastic.co/guide/en/security/current/detection-engine-overview.html).
+It regularly searches one or more indices for suspicious events; when a
+match is found, it creates an alert. To do so, it needs detection rules
+that define what a _suspicious event_ looks like.
+
+The original goal of Geneve can then be summarized as:
+
+> Given a detection rule, generate source events that would trigger the creation of an alert.
+
+It does so by analyzing the rule, building an abstract syntax tree of the
+enclosed query and translating it to an intermediate language that is used
+for generating documents (= events) over and over.
+
+What became obvious over time is that the query at the heart of each rule
+is actually a powerful way to drive document generation, one that goes
+well beyond triggering alerts.
+
+Additionally, generating garbage data that satisfies a rule is one thing;
+generating realistic data that can be analyzed with Kibana, which is an
+implicit goal of the tool, is quite another.
+
+The latter is a much harder nut to crack than the original goal and is
+currently under development.
+
+If you want to try it, read [Getting started](docs/getting_started.md).
+
+# Status
+
+## Data modeling
+
+Rule and query parsing, AST creation and IR generation are well
+developed and rigorously tested by the CI/CD pipelines. The generated
+events are good enough to trigger many of the expected alerts on various
+versions of the stack, from 8.2.0 to 8.6.0, but the work is necessarily
+incomplete, albeit as correct as possible.
+
+The set of detection rules used for the tests is loaded separately into
+Geneve and is currently locked to version 8.2.0 (718 rules in total). The
+next step is to use the rules preloaded in the Kibana under test
+(https://github.com/elastic/geneve/issues/125).
+
+Kinds of issues observed in this area:
+
+1. skipped rules due to an unimplemented rule type (e.g. threshold) or
+   query language (e.g. lucene).
+   <ins>73 rules</ins>.
+2. generation errors due to unimplemented query language features or
+   improvements needed in what is already implemented.
+   <ins>80 rules</ins>.
+3. incorrect generation: the expected alerts are actually not created.
+   <ins>5 rules</ins>.
+
+The first two points are detailed in the
+[Documents generation from detection rules](tests/reports/documents_from_rules.md)
+test report, the last one in the
+[Alerts generation from detection rules](tests/reports/alerts_from_rules.md) test report.
+
+Number of rules for which correct data is generated and alerts are created: <ins>560</ins>.
+
+## Data realism
+
+Allowing the user to "click through" requires that the generated data
+exploits the relations that Kibana is made to observe. Having relations
+also implies having the entities that such relations connect, entities
+that need to be consistent across the whole generation batch.
+
+The problem is increasingly well understood: parts of its solution are
+already implemented, others are still only sketched.
+
+## User interface
+
+Geneve is composed of a Python module and a REST API server that exposes
+it. The Python API is quite simple and stable; the REST API, instead, has
+rough edges and needs proper simplification.
diff --git a/docs/data_model.md b/docs/data_model.md
new file mode 100644
index 00000000..baf17602
--- /dev/null
+++ b/docs/data_model.md
@@ -0,0 +1,105 @@
+# Data model
+
+The Geneve data model describes what data Geneve is expected to generate.
+It guides and constrains the data generation process so that the output
+satisfies your criteria.
+
+Think of it this way: data generation is a random process; at its root it
+just produces a long random string made of 0s and 1s. What you actually
+want is to shape the result and channel the randomness so that the
+generated data looks sensible in your context and, at the same time, is
+never quite the same.
+
+In essence, you tell Geneve what you are searching for and it returns a
+JSON document that is a plausible answer to your search; the answer is
+different every time. If this sounds like "queries" to you, you're right:
+Geneve's input is queries.
+
+## Queries
+
+You have to provide at least one query to Geneve. If you give it several,
+at each round Geneve randomly chooses the one it will generate the
+document for.
+
+Suppose you have this query:
+
+```
+process.name: "*.exe"
+```
+
+What it actually tells Geneve is that you want the documents to have a field
+named `process.name` whose content matches the wildcard `*.exe`.
+
+Generated documents could be:
+
+```json
+{"process.name": "excel.exe"}
+```
+
+```json
+{"process.name": "winword.exe"}
+```
+
+but also, more likely, names made of random letters, such as
+
+```json
+{"process.name": "LDow.exe"}
+```
+
+or
+
+```json
+{"process.name": "OjiRlQMX.exe"}
+```
+
+If you really want to control the options, you can enumerate them:
+
+```
+process.name: ("excel.exe" or "winword.exe" or "regedit.exe")
+```
+
+The generated documents can then only be one of the three listed: you
+have restricted the choices Geneve can make.
+
+Let's try another one:
+
+```
+process.name: "10.0.0.0/8"
+```
+
+you get
+
+```json
+{"process.name": "10.0.0.0/8"}
+```
+
+As surprising as it may be, it's the only answer Geneve can give if you
+don't instruct it to treat `process.name` as being of type `ip address`.
+
+This is where the schema comes into play: it defines the fields and their types. We'll assume
+[ECS](https://www.elastic.co/guide/en/ecs/current/ecs-field-reference.html)
+is in use, but Geneve does not; if you want ECS, you need to load it (see
+[Loading the schema](https://github.com/cavokz/geneve/blob/add-some-docs3/docs/getting_started.md#loading-the-schema)).
+If you use fields that are not in the schema, Geneve treats them as `plain text` (`keyword`, actually).
+
+Now try again with a more appropriate field:
+
+```
+source.ip: "10.0.0.0/8"
+```
+
+you get, for example
+
+```json
+{"source.ip": "10.23.84.86"}
+```
+
+## Query languages
+
+All the queries in the examples above are expressed in the
+[Kibana Query Language](https://www.elastic.co/guide/en/kibana/current/kuery-query.html) (Kuery)
+but you can also use the
+[Event Query Language](https://www.elastic.co/guide/en/elasticsearch/reference/current/eql.html) (EQL).
+These are the only two languages supported at the moment, but others could well be added.
+
+Regardless of the query language used, the fields remain those defined by the schema.
diff --git a/docs/getting_started.md b/docs/getting_started.md
new file mode 100644
index 00000000..e64f4280
--- /dev/null
+++ b/docs/getting_started.md
@@ -0,0 +1,325 @@
+# Getting started
+
+## Data generation process
+
+The data generation process uses this analogy: generated data flows from source to sink.
+
+To generate data it is then necessary to define:
+
+* `source`: what data is generated, e.g. the data model
+* `sink`: where data is sent to, e.g. an ES index
+* `flow`: how data is transmitted, e.g. how fast or how much
+* `schema`: the definition of the fields, e.g. ECS 8.2.0
+
+Each of the above is handled by its own REST API endpoint. An arbitrary
+number of sources, sinks, flows and schemas can be defined on the same
+server.
+
+## Install
+
+Currently Geneve is packaged only for [Homebrew](https://brew.sh). You
+first need to install the Geneve tap
+
+```shell
+$ brew tap elastic/geneve
+```
+
+then the tool itself
+
+```shell
+$ brew install geneve
+```
+
+## REST API server
+
+Data is generated by the Geneve server, which you start with
+
+```shell
+$ geneve serve
+2023/01/31 16:40:23 Control: http://localhost:9256
+```
+
+The server keeps the terminal busy with its logs; to stop it, just press
+`^C`. The first line of the log shows where to reach it: this is the base
+URL of the server, and all the API endpoints are reachable (but not
+browsable) under `api/`.
+
+For the rest of this document we'll assume that the following shell
+variables are set:
+
+* `$GENEVE` is the URL of the Geneve server, `http://localhost:9256`
+* `$TARGET_ES` is the URL of the target Elasticsearch instance
+* `$TARGET_KIBANA` is the URL of the corresponding Kibana instance
+
+Now open a separate terminal to operate on the server with curl.
+
+## Loading the schema
+
+The schema describes the fields that can be present in a generated
+document. At the moment it needs to be explicitly loaded into the server.
+
+Download the latest version (or any other, if you have a preference) from
+https://github.com/elastic/ecs/releases and look for the file `ecs_flat.yml`
+in the folder `ecs-X.Y.Z/generated/ecs/`.
+
+Assuming that the path of that file is in the shell variable `$SCHEMA_YAML`,
+you load it with
+
+```shell
+$ curl -s -XPUT -H "Content-Type: application/yaml" "$GENEVE/api/schema/ecs" --data-binary "@$SCHEMA_YAML"
+```
+
+The `ecs` in the endpoint `api/schema/ecs` is an arbitrary name; it's how
+the server addresses the loaded schema.
+
+## Define the data model
+
+In the data model you describe the data that shall be generated. It can
+be as simple as a list of fields that need to be present, or more complex,
+also defining the relations among them.
+
+How to write a data model is a separate subject (see [Data model](data_model.md));
+here we focus on how to configure one on the server. You use the `api/source` endpoint.
+
+```shell
+$ curl -s -XPUT -H "Content-Type: application/yaml" "$GENEVE/api/source/mydata" --data-binary @- <<EOF
+schema: ecs
+queries:
+  - 'network where cidrMatch(destination.ip, "10.0.0.0/8", "192.168.0.0/16")'
+EOF
+```
+
+Note the reference to the previously loaded schema, `ecs`, and the name of
+this newly defined source, `mydata`. Also, `queries` is a list: you can add
+as many queries as you need, and at each iteration Geneve selects one randomly.
+
+You can generate some data right in the terminal for early inspection
+
+```shell
+$ curl -s "$GENEVE/api/source/mydata/_generate?count=1" | jq
+[
+  {
+    "@timestamp": "2023-01-31T18:19:20.197+01:00",
+    "destination": {
+      "ip": "192.168.130.52"
+    },
+    "event": {
+      "category": [
+        "network"
+      ]
+    }
+  }
+]
+```
+
+## Set the destination
+
+Once you're happy with the data model, it's time to configure where the
+data shall be sent. The `api/sink` endpoint serves this purpose.
+
+The command is rather unsophisticated:
+
+```shell
+$ curl -s -XPUT -H "Content-Type: application/yaml" "$GENEVE/api/sink/mydest" --data-binary @- <<EOF
+url: $TARGET_ES/myindex/_doc
+EOF
+```
+
+The generated documents are `POST`ed to the configured URL one by one. The
+name of this sink is `mydest`; the destination index is `myindex`.
+
+## Configure the flow
+
+Flow configuration is also quite basic: you just need a source and a sink,
+both of which must already be defined on the server.
+
+Use `count` to specify how many documents should be generated and sent to
+the stack. This flow is named `myflow`.
+
+```shell
+$ curl -s -XPUT -H "Content-Type: application/yaml" "$GENEVE/api/flow/myflow" --data-binary @- <<EOF
+source:
+  name: mydata
+sink:
+  name: mydest
+count: 1000
+EOF
+```
+
+All that is left to do is to start the generation with
+
+```shell
+$ curl -s -XPOST "$GENEVE/api/flow/myflow/_start"
+```
+
+You can also check the progress with
+
+```shell
+$ curl -s "$GENEVE/api/flow/myflow"
+params:
+    source:
+        name: mydata
+    sink:
+        name: mydest
+    count: 1000
+state:
+    alive: true
+    documents: 250
+    documents_per_second: 350
+```
+
+Or stop it with
+
+```shell
+$ curl -s -XPOST "$GENEVE/api/flow/myflow/_stop"
+```
+
+## Extra steps
+
+Geneve assumes that the target stack and index are ready to accept
+documents; it seems pointless and expensive to duplicate the stack's own
+index configuration functionality.
+
+Depending on your needs and the configuration of your stack, you may or may
+not need to take extra steps before actually pumping any documents into the stack.
+
+### Index mappings
+
+If your target index does not exist and is not managed by any index
+template, then you may want to create it and configure its mappings.
+
+Geneve can help you with the mappings: the `api/source/<name>/_mappings`
+endpoint returns the mappings of all the fields that can be
+encountered in the documents generated by that source.
+
+Use the Elasticsearch index API to create the index
+
+```shell
+$ curl -s -XPUT -H "Content-Type: application/json" $TARGET_ES/myindex --data @- <<EOF
+{
+  "mappings": $(curl -fs "$GENEVE/api/source/mydata/_mappings")
+}
+EOF
+```
+
+Note the embedded Geneve source API call to get the mappings; its output
+is merged into the index API request.
+
+### Kibana data view
+
+If you want to use Kibana Security to analyze the generated data, you need
+a data view in place. If your target index is not already included in an
+existing data view, you need to create one yourself.
+
+Use the following command to create it from the command line
+
+```shell
+$ curl -s -XPOST -H "Content-Type: application/json" -H "kbn-xsrf: true" $TARGET_KIBANA/api/data_views/data_view --data @- <<EOF
+{
+  "data_view": {
+     "title": "myindex"
+  }
+}
+EOF
+```
+
+### GeoIP data
+
+While Geneve is well capable of generating fields with IPv4 and IPv6
+addresses, the same does not apply to their geographical location.
+
+As a workaround, you can leverage the stack's geoip processor to enrich the
+data.
+
+First, create the ingest pipeline (e.g. `geoip-info`)
+
+```shell
+$ curl -s -XPUT -H "Content-Type: application/json" $TARGET_ES/_ingest/pipeline/geoip-info --data @- <<EOF
+{
+  "description": "Add geoip info",
+  "processors": [
+    {
+      "geoip": {
+        "field": "client.ip",
+        "target_field": "client.geo",
+        "ignore_missing": true
+      }
+    },
+    {
+      "geoip": {
+        "field": "source.ip",
+        "target_field": "source.geo",
+        "ignore_missing": true
+      }
+    },
+    {
+      "geoip": {
+        "field": "destination.ip",
+        "target_field": "destination.geo",
+        "ignore_missing": true
+      }
+    },
+    {
+      "geoip": {
+        "field": "server.ip",
+        "target_field": "server.geo",
+        "ignore_missing": true
+      }
+    },
+    {
+      "geoip": {
+        "field": "host.ip",
+        "target_field": "host.geo",
+        "ignore_missing": true
+      }
+    }
+  ]
+}
+EOF
+```
+
+Next, append `?pipeline=geoip-info` to the URL of your sink (see [Set the destination](#set-the-destination)).
+This instructs the stack to pass the generated data through the newly created `geoip-info` pipeline.
+
+Optionally, ensure that your stack keeps the GeoIP database up to date
+
+```shell
+$ curl -s -XPUT -H "Content-Type: application/json" $TARGET_ES/_cluster/settings --data @- <<EOF
+{
+  "transient": {
+    "ingest": {
+      "geoip": {
+        "downloader": {
+          "enabled": "true"
+        }
+      }
+    }
+  }
+}
+EOF
+```
+
+Finally, update your data model to include the fields you want the
+geoip processor to fill in. Geneve will generate them with random content;
+the ingest pipeline will replace that content with better values.
+
+```shell
+$ curl -s -XPUT -H "Content-Type: application/yaml" "$GENEVE/api/source/mydata" --data-binary @- <<EOF
+schema: ecs
+queries:
+  - 'network where
+       cidrMatch(destination.ip, "10.0.0.0/8", "192.168.0.0/16") and
+       destination.geo.city_name != null and
+       destination.geo.country_name != null and
+       destination.geo.location != null
+    '
+EOF
+```
+
+If the generated IP does not have any entry in the GeoIP database,
+the ingest pipeline leaves the content generated by Geneve as is. This
+results in completely bogus, random names for city, country, etc. If you
+come across them, you'll know where they come from. Issue
+https://github.com/elastic/geneve/issues/115 tracks this problem.
+
+For more details, read [GeoIP processor](https://www.elastic.co/guide/en/elasticsearch/reference/current/geoip-processor.html).