
Commit

why: history section "done"
deobald committed Nov 13, 2023
1 parent 3ed3ef6 commit 222b727
Showing 1 changed file with 65 additions and 1 deletion.
66 changes: 65 additions & 1 deletion src/appendix/why.md
@@ -1,10 +1,74 @@
# Why?

Why did we build Endatabas at all?
Isn't one of the many ([many](https://www.dbdb.io)) existing databases good enough?

## What is Endatabas, anyway?

The tagline "SQL Document Database With Full History" says a lot, but it doesn't say everything.
Endatabas is, first and foremost, an _immutable database_.
That's the Full History part.
But storing all your data, forever, has clear implications.

We consider these implications to be the _pillars_ of Endatabas.
In 3D geometry, the legs of a tripod are mutually supportive; as long as all three feet are in contact with the ground, the tripod will not wobble or collapse.
So it is with the pillars.
Each supports and implies the others.
The pillars are as follows:

* Full History (requires: immutable data and erasure)
* Timeline (requires: time-traveling queries)
* Separation of Storage from Compute (requires: light and adaptive indexing)
* Documents (requires: schemaless tables, "schema-per-row", arbitrary joins)
* Analytics (requires: columnar storage and access)

At the top of this 5D tripod is SQL, the lingua franca of database queries.
A window of time has recently opened when all of this is finally possible.
But first, let's look back at some history to see how we got here.
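
To make the "Timeline" pillar concrete, here is a sketch of a time-traveling query.
It uses standard SQL:2011 system-time syntax purely for illustration; the table and timestamp are hypothetical, and Endatabas's own syntax may differ in its details.

```sql
-- Read the table as it existed at a point in the past,
-- rather than its current state.
SELECT *
FROM products FOR SYSTEM_TIME AS OF TIMESTAMP '2023-01-01 00:00:00'
WHERE name = 'Pendant Lamp';
```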

## History

None of the ideas in Endatabas are new.

George Copeland's [_What if mass storage were free?_](https://www.endatabas.com/references.html#10.1145/800083.802685)
asked, back in 1980, what an immutable database might look like.
His prescient vision for a database with full history enjoys the clarity of a researcher at the beginning of the database era.
People have occasionally asked of Endatabas, "why bother retaining all history?"
But this is the wrong question.
The real question is: "why bother destroying data?"
Copeland answers: "The deletion concept was invented to reuse expensive computer storage."
The software industry has grown so accustomed to the arbitrary deletion of historical data that we now take destroying data for granted.

Mass storage is not free yet -- but it is cheap.
Copeland himself addresses "a more realistic argument: if the cost of mass storage were low enough, then deletion would become undesirable."
Any system that exploits the separation of storage and compute can enjoy these low costs.

Jensen and Snodgrass have thoroughly researched time-related database queries.
Much of their work was published [in the 1990s](https://www.endatabas.com/bibliography.html#10.1109/69.755613)
and early 2000s.
Storing time, querying across time, time as a value ... these challenging subjects eventually grew to form
[SQL:2011](https://www.endatabas.com/bibliography.html#ISO/IEC-19075-2:2021).
Most SQL databases have struggled to implement SQL:2011 because incorporating _time_ as a core concept amplifies the existing complexity of databases that support destructive updates and deletes.

Document databases have a more convoluted story.
Attempts at "schemaless", semi-structured, document, and object databases stretch from
[Smalltalk in the 1980s](https://www.endatabas.com/bibliography.html#10.1145/971697.602300)
to [C++ in the 1990s](https://en.wikipedia.org/wiki/Object_database#Timeline)
to [Java](https://prevayler.org/)
and [graphs](https://en.wikipedia.org/wiki/Neo4j) in the 2000s
to [JSON in the 2010s](https://en.wikipedia.org/wiki/MongoDB).
Despite all this, the most successful semi-structured document store, as of 2023, is a JSON column in a Postgres database.
Database users desire flexible storage and querying -- but yesterday's weather says they desire SQL more.

Khoshafian and Copeland introduced the [Decomposition Storage Model (DSM)](https://www.endatabas.com/bibliography.html#10.1145/318898.318923)
in 1985.
The four decades that followed saw any number of approaches to analytical processing in databases.
Most of the time, however, these tended to demand labour-intensive data logistics:
data was piped, streamed, dumped, and copied into denormalized cubes and time-series databases.
As humanity grew out of the batch processing of the 1980s into the always-online society of the 2020s, analytics data became another form of operational data and parts of this pipeline were looped back to users and customers.
More recently, Hybrid Transactional/Analytical Processing (HTAP) has emerged as a promising, more natural successor to separate OLTP and OLAP systems.
For many businesses, the transactional/analytical divide is as arbitrary as deleting data because hard disks were expensive in 1986.

## Timing

outside => in:
