docs: refactoring #105

Open · wants to merge 2 commits into base: develop
Binary file added docs/app/icon.png
10 changes: 1 addition & 9 deletions docs/content/docs/apis/introduction.mdx
Original file line number Diff line number Diff line change
@@ -13,15 +13,7 @@ With OramaCore we aim to provide a set of APIs that are backward compatible - or

At the time of writing, with OramaCore being in beta, we are still working on the APIs and SDKs. We are also working on the documentation, so please bear with us.

## Philosophy

The one imperative we have when designing the OramaCore APIs is to make them as simple as possible. We want to make it easy for developers to get started with OramaCore, and to make it easy for them to build applications that use OramaCore.

Any additional steps, any additional complexity, any additional boilerplate, is a failure on our part. We want to make it as easy as possible for you to use OramaCore.

If you think we should improve on this front, please let us know at [[email protected]](mailto:[email protected]). We are always looking for feedback.

## REST APIs vs SDKs
## APIs & SDKs

We will provide both REST APIs and SDKs for OramaCore.

2 changes: 1 addition & 1 deletion docs/content/docs/apis/meta.json
@@ -1,5 +1,5 @@
{
"title": "APIs",
"title": "APIs Reference",
"pages": [
"introduction",
"create-collection",
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
---
title: OramaCore Architecture
title: Overview
description: A deep dive into the OramaCore architecture.
---

@@ -82,7 +82,7 @@ Future versions of OramaCore will move away from this approach by either integra

### Embeddings Generation

OramaCore automatically generates embeddings for your data. You can configure which models to use via the [configuration](/docs/getting-started/configuration).
OramaCore automatically generates embeddings for your data. You can configure which models to use via the [configuration](/docs/guide/configuration).

Current benchmarks indicate this implementation can generate up to 1,200 embeddings per second on an RTX 4080 Super. We acknowledge this seems optimistic and will release reproducible benchmarks soon.

38 changes: 38 additions & 0 deletions docs/content/docs/architecture/write-read.mdx
@@ -0,0 +1,38 @@
---
title: Write & Read Side
description: OramaCore is a modular system, allowing it to run as a monolith or as a distributed system. We split the system into two distinct sides.
---

OramaCore is a modular system. We allow it to run as a monolith - where all the components are running in a single process - or as a distributed system, where you can scale each component independently.

To allow this, we split the system into two distinct sides: the **Write Side** and the **Read Side**.

If you're running OramaCore in a single node, you won't notice the difference. But if you're running it in a distributed system, you can scale the write side independently from the read side.

## Write Side

The write side is responsible for ingesting data, generating embeddings, and storing them in the vector database. It's also responsible for generating the full-text search index.

It's the part of the system that requires the most GPU power and memory, as it needs to generate a lot of content, embeddings, and indexes.

In detail, the write side is responsible for:

- **Ingesting data**. It creates a buffer of documents and flushes them to the vector database and the full-text search index, rebuilding the immutable data structures used for search.
- **Generating embeddings**. It generates text embeddings for large datasets without interfering with the search performance.
- **Expanding content (coming soon)**. It is capable of reading images, code blocks, and other types of content, and generating descriptions and metadata for them.

Every insertion, deletion, or update of a document will be handled by the write side.

## Read Side

The read side is responsible for handling queries, searching for documents, and returning the results to the user.

It's also the home of the Answer Engine, which is responsible for generating answers to questions and performing chain of actions based on the user's input.

In detail, the read side is responsible for:

- **Handling queries**. It receives the user's query, translates it into a query that the vector database can understand, and returns the results.
- **Searching for documents**. It searches for documents in the full-text search index and the vector database.
- **Answer Engine**. It generates answers to questions, performs chain of actions, and runs custom agents.

Every query, question, or action will be handled by the read side.
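
As a sketch of how the split surfaces in configuration, each side keeps its own section in `config.yaml`, so a deployment can tune or scale them independently (the `reader_side` value shown here is an assumption; see the configuration guide for the real schema):

```yaml
# Hypothetical minimal layout: one section per side.
writer_side:
  data_dir: ./.data/writer
reader_side:
  data_dir: ./.data/reader   # assumed default, mirroring the writer side
```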
Original file line number Diff line number Diff line change
@@ -188,4 +188,4 @@ In this example, we create a collection named `products` that uses the `BGESmall

Since OramaCore ships with a JavaScript runtime integrated, you can use JavaScript hooks to customize text extraction and transformation.

Since this is a more advanced topic, we decided to dedicate it an entire section. Please refer to the [JavaScript Hooks](/docs/getting-started/javascript-hooks#selectembeddingproperties) documentation for more information.
Since this is a more advanced topic, we decided to dedicate it an entire section. Please refer to the [JavaScript Hooks](/docs/customizations/javascript-hooks/selectEmbeddingProperties) documentation for more information.
Original file line number Diff line number Diff line change
@@ -4,7 +4,7 @@ description: "API keys are used to authenticate requests to the OramaCore API"
---
import { Tab, Tabs } from 'fumadocs-ui/components/tabs';

[As explained in the introduction](/docs#write-side-read-side), OramaCore is split in two sides: the **reader side** and the **writer side**.
As explained in the [Architecture](/docs/architecture/write-read) section, OramaCore is split in two sides: the **reader side** and the **writer side**.

Therefore, depending on the operation you want to perform, you will need to use different API keys.

Original file line number Diff line number Diff line change
@@ -1,12 +1,13 @@
---
title: Configuration
description: Learn how to configure OramaCore
icon: Cog
---

<Callout type='warn'>
OramaCore is currently under active development. Our goal is to release the first Beta version (**v0.1.0**) on **Jan 31st, 2025**, and the first stable version (**v1.0.0**) on **Feb 28th, 2025**.

While the system is already quite stable, please note that APIs will undergo changes in **v0.1.0** and **v1.0.0**.
</Callout>

## Configuring OramaCore

@@ -84,7 +85,7 @@ The `writer_side` section configures the writer side of OramaCore. Here are the
- `data_dir`: The directory where the writer side will persist the data on disk. By default, it's set to `./.data/writer`.
- `embedding_queue_limit`: The maximum number of embeddings that can be stored in the queue before the writer starts to be blocked. By default, it's set to `50000`.
- `insert_batch_commit_size`: The number of document insertions after which the write side will commit the changes. By default, it's set to `5000`.
- `default_embedding_model`: The default embedding model used to calculate the embeddings if not specified in the collection creation. By default, it's set to `MultilingualE5Small`. See more about the available models in the [Embedding Models](/docs/getting-started/text-embeddings) section.
- `default_embedding_model`: The default embedding model used to calculate the embeddings if not specified in the collection creation. By default, it's set to `MultilingualE5Small`. See more about the available models in the [Embedding Models](/docs/customizations/text-embeddings) section.
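
Putting these defaults together, a `writer_side` section might look like the following sketch (key names are as listed above; the exact nesting is an assumption):

```yaml
writer_side:
  data_dir: ./.data/writer
  embedding_queue_limit: 50000
  insert_batch_commit_size: 5000
  default_embedding_model: MultilingualE5Small
```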

## `reader_side`

@@ -108,7 +109,7 @@ The `ai_server` section configures the Python gRPC server that is responsible fo

The `embeddings` section configures the embeddings calculation. Here are the available options:

- `default_model_group`: The default model group used to calculate the embeddings if not specified in the collection creation. By default, it's set to `multilingual`. See more about the available models in the [Embedding Models](/docs/getting-started/text-embeddings) section.
- `default_model_group`: The default model group used to calculate the embeddings if not specified in the collection creation. By default, it's set to `multilingual`. See more about the available models in the [Embedding Models](/docs/customizations/text-embeddings) section.
- `dynamically_load_models`: Whether to dynamically load the models. By default, it's set to `false`.
- `execution_providers`: The execution providers used to calculate the embeddings. By default, it's set to `CUDAExecutionProvider` and `CPUExecutionProvider`.
- `total_threads`: The total number of threads used to calculate the embeddings. By default, it's set to `8`.
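
Putting these defaults together, a sketch of the embeddings configuration (nesting under `ai_server` is an assumption based on the section layout):

```yaml
ai_server:
  embeddings:
    default_model_group: multilingual
    dynamically_load_models: false
    execution_providers:
      - CUDAExecutionProvider
      - CPUExecutionProvider
    total_threads: 8
```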
Original file line number Diff line number Diff line change
@@ -1,12 +1,13 @@
---
title: Running OramaCore
description: "Downloading, building, and running OramaCore on your machine or in production. "
icon: Play
title: Install OramaCore
description: "Downloading, building, and running OramaCore on your machine or in production."
---

<Callout type='warn'>
OramaCore is currently under active development. Our goal is to release the first Beta version (**v0.1.0**) on **Jan 31st, 2025**, and the first stable version (**v1.0.0**) on **Feb 28th, 2025**.

While the system is already quite stable, please note that APIs will undergo changes in **v0.1.0** and **v1.0.0**.
</Callout>

## Using Docker

@@ -16,7 +17,7 @@ The simplest way to get started is by pulling the official Docker image from Doc
docker pull oramasearch/oramacore:latest
```

Create a [config.yaml configuration file](/docs/getting-started/configuration) and then run the Docker image:
Create a [config.yaml configuration file](/docs/guide/configuration) and then run the Docker image:

```sh
docker run \
@@ -91,7 +92,7 @@ Then, install the dependencies:
pip install -r requirements.txt # or pip install -r requirements-cpu.txt
```

When you run the server, OramaCore will automatically download the required models specified in the [configuration file](/docs/getting-started/configuration).
When you run the server, OramaCore will automatically download the required models specified in the [configuration file](/docs/guide/configuration).

The download time will depend on your internet connection.

114 changes: 78 additions & 36 deletions docs/content/docs/index.mdx
@@ -1,21 +1,14 @@
---
title: Introduction
description: An introduction to OramaCore - a complex AI architecture made easy and open-source.
icon: Album
title: Getting Started
description: Getting started with OramaCore - a complex AI architecture made easy and open-source.
---
import { File, Folder, Files } from 'fumadocs-ui/components/files';
import { SearchIcon, DatabaseIcon, WholeWordIcon, FileJson } from 'lucide-react';

Building search engines, copilots, answer systems, or pretty much any AI project is harder than it should be.

Even in the simplest cases, you'll need a vector database, a connection to an LLM for generating embeddings, a solid chunking mechanism, and another LLM to generate answers.
And that's without even considering your specific needs, where all these pieces need to work together in a way that's unique to your use case.

On top of that, you're likely forced to add multiple layers of network-based communication, deal with third-party slowdowns beyond your control, and address all the typical challenges we consider when building high-performance, high-quality applications.
Building AI projects like search engines or copilots **is harder than it should be**, requiring vector databases, LLMs, chunking, and seamless integration while handling network slowdowns and performance issues. OramaCore simplifies this with a unified, opinionated server for easier development and customization.

OramaCore simplifies the chaos of setting up and maintaining a complex architecture. It gives you a single, easy-to-use, opinionated server that's designed to help you create tailored solutions for your own unique challenges.

## Why OramaCore
## Quick Start

OramaCore gives you everything you need **in a single Dockerfile**.

@@ -45,48 +38,97 @@ You're getting access to:
</Card>
</Cards>

All from a single, self-contained image.
All from a single, self-contained image.

To run the image, you can use the following command:

```sh
docker run \
-p 8080:8080 \
-v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
-v ./config.yaml:/app/config.yaml \
--gpus all \
oramacore
```

## On being opinionated
### Configuration

When building OramaCore, we made a deliberate choice to create an opinionated system. We offer strong, general-purpose default configurations while still giving you the flexibility to customize them as needed.
To get started with OramaCore, you can use the default configuration. But if you want to customize it, you can do so by editing the `config.yaml` file.
You can customize the system to fit your specific needs. Check out the [configuration](/docs/guide/configuration) guide to learn more.

There are plenty of great vector databases and full-text search engines out there. But most of them don't work seamlessly together out of the box—they often require extensive fine-tuning to arrive at a functional solution.
### Create a collection

Our goal is to provide you with a platform that's ready to go the moment you pull a single Docker file.
To import data into OramaCore, you need to create a collection. A collection is a group of documents that you can search and interact with. You can create a collection by sending a POST request to the `/collections` endpoint with the collection ID and the API keys that will secure it.
The request must include an `Authorization` header with the master API key. Learn more about [API Keys](/docs/guide/api-keys).

```sh
curl -X POST \
http://localhost:8080/v0/collections \
-H 'Authorization: Bearer <master-api-key>' \
-d '{
"id": "products",
"write_api_key": "my-write-api-key",
"read_api_key": "my-read-api-key"
}'
```

### Add documents

## Write Side, Read Side
Once you have created a collection, you can add documents to it. A document is a JSON object that contains the data you want to search. You can add a document by sending a PATCH request to the `/collections/{COLLECTION_ID}/documents` endpoint with the document data.

OramaCore is a modular system. We allow it to run as a monolith - where all the components are running in a single process - or as a distributed system, where you can scale each component independently.
```sh
curl -X PATCH \
http://localhost:8080/v0/collections/{COLLECTION_ID}/documents \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer <write_api_key>' \
-d '{
"id": "1",
"title": "My first document",
"content": "The quick brown fox jumps over the lazy dog."
}'
```
You can explore the [insert documents](/docs/apis/insert-documents) API reference to learn more about adding documents to a collection.

### Search

Now that you have added documents to your collection, you can perform your first search. Send a POST request to the `/search` endpoint with the search term and the collection ID to get the results.

```sh
curl -X POST \
  "http://localhost:8080/v0/collections/{COLLECTION_ID}/search?api-key=<read_api_key>" \
-H 'Content-Type: application/json' \
-d '{ "term": "The quick brown fox" }'
```

You can now perform unlimited, fast searches on your data using OramaCore! Check out the supported [Search Parameters](/docs/apis/search-documents#search-parameters) to learn how to customize your search results.
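
Responses come back as JSON, so they are easy to post-process from the shell. A sketch, assuming a hypothetical response shape with a `hits` array (the actual field names may differ; check the search API reference):

```sh
# Hypothetical search response; field names are assumptions for illustration.
RESPONSE='{"count": 1, "hits": [{"id": "1", "score": 0.98, "document": {"title": "My first document"}}]}'
echo "$RESPONSE" | python3 -c 'import sys, json; print(json.load(sys.stdin)["hits"][0]["document"]["title"])'
# prints: My first document
```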

To allow this, we split the system into two distinct sides: the **write side** and the **read side**.
Out of the box, OramaCore is ready to go with a powerful search engine, featuring Full Text search, Vector Search and Hybrid Search. You can start building your AI projects right away! 🚀

If you're running OramaCore in a single node, you won't notice the difference. But if you're running it in a distributed system, you can scale the write side independently from the read side.
---

## Why OramaCore?

### Write Side
Building search engines, copilots, answer systems, or pretty much any AI project is pretty challenging.
Even in the simplest cases, you'll need a vector database, a connection to an LLM for generating embeddings, a solid chunking mechanism, and another LLM to generate answers.
And that's without even considering your specific needs, where all these pieces need to work together in a way that's unique to your use case.

The write side is responsible for ingesting data, generating embeddings, and storing them in the vector database. It's also responsible for generating the full-text search index.
On top of that, you're likely forced to add multiple layers of network-based communication, deal with third-party slowdowns beyond your control, and address all the typical challenges we consider when building high-performance, high-quality applications.

It's the part of the system that requires the most GPU power and memory, as it needs to generate a lot of content, embeddings, and indexes.
OramaCore simplifies the chaos of setting up and maintaining a complex architecture. It gives you a single, easy-to-use, opinionated server that's designed to help you create tailored solutions for your own unique challenges.

In detail, the write side is responsible for:
### Philosophy

- **Ingesting data**. It creates a buffer of documents and flushes them to the vector database and the full-text search index, rebuilding the immutable data structures used for search.
- **Generating embeddings**. It generates text embeddings for large datasets without interfering with the search performance.
- **Expanding content (coming soon)**. It is capable of reading images, code blocks, and other types of content, and generating descriptions and metadata for them.
When building OramaCore, we made a deliberate choice to create **an opinionated system**. We offer strong, general-purpose default configurations while still giving you the flexibility to customize them as needed.

Every insertion, deletion, or update of a document will be handled by the write side.
There are plenty of great vector databases and full-text search engines out there. But most of them don't work seamlessly together out of the box—they often require extensive fine-tuning to arrive at a functional solution.

### Read Side
Our goal is to provide you with a platform that's ready to go the moment you pull a single Docker file.

The read side is responsible for handling queries, searching for documents, and returning the results to the user.
### OramaCore APIs

It's also the home of the Answer Engine, which is responsible for generating answers to questions and performing chain of actions based on the user's input.
The one imperative we have when designing the OramaCore APIs is to make them as simple as possible. We want to make it easy for developers to get started with OramaCore, and to make it easy for them to build applications that use OramaCore.

In detail, the read side is responsible for:
Any additional steps, any additional complexity, any additional boilerplate, is a failure on our part. We want to make it as easy as possible for you to use OramaCore.

- **Handling queries**. It receives the user's query, translates it into a query that the vector database can understand, and returns the results.
- **Searching for documents**. It searches for documents in the full-text search index and the vector database.
- **Answer Engine**. It generates answers to questions, performs chain of actions, and runs custom agents.
If you think we should improve on this front, please let us know at [[email protected]](mailto:[email protected]). We are always looking for feedback.

Every query, question, or action will be handled by the read side.
14 changes: 6 additions & 8 deletions docs/content/docs/meta.json
@@ -3,17 +3,15 @@
"description": "OramaCore Documentation",
"root": true,
"pages": [
"---Getting Started---",
"---Guide---",
"index",
"api-key",
"configuration",
"running-oramacore",
"...guide",
"apis",
"---Customizations---",
"text-embeddings",
"javascript-hooks",
"...customizations",
"---Architecture---",
"architecture",
"party-planner"
"architecture/overview",
"architecture/write-read",
"architecture/party-planner"
]
}