We did a hackathon #2871

MatthewAry · 2023-10-20T00:00:02Z

This October, I organized a hackathon within my company. We spent 36+ man-hours building a SurrealDB-powered application. It was a simple data retrieval application populated with data from an FDA database.

💡 The data that we used can be found here:

Our objective was to take the raw FDA device data and put it into SurrealDB. We also wanted to create embeddings for the product classifications because the FDA classification system is pretty bad. We thought that we could use OpenAI’s text-ada-001 embedder to generate the embedding values for the classifications.

Three people took part in the hackathon. Only I had any experience with SurrealDB, and I had never used it on a data set this big. Anyway, we created a schema, albeit a flawed one that we did not fully utilize.

DEFINE TABLE industry;
DEFINE FIELD name                 ON TABLE industry TYPE string;
DEFINE FIELD description          ON TABLE industry TYPE option<string>;
DEFINE INDEX industryNameIsUnique ON TABLE industry FIELDS name UNIQUE;

DEFINE TABLE subIndustry;
DEFINE FIELD industry    ON TABLE subIndustry TYPE record<industry>;
DEFINE FIELD name        ON TABLE subIndustry TYPE string;
DEFINE FIELD code        ON TABLE subIndustry TYPE string;
DEFINE FIELD description ON TABLE subIndustry TYPE option<string>;
DEFINE INDEX uniqueKey   ON TABLE subIndustry FIELDS industry, name UNIQUE;

DEFINE TABLE category;
DEFINE FIELD name        ON TABLE category TYPE string;
DEFINE FIELD subIndustry ON TABLE category TYPE record<subIndustry>;
DEFINE FIELD description ON TABLE category TYPE option<string>;
DEFINE FIELD data        ON TABLE category TYPE option<object>;
DEFINE FIELD embedding   ON TABLE category TYPE array<number> DEFAULT [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
DEFINE FIELD embedStr    ON TABLE category TYPE string DEFAULT '';
DEFINE INDEX uniqueKey   ON TABLE category FIELDS subIndustry, name UNIQUE;
DEFINE INDEX embedStrKey ON TABLE category FIELDS embedding MTREE DIMENSION 1536 DIST COSINE;

DEFINE TABLE vendor;
DEFINE FIELD name        ON TABLE vendor TYPE string;
DEFINE FIELD description ON TABLE vendor TYPE option<string>;
DEFINE INDEX uniqueKey   ON TABLE vendor FIELDS name UNIQUE;

DEFINE TABLE product;
DEFINE FIELD name        ON TABLE product TYPE string;
DEFINE FIELD vendor      ON TABLE product TYPE record<vendor>;
DEFINE FIELD category    ON TABLE product TYPE record<category>;
DEFINE FIELD description ON TABLE product TYPE option<string>;
DEFINE FIELD data        ON TABLE product TYPE option<object>;
DEFINE INDEX uniqueKey   ON TABLE product FIELDS vendor, name UNIQUE;

From the FDA data, we renamed classification to category, and we split the 510K clearances file into two tables, vendor and product. We did this because the 510K data set calls vendors APPLICANTs, and a vendor can sell many products. We thought it would be cool to see what products each vendor sells. We used the PRODUCTCODE field, which appears in both data sets, to link the products to classifications.
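A minimal sketch of that split, not our actual loader. APPLICANT and PRODUCTCODE are the fields mentioned above; the DEVICENAME column, the pipe delimiter, and every sample row are assumptions made up for illustration. It keeps the first occurrence of each (vendor, name) pair, which mirrors the UNIQUE indexes in the schema.

```python
import csv
import io


def split_clearances(rows):
    """Split 510K rows into unique vendor and product records."""
    vendors = {}   # vendor name -> vendor record
    products = {}  # (vendor name, product name) -> product record
    for row in rows:
        vendor_name = row["APPLICANT"].strip()
        product_name = row["DEVICENAME"].strip()  # assumed column name
        vendors.setdefault(vendor_name, {"name": vendor_name})
        # First occurrence wins, like a UNIQUE index on (vendor, name).
        products.setdefault(
            (vendor_name, product_name),
            {
                "name": product_name,
                "vendor": vendor_name,
                "product_code": row["PRODUCTCODE"].strip(),
            },
        )
    return list(vendors.values()), list(products.values())


# Invented sample data; the real file is far larger.
SAMPLE = """APPLICANT|DEVICENAME|PRODUCTCODE
Acme Medical|Example Monitor|AAA
Acme Medical|Example Pump|BBB
Beta Devices|Example Monitor|AAA
Acme Medical|Example Monitor|AAA
"""

reader = csv.DictReader(io.StringIO(SAMPLE), delimiter="|")
vendors, products = split_clearances(reader)
print(len(vendors), "vendors,", len(products), "products")
```

The product_code value is what would later be matched against the classification records to fill in the category link.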

System:

SurrealDB ran using RocksDB on a 2018 MacBook Pro with 32GB of RAM and a 2.6GHz 6‑core 8th‑gen Intel i7. Team members connected to it over a Tailscale network. We all connected with the same user account, root. We ran SurrealDB directly on the OS.

Stack:

We used Python and the corresponding SurrealDB SDK to process and load the data into our database, and we built an application that connected to it using Bun and Elysia. We used the latest version of SurrealDB at the time, 1.0.0.

Things were working fine until we loaded in the FDA data and did preparations for the embeddings. Disclaimer: This was a hackathon and we readily admit that we could have done things better, but we believe our observations are worth sharing.

We noticed the following:

SurrealDB became slow and sometimes seemingly unresponsive for all participants when we executed a query such as SELECT name, description, (SELECT name FROM product WHERE category = $parent.id) as products FROM category LIMIT 5; We believe performance would have been better if we had not written a JOIN-like query and had instead prepared an edge table that RELATEs the product and category tables in both directions. (Could we have an Anti-Pattern blog post?) Regardless, when we ran this query, it seemed like nothing was happening and we couldn’t talk to the database. In addition, looking at system activity, the SurrealDB process seemed compute bound: Activity Monitor said that it was using 200% of the CPU (which I interpreted to mean that it was using only 2 of the 6 CPU cores). Memory use and disk IOPS seemed negligible.

We noticed that seemingly simple queries such as Count(SELECT * FROM category;) or SELECT * FROM category LIMIT 5; also ran extremely slowly. We think it might have something to do with the embedding field but aren’t sure. So that we could create an index on the embedding field, we gave it a default value: an array of length 1536 with all values set to 0. This length is the number of dimensions we observed the text-ada-001 embedder returning when we experimented with generating embeddings.
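As a side note on that all-zero default, here is a small sketch. It renders the 1536-zero array literal programmatically rather than by hand, and it shows that cosine distance is mathematically undefined for a zero-length vector (the formula divides by the vector norm). Whether that matters to SurrealDB's MTREE index is something we do not know; the helper names are hypothetical and this is plain Python, not SurrealDB code.

```python
import math

EMBEDDING_DIM = 1536  # dimension count reported in the post


def zero_vector_default(dim=EMBEDDING_DIM):
    """Render an all-zero array literal for a DEFAULT clause."""
    return "[" + ", ".join(["0"] * dim) + "]"


def cosine_distance(a, b):
    """Return 1 - cosine similarity, or None if either vector has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return None  # 0/0: the distance is undefined for a zero vector
    dot = sum(x * y for x, y in zip(a, b))
    return 1 - dot / (norm_a * norm_b)


# Same statement as in the schema, with the literal generated.
statement = (
    "DEFINE FIELD embedding ON TABLE category TYPE array<number> DEFAULT "
    + zero_vector_default()
    + ";"
)

print(len(statement))
print(cosine_distance([0.0, 0.0, 0.0], [1.0, 0.0, 0.0]))
```

Every category row that never received a real embedding would hold this zero vector, so any cosine comparison against it falls into the undefined case.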

Running Count(SELECT * FROM product;) was also slow (there were a lot of product records!).

We thought that RocksDB might be to blame for how the database was getting locked up, but when we tried TiKV as the storage layer (with only one node), we got similar results. We would see SurrealDB lock up when executing long-running queries.

Perhaps we were doing something wrong, but we didn’t have a lot of time budgeted to figure out what the problem was, and we weren’t sure what we could do to diagnose it either. Regardless, it was an interesting exercise. Unfortunately, it led us to table SurrealDB for the time being, until we can budget the time to revisit it and try to address the problems we ran into.

We intend to discuss our experience with the SurrealDB team during their weekly office hours. We will post updates to this thread as our journey with SurrealDB progresses.

Update: We talked with the developers during their office hours on October 20th about our experience. These are our takeaways from that conversation.

TLDR: SurrealDB is largely feature complete, but those features have a lot of room for optimization.

Our query performance problems are largely due to the fact that a lot of optimization work still needs to be done for a variety of SurrealDB query features. This is why Count, LIMIT, and our JOIN-like query were slow.

For Count, SurrealDB would grab the entire record object when it executed its operation. It is not efficient in the way MySQL is when you run something like SELECT Count(*) FROM products; because SurrealDB ends up processing more information than it needs to perform the requested operation.
For LIMIT, those queries could be slow because if the selected data is not ordered, SurrealDB has to figure out which objects to retrieve. The developers weren't clear about what was happening under the hood, but they did tell us that LIMIT currently does not execute as efficiently as it should.
For our JOIN-like statement, it seems that we were making SurrealDB do products.length * categories.length comparisons to figure out which records should go together. They told us that they have a very primitive query planner and that we could add the EXPLAIN clause to our query to possibly see what SurrealDB would end up doing. They also told us that we could have used the PARALLEL clause to have the database take a divide-and-conquer approach to query execution, possibly allowing it to consume more compute resources and complete the query faster.
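To make the comparison count for the JOIN-like query concrete, here is a rough Python sketch. It is an illustration of the cost as it was described to us, not SurrealDB's actual executor. All record shapes and sizes are made up. It contrasts scanning every product for every category with grouping the products once and then looking each category up.

```python
from collections import defaultdict


def nested_scan(categories, products):
    """For each category, scan every product (the JOIN-like subquery)."""
    comparisons = 0
    result = {}
    for category in categories:
        names = []
        for product in products:
            comparisons += 1
            if product["category"] == category["id"]:
                names.append(product["name"])
        result[category["id"]] = names
    return result, comparisons


def grouped_lookup(categories, products):
    """Group products by category once, then look each category up."""
    by_category = defaultdict(list)
    for product in products:  # a single pass over the products
        by_category[product["category"]].append(product["name"])
    result = {c["id"]: by_category.get(c["id"], []) for c in categories}
    return result, len(products)


# Invented data: 5 categories and 1000 products.
categories = [{"id": f"category:{i}"} for i in range(5)]
products = [
    {"name": f"product {n}", "category": f"category:{n % 5}"}
    for n in range(1000)
]

scan_result, scan_cost = nested_scan(categories, products)
lookup_result, lookup_cost = grouped_lookup(categories, products)
print(scan_cost, lookup_cost)  # 5000 1000
```

Both functions return the same grouping. The second one is roughly what a pre-built edge table or an index on the category field would let the database do instead of repeating the scan.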

The developers weren't exactly sure what could have caused the slowdowns for other connected clients, or the crash.

MatthewAry · 2023-10-20T13:39:17Z

Oh yeah, I forgot to mention that we also managed to crash SurrealDB. We posted about that here #2677 (comment)

0 replies

crisskimaryo · 2023-11-03T00:22:44Z

Your hackathon work involving SurrealDB has really caught my interest, especially its performance aspects and the use of Bun and Elysia.js. Is there any chance you could share the source code or provide some insights into your experience? I completely understand if you can't, but I'm just eager to learn from your approach.

1 reply

MatthewAry Jan 2, 2024
Author

Can't really share source. However, Elysia.js is really cool and the tech behind it will likely get written up in a white paper, presented at an ACM SIGGRAPH conference (or something like it). It really pushes perf for JS runtime web servers to the absolute limit. Not only that, Elysia.js has amazing DX. For example, when you use it with Eden Treaty and TypeScript, you get full type inference when you construct your API calls, and the responses are fully typed. There's more to say about ElysiaJS but I've got to get back to work. 🙂
