This directory contains scripts to help with load testing the vectorizer. The
scripts create a table named `wiki` with approximately 1.5M rows to be
vectorized.
- Add a `.env` file and put a `DB_URL` in it. The value should be a Postgres
  database connection URL. It can be a local or a remote database. A sample
  `.env` is shown after this list.
- Run `./load.sh` (see the example invocation after this list). This script
  will:
  - Download a dataset from HuggingFace
  - Load it into a working table named `wiki_orig`
  - Process the data into the `wiki` table. The original data is already
    chunked, so we have to dechunk it.
  - [optionally] drop the working tables
  - [optionally] dump the `wiki` table to `wiki.dump`
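
A minimal `.env` might look like the following. The host, port, and
credentials here are placeholders; point `DB_URL` at whatever Postgres
instance you want to load:

```sh
# .env -- read by the load/restore scripts (per the steps above)
DB_URL=postgres://postgres:postgres@localhost:5432/postgres
```

Then run the loader from this directory:

```sh
./load.sh
```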
If you already have a `wiki.dump` file, you can use `./restore.sh` to recreate
the `wiki` table without having to go through the process above. This is much
faster.
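
For example, assuming `wiki.dump` is in this directory (where `load.sh` writes
it) and you have exported the `DB_URL` from your `.env` into your shell, the
fast path looks like:

```sh
# Recreate the wiki table from an existing wiki.dump
./restore.sh

# Sanity check: the table should contain roughly 1.5M rows
psql "$DB_URL" -c 'select count(*) from wiki'
```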
Once you have created the `wiki` table, you are ready to create one or more
vectorizers on the table. Happy testing!
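
As a rough sketch only: assuming the pgai extension is installed in the target
database and the `wiki` table exposes its content in a `text` column (the
column name, model, and dimensions below are guesses; check the pgai
vectorizer docs for the exact `ai.create_vectorizer` arguments), creating a
vectorizer could look something like:

```sh
# Sketch, not a verified invocation: the column name, embedding model, and
# dimensions are assumptions about the wiki table and your embedding provider.
psql "$DB_URL" <<'SQL'
select ai.create_vectorizer(
    'wiki'::regclass,
    embedding => ai.embedding_openai('text-embedding-3-small', 1536),
    chunking  => ai.chunking_recursive_character_text_splitter('text')
);
SQL
```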