Commit
Merge pull request #19 from oramasearch/feat/adds-fst
feat: adds FST-based full-text search
micheleriva authored Nov 18, 2024
2 parents 47b9d72 + 8088a6b commit 875bb26
Showing 12 changed files with 1,439 additions and 145 deletions.
184 changes: 166 additions & 18 deletions Cargo.lock

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions nlp/src/lib.rs
@@ -34,6 +34,7 @@ impl Clone for TextParser {
impl TextParser {
    pub fn from_language(locale: Locale) -> Self {
        let (tokenizer, stemmer) = match locale {
            Locale::IT => (Tokenizer::italian(), Stemmer::create(Algorithm::Italian)),
            Locale::EN => (Tokenizer::english(), Stemmer::create(Algorithm::English)),
            // @todo: manage other locales
            _ => (Tokenizer::english(), Stemmer::create(Algorithm::English)),
8 changes: 8 additions & 0 deletions nlp/src/tokenizer.rs
@@ -17,6 +17,14 @@ impl Tokenizer {
            stop_words,
        }
    }

    pub fn italian() -> Self {
        let stop_words: HashSet<&str> = Locale::IT.stop_words().unwrap();
        Tokenizer {
            split_regex: Locale::IT.split_regex().unwrap(),
            stop_words
        }
    }

    pub fn tokenize<'a, 'b>(&'a self, input: &'b str) -> impl Iterator<Item = String> + 'b
    where
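
The two hunks above add an Italian tokenizer and route `Locale::IT` to it, while every other locale still falls back to English. The following is a minimal, std-only sketch of that behaviour, not the crate's actual code: the `Locale` enum, the `from_locale` helper and the stop-word lists are hypothetical stand-ins, and splitting on non-alphanumeric characters replaces the per-locale `split_regex`, whose pattern is not shown in this diff.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the crate's `Locale` enum.
#[derive(Clone, Copy)]
enum Locale {
    EN,
    IT,
    DE, // represents any locale that has no dedicated tokenizer yet
}

/// Simplified tokenizer: the real one also carries a `split_regex`;
/// this sketch splits on non-alphanumeric characters instead.
struct Tokenizer {
    stop_words: HashSet<&'static str>,
}

impl Tokenizer {
    fn english() -> Self {
        // Tiny illustrative list, not the crate's real stop words.
        Tokenizer { stop_words: ["the", "a", "is", "of"].iter().copied().collect() }
    }

    fn italian() -> Self {
        // Tiny illustrative list, not the crate's real stop words.
        Tokenizer { stop_words: ["il", "la", "di", "e", "un"].iter().copied().collect() }
    }

    /// Mirrors the `match` in `TextParser::from_language`:
    /// unsupported locales fall back to English.
    fn from_locale(locale: Locale) -> Self {
        match locale {
            Locale::IT => Tokenizer::italian(),
            Locale::EN => Tokenizer::english(),
            _ => Tokenizer::english(),
        }
    }

    /// Lowercases each token and drops empty tokens and stop words.
    fn tokenize<'a>(&'a self, input: &'a str) -> impl Iterator<Item = String> + 'a {
        input
            .split(|c: char| !c.is_alphanumeric())
            .filter(|t| !t.is_empty())
            .map(|t| t.to_lowercase())
            .filter(move |t| !self.stop_words.contains(t.as_str()))
    }
}

fn main() {
    let italian = Tokenizer::from_locale(Locale::IT);
    let tokens: Vec<String> = italian.tokenize("Il gatto e la luna").collect();
    println!("{:?}", tokens);

    // DE has no dedicated tokenizer, so English stop words apply.
    let fallback = Tokenizer::from_locale(Locale::DE);
    let tokens: Vec<String> = fallback.tokenize("The speed of light").collect();
    println!("{:?}", tokens);
}
```

One consequence of the wildcard arm is that text in an unsupported language is filtered with English stop words, which is presumably what the `@todo` in `from_language` is meant to address.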
1 change: 1 addition & 0 deletions rustorama/.gitignore
@@ -0,0 +1 @@
reviews.json
3 changes: 3 additions & 0 deletions rustorama/README.md
@@ -0,0 +1,3 @@
# Rustorama

Download the dataset from https://www.kaggle.com/datasets/abdallahwagih/amazon-reviews and save it as `reviews.json`. Then run `deno run -A insert.js` to upload all the reviews to a local rustorama instance.
8 changes: 8 additions & 0 deletions rustorama/config.json
@@ -0,0 +1,8 @@
{
  "data_dir": "/tmp/rustorama",
  "http": {
    "host": "127.0.0.1",
    "port": 8080,
    "allow_cors": true
  }
}
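
For readers who want to see how these settings might be consumed, here is a rough std-only sketch. The `Config` and `HttpConfig` structs, the `sample` constructor and the `bind_addr` helper are all hypothetical names that simply mirror the JSON keys above; the commit does not show how rustorama actually loads this file, and a real loader would presumably deserialize the JSON rather than hard-code the values.

```rust
use std::net::{AddrParseError, IpAddr, SocketAddr};
use std::path::PathBuf;

/// Hypothetical mirror of the `http` object in config.json.
struct HttpConfig {
    host: String,
    port: u16,
    allow_cors: bool,
}

/// Hypothetical mirror of the top-level config object.
struct Config {
    data_dir: PathBuf,
    http: HttpConfig,
}

impl Config {
    /// Values copied by hand from the committed config.json.
    fn sample() -> Self {
        Config {
            data_dir: PathBuf::from("/tmp/rustorama"),
            http: HttpConfig {
                host: "127.0.0.1".to_string(),
                port: 8080,
                allow_cors: true,
            },
        }
    }

    /// Combines `host` and `port` into a socket address,
    /// failing if `host` is not a valid IP literal.
    fn bind_addr(&self) -> Result<SocketAddr, AddrParseError> {
        let ip: IpAddr = self.http.host.parse()?;
        Ok(SocketAddr::new(ip, self.http.port))
    }
}

fn main() {
    let cfg = Config::sample();
    match cfg.bind_addr() {
        Ok(addr) => println!(
            "would listen on {} (CORS enabled: {}), storing data in {}",
            addr,
            cfg.http.allow_cors,
            cfg.data_dir.display()
        ),
        Err(e) => eprintln!("invalid host in config: {}", e),
    }
}
```

Note that `/tmp/rustorama` is typically cleared on reboot, so this default suits local experiments rather than persistent data.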