Analyser parameters in migration #53

Open
drsdre opened this issue Nov 22, 2024 · 4 comments
Labels
documentation Improvements or additions to documentation

Comments

@drsdre

drsdre commented Nov 22, 2024

First off, thanks for all the hard work on this package!

I'm trying to model an email address analyser profile in a migration, analogous to this one:
https://github.com/andrewdieken/elasticsearch-effective-email-search/blob/main/index_settings.json

Is it possible to set filter parameters such as 'preserve_original', 'min_gram' and 'max_gram' through the fluent functions provided by AnalyzerBlueprint?

Thanks,
Andre

@pdphilip added the enhancement and documentation labels and removed the enhancement label on Nov 24, 2024
@pdphilip
Owner

Hey @drsdre - The docs are not clear on this, but you can set any filter setting as ->key_name($value), e.g.:

Schema::modify('my_index', function (IndexBlueprint $index) {
    // Index-level setting: maximum allowed difference between max_gram and min_gram
    $index->settings('max_ngram_diff', 20);
});

Schema::setAnalyser('my_index', function (AnalyzerBlueprint $settings) {
    // Emits the part before the "@" as an extra token and keeps the full address
    $settings->filter('email_token_filter')
        ->type('pattern_capture')
        ->preserve_original(true)
        ->patterns(['([^@]+)']);
    // Emits prefixes of each token, from 1 up to 20 characters long
    $settings->filter('edge_ngram_token_filter')
        ->type('edge_ngram')
        ->min_gram(1)
        ->max_gram(20);
    // Custom analyser that chains the filters above, in this order
    $settings->analyzer('email_analyzer')
        ->filter([
            'email_token_filter',
            'lowercase',
            'edge_ngram_token_filter',
            'unique',
        ])
        ->type('custom')
        ->tokenizer('uax_url_email');
});

Give it a try and let me know.
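
For reference, the fluent calls above should correspond to an Elasticsearch `analysis` settings body roughly like the one built below. This is only a sketch in plain PHP: it assumes each `->key_name($value)` call maps onto a same-named key in the filter or analyzer definition, and the variable name `$analysis` is illustrative, not part of the package.

```php
<?php
// Sketch of the raw `analysis` settings the blueprint is assumed to produce.
// Key names follow Elasticsearch's analysis settings; $analysis is an illustrative name.
$analysis = [
    'filter' => [
        'email_token_filter' => [
            'type' => 'pattern_capture',
            'preserve_original' => true,
            'patterns' => ['([^@]+)'],
        ],
        'edge_ngram_token_filter' => [
            'type' => 'edge_ngram',
            'min_gram' => 1,
            'max_gram' => 20,
        ],
    ],
    'analyzer' => [
        'email_analyzer' => [
            'type' => 'custom',
            'tokenizer' => 'uax_url_email',
            // Token filters run in the order listed here.
            'filter' => [
                'email_token_filter',
                'lowercase',
                'edge_ngram_token_filter',
                'unique',
            ],
        ],
    ],
];

// Print the body as JSON so it can be compared with the index settings Elasticsearch reports.
echo json_encode(
    ['settings' => ['analysis' => $analysis]],
    JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES
), PHP_EOL;
```

Comparing this output with what `GET /my_index/_settings` returns is one way to check that the parameters were applied as intended.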

@drsdre
Author

drsdre commented Nov 24, 2024

Thanks for the quick reply.

Adding the settings works now. Applying the analyzer to the fields, however, seems to run into a circular dependency: I cannot set the analyzer before the index is created, but setting the analyzer on a field while creating the index fails too. Maybe I'm using the wrong IndexBlueprint function. This is what I have now:

        Schema::createIfNotExists(MailDomainLog::TABLE_NAME, function (IndexBlueprint $index) {
            $index->integer('mail_domain_id');
            $index->keyword('domain');
            $index->field('text', 'recipient', [
                'analyzer' => 'email_analyzer',
                'search_analyzer' => 'email_analyzer',
                'search_quote_analyzer' => 'email_analyzer',
            ]);
            $index->keyword('origin');
            //$index->map('origin', 'email_analyzer'); // doesn't work either
            //$index->mapProperty('origin', 'email_analyzer'); // doesn't work either
            $index->text('from');
            $index->text('subject');
        });

        Schema::setAnalyser(MailDomainLog::TABLE_NAME, function (AnalyzerBlueprint $settings) {
            $settings->filter('email_token_filter')
                ->type('pattern_capture')
                ->preserve_original('true')
                ->patterns(['([^@]+)']);
            $settings->filter('edge_ngram_token_filter')
                ->type('edge_ngram')
                ->min_gram('1')
                ->max_gram('30');
            $settings->analyzer('email_analyzer')
                ->filter([
                    'email_token_filter',
                    'lowercase',
                    'edge_ngram_token_filter',
                    'unique'
                ])
                ->type('custom')
                ->tokenizer('uax_url_email');
        });

It fails with the error:
400 Bad Request: Failed to parse mapping: analyzer [email_analyzer] has not been configured in mappings - Failed to parse mapping: analyzer [email_analyzer] has not been configured in mappings

I'm guessing I'm using the wrong mapping?

Your help is much appreciated.

@pdphilip
Owner

pdphilip commented Nov 24, 2024

Try this order:

  1. Create the index (without the email field mappings)
  2. Set the analyser
  3. Modify the index to add the email mappings: https://elasticsearch.pdphilip.com/schema/migrations/#schemamodify
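
The three steps above can be sketched as one migration body. This is a hedged outline rather than a tested migration: it only runs inside a Laravel app with the package installed, the `use` namespaces are assumed from the package's documentation, and the index name `mail_domain_logs` is a placeholder for `MailDomainLog::TABLE_NAME`. The blueprint calls themselves are the ones already used in this thread.

```php
<?php

// Assumed namespaces for the package's schema classes.
use PDPhilip\Elasticsearch\Schema\AnalyzerBlueprint;
use PDPhilip\Elasticsearch\Schema\IndexBlueprint;
use PDPhilip\Elasticsearch\Schema\Schema;

// Placeholder index name, standing in for MailDomainLog::TABLE_NAME.
$indexName = 'mail_domain_logs';

// 1. Create the index with only the fields that do not need the custom analyser.
Schema::createIfNotExists($indexName, function (IndexBlueprint $index) {
    $index->integer('mail_domain_id');
    $index->keyword('domain');
    $index->text('from');
    $index->text('subject');
});

// 2. Register the filters and the analyser on the index that now exists.
Schema::setAnalyser($indexName, function (AnalyzerBlueprint $settings) {
    $settings->filter('email_token_filter')
        ->type('pattern_capture')
        ->preserve_original(true)
        ->patterns(['([^@]+)']);
    $settings->filter('edge_ngram_token_filter')
        ->type('edge_ngram')
        ->min_gram(1)
        ->max_gram(20);
    $settings->analyzer('email_analyzer')
        ->filter(['email_token_filter', 'lowercase', 'edge_ngram_token_filter', 'unique'])
        ->type('custom')
        ->tokenizer('uax_url_email');
});

// 3. Only now map the field that refers to the analyser by name.
Schema::modify($indexName, function (IndexBlueprint $index) {
    $index->field('text', 'recipient', [
        'analyzer' => 'email_analyzer',
        'search_analyzer' => 'email_analyzer',
    ]);
});
```

The ordering matters because a mapping can only refer to an analyser that is already present in the index settings, which is what the "has not been configured in mappings" error points at.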

@drsdre
Author

drsdre commented Dec 2, 2024

Thanks. That did work well.

        Schema::modify(MailDomainLog::TABLE_NAME, function (IndexBlueprint $index) {
            $index->field('text', 'recipient', [
                'analyzer' => 'email_analyzer',
                'search_analyzer' => 'email_analyzer',
                'search_quote_analyzer' => 'email_analyzer',
            ]);
            $index->field('text', 'origin', [
                'analyzer' => 'email_analyzer',
                'search_analyzer' => 'email_analyzer',
                'search_quote_analyzer' => 'email_analyzer',
            ]);
        });

Unfortunately, with the email analyser in place I no longer get any search results back for these fields. I was under the impression that the analyser would be applied both at index time and when processing the search query. Or do I need to amend the queries to make use of the new sub-indexes? I realise this question goes beyond your package and is really about how Elasticsearch works. However, if you could point me in the right direction...

Thanks,
Andre
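
One way to investigate a question like this is Elasticsearch's `_analyze` API, which returns the tokens an analyzer produces for a given text. The sketch below only builds the request bodies in plain PHP. The helper names, the index name `mail_domain_logs` and the sample address are placeholders, and sending the request (for example `GET /mail_domain_logs/_analyze`) is left to whichever client is in use.

```php
<?php
// Build a body that runs a named analyser from the index settings over $text.
// Helper names are illustrative, not part of any library.
function analyzeBodyForAnalyzer(string $analyzer, string $text): array
{
    return ['analyzer' => $analyzer, 'text' => $text];
}

// Build a body that uses whichever analyser the mapping assigns to $field.
function analyzeBodyForField(string $field, string $text): array
{
    return ['field' => $field, 'text' => $text];
}

// Placeholder sample value; substitute a value that is actually indexed.
$sample = 'john.doe@example.com';

// JSON bodies to send to GET /mail_domain_logs/_analyze (placeholder index name).
echo json_encode(analyzeBodyForAnalyzer('email_analyzer', $sample)), PHP_EOL;
echo json_encode(analyzeBodyForField('recipient', $sample)), PHP_EOL;
```

Comparing the tokens returned for an indexed value with those returned for a typical search term should show whether the two sides can match at all.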
