chore: wip upgrade pgroonga to latest #1418
Merged
In addition to the tests already run in our test suite, I have run the following setup script and test:
-- Test table setup
CREATE TABLE tokenizer_test (
id SERIAL PRIMARY KEY,
content text
);
-- Insert test data that will be used across all tokenizer tests
INSERT INTO tokenizer_test (content) VALUES
('hello world'),
('こんにちは世界'), -- Japanese text
('123ABC'),
('hello-world'),
('[email protected]'),
('product#123');
-- Basic Tokenizers
CREATE INDEX idx_pgroonga_unigram ON tokenizer_test
USING pgroonga (content) WITH (tokenizer='TokenUnigram');
CREATE INDEX idx_pgroonga_bigram ON tokenizer_test
USING pgroonga (content) WITH (tokenizer='TokenBigram');
CREATE INDEX idx_pgroonga_trigram ON tokenizer_test
USING pgroonga (content) WITH (tokenizer='TokenTrigram');
CREATE INDEX idx_pgroonga_delimit ON tokenizer_test
USING pgroonga (content) WITH (tokenizer='TokenDelimit');
-- TokenDelimitNull (similar to TokenDelimit, but splits on the NUL character instead of whitespace)
CREATE INDEX idx_pgroonga_delimitnull ON tokenizer_test
USING pgroonga (content) WITH (tokenizer='TokenDelimitNull');
CREATE INDEX idx_pgroonga_ngram ON tokenizer_test
USING pgroonga (content) WITH (tokenizer='TokenNgram');
-- Bigram variants
CREATE INDEX idx_pgroonga_bigramsplitsymbol ON tokenizer_test
USING pgroonga (content) WITH (tokenizer='TokenBigramSplitSymbol');
CREATE INDEX idx_pgroonga_bigramsplitsymbolalpha ON tokenizer_test
USING pgroonga (content) WITH (tokenizer='TokenBigramSplitSymbolAlpha');
CREATE INDEX idx_pgroonga_bigramsplitsymbolalphadigit ON tokenizer_test
USING pgroonga (content) WITH (tokenizer='TokenBigramSplitSymbolAlphaDigit');
CREATE INDEX idx_pgroonga_bigramignoreblank ON tokenizer_test
USING pgroonga (content) WITH (tokenizer='TokenBigramIgnoreBlank');
CREATE INDEX idx_pgroonga_bigramignoreblanksplitsymbol ON tokenizer_test
USING pgroonga (content) WITH (tokenizer='TokenBigramIgnoreBlankSplitSymbol');
CREATE INDEX idx_pgroonga_bigramignoreblanksplitsymbolalpha ON tokenizer_test
USING pgroonga (content) WITH (tokenizer='TokenBigramIgnoreBlankSplitSymbolAlpha');
CREATE INDEX idx_pgroonga_bigramignoreblanksplitsymbolalphadigit ON tokenizer_test
USING pgroonga (content) WITH (tokenizer='TokenBigramIgnoreBlankSplitSymbolAlphaDigit');
-- -- Pattern-based tokenizers
-- CREATE INDEX idx_pgroonga_pattern ON tokenizer_test
-- USING pgroonga (content) WITH (
-- tokenizer='TokenPattern',
-- "tokenizer_pattern"='([[:alpha:]]+)|([[:digit:]]+)'
-- );
-- CREATE INDEX idx_pgroonga_regexp ON tokenizer_test
-- USING pgroonga (content) WITH (
-- tokenizer='TokenRegexp',
-- "tokenizer_regexp_pattern"='\w+'
-- );
-- Custom table-based tokenizer
CREATE TABLE IF NOT EXISTS tokenizer_test_table (
target text,
normalized text
);
INSERT INTO tokenizer_test_table VALUES
('world', 'WORLD'),
('hello', 'HELLO'),
('test', 'TEST');
-- Add TokenMecab if available
DO $$
BEGIN
IF EXISTS (
SELECT 1
FROM json_array_elements(
pgroonga_command('tokenizer_list')::json->1
) AS tokenizer
WHERE tokenizer->>'name' = 'TokenMecab'
) THEN
EXECUTE $index$
CREATE INDEX idx_pgroonga_mecab ON tokenizer_test
USING pgroonga (content) WITH (tokenizer='TokenMecab')
$index$;
END IF;
END $$;
-- Test
CREATE TEMPORARY TABLE validation_results (
test_name TEXT,
status TEXT,
details TEXT,
pgroonga_version TEXT
);
DO $$
DECLARE
pgroonga_ver TEXT;
index_record RECORD; -- Explicit declaration
BEGIN
-- Get PGroonga extension version
SELECT extversion INTO pgroonga_ver
FROM pg_extension
WHERE extname = 'pgroonga';
IF NOT FOUND THEN
pgroonga_ver := 'Not installed';
END IF;
-- Verify table existence
IF NOT EXISTS (
SELECT 1 FROM information_schema.tables
WHERE table_name = 'tokenizer_test'
) THEN
INSERT INTO validation_results VALUES ('Table existence', 'FAIL', 'Missing tokenizer_test table', pgroonga_ver);
ELSE
INSERT INTO validation_results VALUES ('Table existence', 'PASS', '', pgroonga_ver);
END IF;
-- Verify test data
PERFORM 1 FROM tokenizer_test WHERE content = 'hello world';
IF NOT FOUND THEN
INSERT INTO validation_results VALUES ('Test data', 'FAIL', 'Missing base test record', pgroonga_ver);
ELSE
INSERT INTO validation_results VALUES ('Test data', 'PASS', '', pgroonga_ver);
END IF;
-- Verify indexes
FOR index_record IN
SELECT indexname FROM pg_indexes
WHERE tablename = 'tokenizer_test'
LOOP
BEGIN
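-- Note: the same query runs on every iteration and the planner picks the index,
-- so a PASS means the search raised no error while this index exists,
-- not that this particular index was used.
EXECUTE 'SELECT 1 FROM tokenizer_test WHERE content &@~ ''world''';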
INSERT INTO validation_results VALUES (
'Index check: ' || index_record.indexname,
'PASS',
'',
pgroonga_ver
);
EXCEPTION WHEN others THEN
INSERT INTO validation_results VALUES (
'Index check: ' || index_record.indexname,
'FAIL',
SQLERRM,
pgroonga_ver
);
END;
END LOOP;
-- Special pattern checks
BEGIN
PERFORM 1 FROM tokenizer_test WHERE content &@ 'hello-world';
INSERT INTO validation_results VALUES ('Symbol handling', 'PASS', '', pgroonga_ver);
EXCEPTION WHEN others THEN
INSERT INTO validation_results VALUES ('Symbol handling', 'FAIL', SQLERRM, pgroonga_ver);
END;
-- Numeric handling
BEGIN
PERFORM 1 FROM tokenizer_test WHERE content &@ '123';
INSERT INTO validation_results VALUES ('Numeric handling', 'PASS', '', pgroonga_ver);
EXCEPTION WHEN others THEN
INSERT INTO validation_results VALUES ('Numeric handling', 'FAIL', SQLERRM, pgroonga_ver);
END;
-- Email pattern
BEGIN
PERFORM 1 FROM tokenizer_test WHERE content &@~ 'test@email\.com';
INSERT INTO validation_results VALUES ('Email pattern', 'PASS', '', pgroonga_ver);
EXCEPTION WHEN others THEN
INSERT INTO validation_results VALUES ('Email pattern', 'FAIL', SQLERRM, pgroonga_ver);
END;
END $$;
-- Final results with version
SELECT * FROM validation_results;
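To read the results at a glance, the rows in validation_results can be grouped by status. This is a minimal sketch, not part of the original test run; it assumes the temporary table created by the script above still exists in the current session, and the column alias `checks` is only illustrative.

```sql
-- Count how many checks ended in each status.
SELECT status, count(*) AS checks
FROM validation_results
GROUP BY status
ORDER BY status;

-- List only the failing checks together with the captured error text.
SELECT test_name, details
FROM validation_results
WHERE status = 'FAIL'
ORDER BY test_name;
```

An empty result from the second query means no check raised an error; it does not by itself show that the searches returned the expected rows.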
The next step is to test upgrade and pause/restore for any issues caused by this change.
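A post-upgrade smoke check could look roughly like the sketch below. It assumes the extension is upgraded in place with ALTER EXTENSION, which may not match how the image is actually upgraded, and it reuses the tokenizer_test table from the setup script. It does not cover pause/restore.

```sql
-- Assumption: the extension is upgraded in place on an existing database.
ALTER EXTENSION pgroonga UPDATE;

-- Confirm the installed extension version after the upgrade.
SELECT extversion FROM pg_extension WHERE extname = 'pgroonga';

-- Discourage sequential scans for this session so the plan shows an index scan
-- if one of the existing indexes is still usable.
SET enable_seqscan = off;
EXPLAIN SELECT id, content FROM tokenizer_test WHERE content &@~ 'world';

-- Run the same search to confirm it completes without error.
SELECT id, content FROM tokenizer_test WHERE content &@~ 'world';

RESET enable_seqscan;
```

Because several PGroonga indexes exist on the same column, the plan shows whichever one the planner chooses, so this checks that indexed search still works rather than exercising each tokenizer.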
pcnc approved these changes on Feb 3, 2025
darora pushed a commit that referenced this pull request on Feb 5, 2025
* chore: wip upgrade pgroonga to latest
* chore: strings for staging tests of image
* chore: staging version
* chore: bump staging vrsion
* chore: GRN_PLUGINS_DIR env var
* chore: make sure nix available
* chore: no longer need sudo
* chore: bump version
* chore: bump upload artifact action
What kind of change does this PR introduce?
This PR upgrades pgroonga to v3.2.5. The underlying groonga did not require updating.
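Both versions can be checked from SQL. The sketch below is an illustration rather than part of the PR; it assumes the Groonga `status` command reports a `version` field in the second element of its response, the same response layout the test script relies on for `tokenizer_list`.

```sql
-- PGroonga extension version as recorded by PostgreSQL.
SELECT extversion AS pgroonga_version
FROM pg_extension
WHERE extname = 'pgroonga';

-- Version of the underlying Groonga library, read from the status command.
SELECT pgroonga_command('status')::json->1->>'version' AS groonga_version;
```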