chore: wip upgrade pgroonga to latest #1418

Merged: 12 commits into develop from sam/pgroonga-update, Feb 4, 2025

Conversation

@samrose (Contributor) commented Jan 24, 2025

What kind of change does this PR introduce?

This PR upgrades pgroonga to v3.2.5; the underlying groonga did not require updating.
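
For reviewers verifying the bump locally, a quick sanity check (not part of this PR's test suite):

```sql
-- Confirm the installed pgroonga extension version
SELECT extversion FROM pg_extension WHERE extname = 'pgroonga';
-- Expected after this PR: 3.2.5

-- Groonga status output includes the underlying groonga version (unchanged here)
SELECT pgroonga_command('status');
```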

samrose marked this pull request as ready for review January 29, 2025 15:44
samrose requested a review from a team as a code owner January 29, 2025 15:44
@samrose (Contributor, Author) commented Jan 29, 2025

In addition to the tests already run in our test suite, I have run the following:

```sql
-- Test table setup
CREATE TABLE tokenizer_test (
    id SERIAL PRIMARY KEY,
    content text
);

-- Insert test data that will be used across all tokenizer tests
INSERT INTO tokenizer_test (content) VALUES 
('hello world'),
('こんにちは世界'),  -- Japanese text
('123ABC'),
('hello-world'),
('test@email.com'),
('product#123');
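
-- Spot check (not part of the original script): a tokenizer can also be
-- inspected directly, without building an index, via pgroonga_tokenize;
-- the key/value option syntax here is assumed from recent PGroonga versions.
SELECT pgroonga_tokenize('hello-world', 'tokenizer', 'TokenBigram');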

-- Basic Tokenizers
CREATE INDEX idx_pgroonga_unigram ON tokenizer_test 
USING pgroonga (content) WITH (tokenizer='TokenUnigram');

CREATE INDEX idx_pgroonga_bigram ON tokenizer_test 
USING pgroonga (content) WITH (tokenizer='TokenBigram');

CREATE INDEX idx_pgroonga_trigram ON tokenizer_test 
USING pgroonga (content) WITH (tokenizer='TokenTrigram');

CREATE INDEX idx_pgroonga_delimit ON tokenizer_test 
USING pgroonga (content) WITH (tokenizer='TokenDelimit');

-- TokenDelimitNull (like TokenDelimit, but splits on the NUL character instead of whitespace)
CREATE INDEX idx_pgroonga_delimitnull ON tokenizer_test 
USING pgroonga (content) WITH (tokenizer='TokenDelimitNull');

CREATE INDEX idx_pgroonga_ngram ON tokenizer_test 
USING pgroonga (content) WITH (tokenizer='TokenNgram');

-- Bigram variants
CREATE INDEX idx_pgroonga_bigramsplitsymbol ON tokenizer_test 
USING pgroonga (content) WITH (tokenizer='TokenBigramSplitSymbol');

CREATE INDEX idx_pgroonga_bigramsplitsymbolalpha ON tokenizer_test 
USING pgroonga (content) WITH (tokenizer='TokenBigramSplitSymbolAlpha');

CREATE INDEX idx_pgroonga_bigramsplitsymbolalphadigit ON tokenizer_test 
USING pgroonga (content) WITH (tokenizer='TokenBigramSplitSymbolAlphaDigit');

CREATE INDEX idx_pgroonga_bigramignoreblank ON tokenizer_test 
USING pgroonga (content) WITH (tokenizer='TokenBigramIgnoreBlank');

CREATE INDEX idx_pgroonga_bigramignoreblanksplitsymbol ON tokenizer_test 
USING pgroonga (content) WITH (tokenizer='TokenBigramIgnoreBlankSplitSymbol');

CREATE INDEX idx_pgroonga_bigramignoreblanksplitsymbolalpha ON tokenizer_test 
USING pgroonga (content) WITH (tokenizer='TokenBigramIgnoreBlankSplitSymbolAlpha');

CREATE INDEX idx_pgroonga_bigramignoreblanksplitsymbolalphadigit ON tokenizer_test 
USING pgroonga (content) WITH (tokenizer='TokenBigramIgnoreBlankSplitSymbolAlphaDigit');

-- -- Pattern-based tokenizers
-- CREATE INDEX idx_pgroonga_pattern ON tokenizer_test 
-- USING pgroonga (content) WITH (
--     tokenizer='TokenPattern',
--     "tokenizer_pattern"='([[:alpha:]]+)|([[:digit:]]+)'
-- );

-- CREATE INDEX idx_pgroonga_regexp ON tokenizer_test 
-- USING pgroonga (content) WITH (
--     tokenizer='TokenRegexp',
--     "tokenizer_regexp_pattern"='\w+'
-- );

-- Custom table-based tokenizer (reference data only; no index in this script consumes this table)
CREATE TABLE IF NOT EXISTS tokenizer_test_table (
    target text,
    normalized text
);

INSERT INTO tokenizer_test_table VALUES 
('world', 'WORLD'),
('hello', 'HELLO'),
('test', 'TEST');


-- Add TokenMecab if available
DO $$
BEGIN
    IF EXISTS (
        SELECT 1
        FROM json_array_elements(
            pgroonga_command('tokenizer_list')::json->1
        ) AS tokenizer
        WHERE tokenizer->>'name' = 'TokenMecab'
    ) THEN
        EXECUTE $index$
            CREATE INDEX idx_pgroonga_mecab ON tokenizer_test 
            USING pgroonga (content) WITH (tokenizer='TokenMecab')
        $index$;
    END IF;
END $$;
```
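
For reference, a quick way to confirm the planner can actually use these indexes (a sketch, not part of the run above):

```sql
-- Disable sequential scans so the planner must pick a pgroonga index
SET enable_seqscan = off;

-- &@~ is PGroonga's full-text search operator using Groonga query syntax
EXPLAIN SELECT * FROM tokenizer_test WHERE content &@~ 'hello';

RESET enable_seqscan;
```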

Followed by this validation script:

```sql
CREATE TEMPORARY TABLE validation_results (
    test_name TEXT,
    status TEXT,
    details TEXT,
    pgroonga_version TEXT
);

DO $$
DECLARE
    pgroonga_ver TEXT;
    index_record RECORD;  -- Explicit declaration
BEGIN
    -- Get PGroonga extension version
    SELECT extversion INTO pgroonga_ver 
    FROM pg_extension 
    WHERE extname = 'pgroonga';

    IF NOT FOUND THEN
        pgroonga_ver := 'Not installed';
    END IF;

    -- Verify table existence
    IF NOT EXISTS (
        SELECT 1 FROM information_schema.tables 
        WHERE table_name = 'tokenizer_test'
    ) THEN
        INSERT INTO validation_results VALUES ('Table existence', 'FAIL', 'Missing tokenizer_test table', pgroonga_ver);
    ELSE
        INSERT INTO validation_results VALUES ('Table existence', 'PASS', '', pgroonga_ver);
    END IF;

    -- Verify test data
    PERFORM 1 FROM tokenizer_test WHERE content = 'hello world';
    IF NOT FOUND THEN
        INSERT INTO validation_results VALUES ('Test data', 'FAIL', 'Missing base test record', pgroonga_ver);
    ELSE
        INSERT INTO validation_results VALUES ('Test data', 'PASS', '', pgroonga_ver);
    END IF;

    -- Verify indexes
    FOR index_record IN
        SELECT indexname FROM pg_indexes 
        WHERE tablename = 'tokenizer_test'
    LOOP
        BEGIN
            -- NOTE: the probe query is identical for every index; this verifies
            -- that a pgroonga search succeeds, not that this particular index served it
            EXECUTE 'SELECT 1 FROM tokenizer_test WHERE content &@~ ''world''';
            INSERT INTO validation_results VALUES (
                'Index check: ' || index_record.indexname,
                'PASS',
                '',
                pgroonga_ver
            );
        EXCEPTION WHEN others THEN
            INSERT INTO validation_results VALUES (
                'Index check: ' || index_record.indexname,
                'FAIL',
                SQLERRM,
                pgroonga_ver
            );
        END;
    END LOOP;

    -- Special pattern checks
    BEGIN
        PERFORM 1 FROM tokenizer_test WHERE content &@ 'hello-world';
        INSERT INTO validation_results VALUES ('Symbol handling', 'PASS', '', pgroonga_ver);
    EXCEPTION WHEN others THEN
        INSERT INTO validation_results VALUES ('Symbol handling', 'FAIL', SQLERRM, pgroonga_ver);
    END;

    -- Numeric handling
    BEGIN
        PERFORM 1 FROM tokenizer_test WHERE content &@ '123';
        INSERT INTO validation_results VALUES ('Numeric handling', 'PASS', '', pgroonga_ver);
    EXCEPTION WHEN others THEN
        INSERT INTO validation_results VALUES ('Numeric handling', 'FAIL', SQLERRM, pgroonga_ver);
    END;

    -- Email pattern
    BEGIN
        PERFORM 1 FROM tokenizer_test WHERE content &@~ 'test@email\.com';
        INSERT INTO validation_results VALUES ('Email pattern', 'PASS', '', pgroonga_ver);
    EXCEPTION WHEN others THEN
        INSERT INTO validation_results VALUES ('Email pattern', 'FAIL', SQLERRM, pgroonga_ver);
    END;
END $$;

-- Final results with version
SELECT * FROM validation_results;
```
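
To surface only the failures from a run, the same temp table can be filtered (a convenience query, not part of the original script):

```sql
-- Show only failing checks together with their error details
SELECT test_name, details
FROM validation_results
WHERE status = 'FAIL'
ORDER BY test_name;
```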

@samrose (Contributor, Author) commented Jan 29, 2025

next step is to test upgrade and pause/restore issues (if any) for this change
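
A minimal sketch of the in-place upgrade check, assuming the new binaries are already installed (the pause/restore path still needs separate verification):

```sql
-- Update the extension objects to the newly shipped version
ALTER EXTENSION pgroonga UPDATE;

-- Confirm the version actually changed
SELECT extversion FROM pg_extension WHERE extname = 'pgroonga';

-- If searches misbehave after a pause/restore cycle, rebuilding a
-- pgroonga index is the conservative recovery step
REINDEX INDEX idx_pgroonga_bigram;
```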

samrose merged commit 3cbe23c into develop Feb 4, 2025
11 of 12 checks passed
samrose deleted the sam/pgroonga-update branch February 4, 2025 12:49
darora pushed a commit that referenced this pull request Feb 5, 2025
* chore: wip upgrade pgroonga to latest

* chore: strings for staging tests of image

* chore: staging version

* chore: bump staging version

* chore: GRN_PLUGINS_DIR env var

* chore: make sure nix available

* chore: no longer need sudo

* chore: bump version

* chore: bump upload artifact action