Bug: Seg fault in copy from statement #4615

Open
acquamarin opened this issue Dec 9, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@acquamarin
Collaborator

Kùzu version

master

What operating system are you using?

No response

What happened?

The COPY FROM statement seg faults in kuzu:

kuzu ~/tmp/kz2
Opening the database at path: /Users/bgoosman/tmp/kz2 in read-write mode.
Enter ":help" for usage hints.
kuzu> CREATE NODE TABLE Entity(category string, label string, PRIMARY KEY (label));
┌────────────────────────────────┐
│ result                         │
│ STRING                         │
├────────────────────────────────┤
│ Table Entity has been created. │
└────────────────────────────────┘
(1 tuple)
(1 column)
Time: 0.35ms (compiling), 2.24ms (executing)
kuzu> CREATE REL TABLE RELATED_TO(FROM Entity TO Entity, source_id int64);
┌────────────────────────────────────┐
│ result                             │
│ STRING                             │
├────────────────────────────────────┤
│ Table RELATED_TO has been created. │
└────────────────────────────────────┘
(1 tuple)
(1 column)
Time: 0.05ms (compiling), 4.42ms (executing)
kuzu> COPY Entity FROM "/Users/bgoosman/kineviz/sightxr_api_ex/_build/dev/lib/sightxr_api/priv/static/parquet/2024-12-09T21:20:51.913417Z/Entity.parquet";
┌───────────────────────────────────────────────────┐
│ result                                            │
│ STRING                                            │
├───────────────────────────────────────────────────┤
│ 1059 tuples have been copied to the Entity table. │
└───────────────────────────────────────────────────┘
(1 tuple)
(1 column)
Time: 5.48ms (compiling), 106.60ms (executing)
kuzu> COPY RELATED_TO FROM "/Users/bgoosman/kineviz/sightxr_api_ex/_build/dev/lib/sightxr_api/priv/static/parquet/2024-12-09T21:20:51.913417Z/RELATED_TO.parquet";
Pipelines Finished: 0/4
Current Pipeline Progress: 100%
[1]    70205 segmentation fault  kuzu ~/tmp/kz2

However, LOAD FROM on the same file works:

LOAD FROM "/Users/bgoosman/kineviz/sightxr_api_ex/_build/dev/lib/sightxr_api/priv/static/parquet/2024-12-09T21:20:51.913417Z/RELATED_TO.parquet" return *

Are there known steps to reproduce?

No response

acquamarin added the bug label Dec 9, 2024
@ray6080
Contributor

ray6080 commented Dec 10, 2024

The seg fault occurs in the index lookup during copy rel. It is caused by mismatched data types between the input parquet file and the table schema. More specifically, the table schema expects the input file columns to be (FROM STRING, TO STRING, source_id INT64), while the input parquet file has (FROM STRING, source_id INT64, TO STRING). As a result, the index lookup interprets column values with the wrong type, which leads to the seg fault.
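To make the mismatch concrete, here is a minimal Python sketch that compares the two column layouts quoted above position by position. It is only an illustration: the function name `positional_mismatches` is hypothetical and this is not Kùzu's binder or index code.

```python
# Column layouts as described in the comment above.
# EXPECTED is what the RELATED_TO schema wants; FILE_COLUMNS is what the parquet file has.
EXPECTED = [("FROM", "STRING"), ("TO", "STRING"), ("source_id", "INT64")]
FILE_COLUMNS = [("FROM", "STRING"), ("source_id", "INT64"), ("TO", "STRING")]


def positional_mismatches(expected, actual):
    """Return (position, expected_type, actual_type) for every position whose types differ.

    Hypothetical helper for illustration; not part of Kùzu.
    """
    if len(expected) != len(actual):
        raise ValueError(
            f"column count differs: expected {len(expected)}, got {len(actual)}"
        )
    return [
        (i, exp_type, act_type)
        for i, ((_, exp_type), (_, act_type)) in enumerate(zip(expected, actual))
        if exp_type != act_type
    ]


mismatches = positional_mismatches(EXPECTED, FILE_COLUMNS)
for pos, want, got in mismatches:
    print(f"column {pos}: schema expects {want}, file provides {got}")
```

With these layouts, positions 1 and 2 are reported: the schema expects a STRING key where the file provides INT64, and the reverse for the last column.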

There are several things we should do to properly address issues similar to this one:

  • The binder should check data types on primary key columns and cast if necessary. In this case, we should expect the binder to either throw an error or add an implicit cast.
  • A better mechanism for users to specify the FROM and TO columns during COPY, e.g., COPY rel(_from, source_id, _to) FROM 'test.parquet'
  • Optionally, the index lookup or hash index should protect against this kind of behaviour.
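As a rough sketch of the second proposal, the Python below shows how an explicit column list could be turned into a reordering that puts each input row into the order the rel table expects. The names `SCHEMA_ORDER`, `reorder_indices`, and `reorder_row` are hypothetical, and the column names are borrowed from the example syntax above; how Kùzu would actually implement this is not stated in this issue.

```python
# Order the rel table expects its input in (names borrowed from the proposed syntax).
SCHEMA_ORDER = ["_from", "_to", "source_id"]


def reorder_indices(user_columns, schema_order):
    """For each schema column, return its index in the user-provided column list.

    Raises ValueError if the two lists do not name the same set of columns.
    Hypothetical helper for illustration; not part of Kùzu.
    """
    if sorted(user_columns) != sorted(schema_order):
        raise ValueError(
            f"column list {user_columns} does not match schema columns {schema_order}"
        )
    return [user_columns.index(name) for name in schema_order]


def reorder_row(row, indices):
    """Rearrange one input row into schema order."""
    return tuple(row[i] for i in indices)


# The file layout from this issue: FROM, source_id, TO.
indices = reorder_indices(["_from", "source_id", "_to"], SCHEMA_ORDER)
row = reorder_row(("a", 7, "b"), indices)
print(indices, row)
```

Rejecting a column list that does not match the schema up front corresponds to the "throw an error" option in the first bullet: the failure surfaces at bind time instead of inside the index lookup.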
