Inconsistent and incorrect struct field coercion #14396

Open
alamb opened this issue Feb 1, 2025 · 2 comments · May be fixed by #14409
Labels
bug Something isn't working

Comments

@alamb
Contributor

alamb commented Feb 1, 2025

Describe the bug

When coercing structs with different field names or types, DataFusion behaves inconsistently: sometimes it errors, and at other times it silently produces a struct whose field names match neither input.

To Reproduce

Consider a table with two struct columns, each holding one field, where the field names and value types differ:

create table t as values
(
 { 'foo': 'baz' }, 
 { 'xxx': arrow_cast('blarg', 'Utf8View') } -- column2 is a Struct with a Utf8View field and a different field name
);

With UNION ALL, both columns are coerced to the same field name:

> select column1 from t UNION ALL select column2 from t;
+-------------+
| column1     |
+-------------+
| {c0: baz}   |
| {c0: blarg} |
+-------------+
2 row(s) fetched.
Elapsed 0.018 seconds.

Likewise, with CASE they are coerced to the same struct type:

> select CASE WHEN 1=2 THEN column1 ELSE column2 END from t ;
+-----------------------------------------------------------------+
| CASE WHEN Int64(1) = Int64(2) THEN t.column1 ELSE t.column2 END |
+-----------------------------------------------------------------+
| {c0: blarg}                                                     |
+-----------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.010 seconds.

Expected behavior

I believe the field names will be different after

However, I expect all of the above queries to fail, because the field names do not match.

I expect the following cases to work:

  1. Field names are the same but in different order
  2. Field names are the same, but the value types themselves need to be coerced (see the case in "Type Coercion fails for List with inner type struct which has large/view types", #14154)
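
The expectations above amount to matching struct fields by name rather than by position. Here is a minimal sketch of that rule in plain Rust, using only the standard library. `Ty`, `StructTy`, `st`, `coerce_scalar` and `coerce_struct` are hypothetical names for illustration, not DataFusion APIs, and the rule that Utf8 and Utf8View unify to Utf8View is an assumption made for this sketch. It also assumes field names are unique within a struct.

```rust
use std::collections::HashMap;

/// Toy stand-in for a field's data type (hypothetical, not Arrow's `DataType`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ty {
    Int64,
    Utf8,
    Utf8View,
}

/// A struct type modeled as an ordered list of (field name, type) pairs.
pub type StructTy = Vec<(String, Ty)>;

/// Convenience constructor for the toy struct type.
pub fn st(fields: &[(&str, Ty)]) -> StructTy {
    fields.iter().map(|(n, t)| (n.to_string(), *t)).collect()
}

/// Assumed scalar rule for this sketch: identical types unify to themselves,
/// and Utf8 / Utf8View unify to Utf8View. Everything else is rejected.
pub fn coerce_scalar(a: Ty, b: Ty) -> Option<Ty> {
    match (a, b) {
        _ if a == b => Some(a),
        (Ty::Utf8, Ty::Utf8View) | (Ty::Utf8View, Ty::Utf8) => Some(Ty::Utf8View),
        _ => None,
    }
}

/// Coerce two struct types by matching fields by NAME, not by position.
/// The result keeps the left-hand field order; any name mismatch is an error.
pub fn coerce_struct(lhs: &StructTy, rhs: &StructTy) -> Result<StructTy, String> {
    if lhs.len() != rhs.len() {
        return Err(format!(
            "field count mismatch: {} vs {}",
            lhs.len(),
            rhs.len()
        ));
    }
    // Index the right-hand fields by name so field order does not matter.
    let rhs_by_name: HashMap<&str, Ty> =
        rhs.iter().map(|(n, t)| (n.as_str(), *t)).collect();

    let mut out = Vec::with_capacity(lhs.len());
    for (name, lt) in lhs {
        let rt = *rhs_by_name
            .get(name.as_str())
            .ok_or_else(|| format!("field '{}' not found in right-hand struct", name))?;
        let ty = coerce_scalar(*lt, rt)
            .ok_or_else(|| format!("cannot coerce field '{}': {:?} vs {:?}", name, lt, rt))?;
        out.push((name.clone(), ty));
    }
    Ok(out)
}
```

Under this model, `coerce_struct(&st(&[("a", Ty::Int64), ("b", Ty::Utf8)]), &st(&[("b", Ty::Utf8View), ("a", Ty::Int64)]))` succeeds and covers both cases in the list, while the pair from the reproducer, `{foo: Utf8}` and `{xxx: Utf8View}`, is rejected because `foo` has no counterpart on the right-hand side.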

Additional context

@jayzhan211 suggests: https://github.com/apache/datafusion/pull/14384/files#r1937492704

Yes, I think type union resolution is the correct one for CASE

@alamb alamb added the bug Something isn't working label Feb 1, 2025
alamb added a commit to alamb/datafusion that referenced this issue Feb 1, 2025
alamb added a commit that referenced this issue Feb 1, 2025
* Fix field name during struct equality coercion

* fix bug

* Add more tests

* Update tests per #14396
@Lordworms
Contributor

take

@Lordworms Lordworms linked a pull request Feb 2, 2025 that will close this issue
@alamb
Contributor Author

alamb commented Feb 3, 2025
