DataFusion Regression (Starting in v43): Type Coercion for UDF Arguments (X --> String) for Specified UDFs #14230

Closed
Tracked by #14008
shehabgamin opened this issue Jan 22, 2025 · 8 comments
Labels: bug (Something isn't working)

Comments

@shehabgamin
Contributor

shehabgamin commented Jan 22, 2025

Describe the bug

A bug was introduced in DataFusion v43.0.0 that affects type coercion for UDF arguments. Sail's tests uncovered several of these regressions, which required explicit casting in multiple areas as a workaround during the upgrade to DataFusion 43.0.0.

The regressions identified by Sail's tests include the following functions:

  1. ascii
  2. bit_length
  3. contains
  4. ends_with
  5. starts_with
  6. octet_length

Scope of Work:

  • Address the regressions from the list of functions above and port the relevant tests from Sail to cover these issues.
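To make the regression concrete, here is a toy sketch in plain Rust (standard library only) of the difference between strict signature matching and matching that implicitly coerces an argument to a string type. This is not DataFusion's code: `ToyType`, `match_strict`, and `match_with_coercion` are hypothetical names invented for illustration, and the coercion rule shown (anything may be cast to a string) is an assumption made to keep the example small.

```rust
// Toy model only: none of these names exist in DataFusion.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int32,
    Timestamp,
    Utf8,
}

/// Strict matching: the argument type must already equal the expected type.
fn match_strict(expected: &ToyType, actual: &ToyType) -> Result<ToyType, String> {
    if expected == actual {
        Ok(actual.clone())
    } else {
        Err(format!("expected {:?} but received {:?}", expected, actual))
    }
}

/// Matching with coercion: when a string is expected, any toy type is
/// accepted and the returned type tells the caller to cast the argument to Utf8.
fn match_with_coercion(expected: &ToyType, actual: &ToyType) -> Result<ToyType, String> {
    match (expected, actual) {
        // Types already agree, so no cast is needed.
        (e, a) if e == a => Ok(a.clone()),
        // Assumed rule for this sketch: everything can be cast to a string.
        (ToyType::Utf8, _) => Ok(ToyType::Utf8),
        _ => Err(format!("cannot coerce {:?} to {:?}", actual, expected)),
    }
}
```

In this model, a call such as `match_strict(&ToyType::Utf8, &ToyType::Int32)` returns an error, which mirrors the regression, while `match_with_coercion` accepts the same pair and reports `Utf8` as the type to cast to. The explicit-cast workaround mentioned above corresponds to the caller converting the argument to `Utf8` before calling the strict version.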

To Reproduce

No response

Expected behavior

No response

Additional context

shehabgamin added the bug label on Jan 22, 2025
@shehabgamin
Contributor Author

Take

shehabgamin changed the title from "DataFusion Regression (Starting in v43): Coercing Various Scalar UDF Arguments to String" to "DataFusion Regression (Starting in v43): Type Coercion for UDF Arguments" on Jan 25, 2025
@alamb
Contributor

alamb commented Feb 3, 2025

BTW I think @Omega359 hit the same issue about automatically coercing from int to utf8 when testing 45.0.0: #14008 (comment)

utf8view, i32 comparison no longer worked

.with_column(
    STRING_FIELD,
    when(col(STRING_FIELD).eq(lit(83)), lit(82)).otherwise(col(STRING_FIELD))?,
)?

switching to the obvious change below worked

.with_column(
    STRING_FIELD,
    when(col(STRING_FIELD).eq(lit("83")), lit("82")).otherwise(col(STRING_FIELD))?,
)?

alamb changed the title from "DataFusion Regression (Starting in v43): Type Coercion for UDF Arguments" to "DataFusion Regression (Starting in v43): Type Coercion for UDF Arguments (Int --> String)" on Feb 3, 2025
@shehabgamin
Contributor Author

@alamb @Omega359 It's not only int -> string, pretty much all coercion between types no longer works.

@alamb
Contributor

alamb commented Feb 4, 2025

> @alamb @Omega359 It's not only int -> string, pretty much all coercion between types no longer works.

We have many tests for coercion, so I feel like the issue is more targeted than all coercion -- clearly we need to figure out what is going on and get some tests in place to make sure it doesn't happen again

@Omega359
Contributor

Omega359 commented Feb 4, 2025

I'm in the process of trying to track down another type coercion issue locally:

Internal error: Failed to match any signature, errors: Error during planning: The signature expected NativeType::String but received NativeType::Timestamp(Microsecond, Some("UTC"))

The error message doesn't include the function name unfortunately (PR incoming soon).
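As a rough illustration of why the function name matters in such a message, the sketch below formats a signature-mismatch error with and without it. It is plain Rust with no DataFusion dependency; `signature_error` and the function name `my_udf` are hypothetical, and the wording of the message is an assumption, not the text produced by DataFusion or by the PR mentioned above.

```rust
/// Hypothetical helper (not a DataFusion API): builds a signature-mismatch
/// message, including the function name when one is supplied.
fn signature_error(func_name: Option<&str>, expected: &str, received: &str) -> String {
    let detail = format!(
        "The signature expected {} but received {}",
        expected, received
    );
    match func_name {
        // With a name, the reader can tell which call failed to plan.
        Some(name) => format!("Failed to match any signature for function '{}': {}", name, detail),
        // Without a name, the message only describes the type mismatch.
        None => format!("Failed to match any signature: {}", detail),
    }
}
```

Calling `signature_error(Some("my_udf"), "NativeType::String", "NativeType::Timestamp")` yields a message that points at the offending call, whereas passing `None` reproduces the less helpful form where only the types are visible.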

@shehabgamin
Contributor Author

@Omega359 @alamb I've spent some time exploring the code and feel like I have a decent understanding of how it works. If it's helpful, I'd be happy to walk through it together. It shouldn't take long, and I think it could help clarify some of the points I've been trying to make. I tried to explain everything in detail in the PR, but I realize I might not have done the best job, so I'm happy to discuss further or clarify anything that's unclear.

shehabgamin changed the title from "DataFusion Regression (Starting in v43): Type Coercion for UDF Arguments (Int --> String)" to "DataFusion Regression (Starting in v43): Type Coercion for UDF Arguments (X --> String)" on Feb 4, 2025
shehabgamin changed the title from "DataFusion Regression (Starting in v43): Type Coercion for UDF Arguments (X --> String)" to "DataFusion Regression (Starting in v43): Type Coercion for UDF Arguments (X --> String) for Specified UDFs" on Feb 4, 2025
@alamb
Contributor

alamb commented Feb 5, 2025

I will try and review it shortly

@jayzhan211
Contributor

Closed by #14440
