-
Notifications
You must be signed in to change notification settings - Fork 875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Type and Array for Decimal32 and Decimal64 #7098
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes make sense to me (though is a bit harder to identify where things might not have been added like tests, etc.)
However, I think the addition of new values to DataType
enum would make this a breaking change so would need to wait for next major release?
|
||
/// The default scale for [DataType::Decimal64] values | ||
pub const DECIMAL64_DEFAULT_SCALE: i8 = 6; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a reference for these constants (both max precision/scale & defaults)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I'd picked the default scales mostly arbitrarily. The DecimalType
contract requires one, and there isn't really any obviously-correct choice. PostgreSQL has a money
type which is roughly DECIMAL(~18, 2). MSSQL has a smallmoney
that's ~9, 4 and a money
that's ~18, 4. Four seems a little big for a 32-bit decimal and a little small for a 64-bit decimal. I'd love to get feedback about these, but also don't know how important they are.
As for precision, it's based on the maximum number of decimal digits that can be stored without running into the space limits, just like the existing decimals. The largest i32
is 2,147,483,647 and that can reliably store 999,999,999 but not 9,999,999,999 -- so the max precision is 9.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps the default scales should get wider discussion? Where would be the right place for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense for precision. For default scale, do we have a reference from the C++ implementation possibly, to align with that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't found defaults in either the C++ or the Go implementation, and I know there's none in the C# code.
ParquetStatistics::Int64(s) => s | ||
.$func() | ||
.map(|x| $stat_value_type::try_from(*x).ok()) | ||
.flatten(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these changes significant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both of these changes are required for the code to compile, so I'm somewhat leaning on the compiler here. My naive take was that there shouldn't be a performance loss from the change (and that this wouldn't be a performance-critical path anyway) but I'm willing to believe that I might be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine, was just curious as these changes specifically didn't seem related to the rest here, so wondering what else was changed that made the compiler require this change.
@@ -730,7 +783,7 @@ macro_rules! get_decimal_page_stats_iterator { | |||
native_index | |||
.indexes | |||
.iter() | |||
.map(|x| x.$func.and_then(|x| Some($stat_value_type::from(x)))) | |||
.map(|x| x.$func.and_then(|x| $stat_value_type::try_from(x).ok())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto here
I'd gone through and searched the codebase for all instances of
Yes, it's definitely a breaking change. I have to admit that I'm not familiar with the arrow-rs release schedule. I'm used to the main branch of the primary Arrow repo which is always "Vnext". How does that work here? |
Yes I think that is the case Per the schedule this means we could merge this PR once we have branched for 54.2 (early march) https://github.com/apache/arrow-rs?tab=readme-ov-file#arrow-and-parquet-crates |
Got it. In light of this, I'll likely withdraw the PR for now or move to Draft status. It would be good but not critical to figure out what we want as the default scale for the new types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can leave the PR up; I've tagged with the next-major-release
tag for visibility
ParquetStatistics::Int64(s) => s | ||
.$func() | ||
.map(|x| $stat_value_type::try_from(*x).ok()) | ||
.flatten(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine, was just curious as these changes specifically didn't seem related to the rest here, so wondering what else was changed that made the compiler require this change.
|
||
/// The default scale for [DataType::Decimal64] values | ||
pub const DECIMAL64_DEFAULT_SCALE: i8 = 6; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense for precision. For default scale, do we have a reference from the C++ implementation possibly, to align with that?
Which issue does this PR close?
Part of addressing #6661 but does not close it.
What changes are included in this PR?
This change implements
Decimal32Array
,Decimal32Type
,Decimal64Array
andDecimal64Type
in a relatively minimal fashion. It doesn't implement the full range of casts or support for Parquet or CSV reader. These will come in the next change.Are there any user-facing changes?
Adds several public types. Adds new entries for the
DataType
enum, which is a breaking change.