-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(datetime): enhance datetime parsing and validation #2129
base: master
Are you sure you want to change the base?
feat(datetime): enhance datetime parsing and validation #2129
Conversation
Add validation and parsing for DateColumnDef and DateTimeColumnDef to handle invalid date formats and zero dates as null. Introduce a DateValidator with regex matching for date formats. Extend tests to cover invalid dates and malformed datetime strings.
what was the behavior of the bootstrapper wrt to "0000-00-00" style dates (if the as-null flag was off) before this change? I know, it absolutely sucks to output invalid datetimes into the stream but all things equal the |
That's a good question. I will try and run through those scenarios when I get a chance today to get an idea of what it did and how the change will affect it. Might be necessary to call out the all zero path and allow it to run the normal path through without validating it from there. |
I isolated the date validation logic outside of the all zero case and I believe kept the logic of the all zero case the same as what existed before. From my testing locally it seems to work in the same way as before. |
else | ||
return appendFractionalSeconds("0000-00-00 00:00:00", 0, getColumnLength()); | ||
} else { | ||
if ( !DateValidator.isValidDateTime(dateString) ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for the delay, I'm coming back to this PR since I'm prepping a release of your mariadb stuff and some other stuff too. I'm a little worried about running a regex in a very hot code-path...
But I guess this only runs on bootstrapping so it might be ok? can you confirm that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, just making it back to looking at this to see where it was at. From my testing, the value instanceof String is only triggered through the bootstrapping and not during normal processing. It goes down the Long path at least from my debugging I was doing. If it is just bootstrapping are you good with it or should I look for a more efficient way to verify that Date/DateTime instead of using Regex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm yeah I don't know. I had a performance benchmark setup at some point, let me see if I can maybe measure the impact of the regex on bootstrapping.
if (value == null) { | ||
return null; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mildly, same concerns here -- is this just for bootstrapping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am only seeing bootstrapping taking the String path. Otherwise, during normal processing it takes the Long path.
would be nice to slap some integration tests around this code (esp checking behavior in bootstrapping, which I believe is where this change targets). |
sorry to say it's a big perf hit. Bootstrapping a table with 2 timestamps, 2 datetimes: before your patch:
after your patch:
I tried to profile it in visualVM for more proof but it turns out that profiler is lousy. I pushed the bootstrap benchmarker into your branch, LMK if you need pointers in running it |
Add validation and parsing for DateColumnDef and DateTimeColumnDef to handle invalid date formats and zero dates as null. Introduce a DateValidator with regex matching for date formats. Extend tests to cover invalid dates and malformed datetime strings.
MySQL and MariaDB allow invalid dates by default unless strict SQL mode is on, enabling entries like 2020-00-01 to be stored without error, which can lead to data processing issues. In older systems, data may be in poor condition, making it difficult to switch to strict mode, so these cases often need handling when using CDC.