MDEV-35693: Improve SSS column sizes #3772
base: 11.7
sql/sql_show.cc
Outdated
Column("Gtid_IO_Pos", Longtext(), NOT_NULL),
Column("Replicate_Do_Domain_Ids", Longtext(), NOT_NULL),
Column("Replicate_Ignore_Domain_Ids", Longtext(), NOT_NULL),
Should we continue truncating those not-Name() ones, just at a more lenient limit?
How about the others?
These are all probably ok as-is. I wouldn't expect these do/ignore_domain_ids lists to ever be that long. Then for Gtid_IO_Pos, each GTID is %u-%u-%llu, so that'd be what, 10+1+10+1+20=42 max-len each? At which point we could show 23 max-len GTIDs (which would be rare), but say the more likely length of a GTID would be 20 chars, leading to ~50 being shown. I haven't heard of such configurations, but I suppose if someone really knew what they were doing (and potentially using domain ids specifically to enhance replication performance via parallelization), they could hit that. I guess I'm ok either way for Gtid_IO_Pos.
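The length arithmetic above can be checked with a small sketch. It assumes the three `%u-%u-%llu` fields are 32-bit, 32-bit and 64-bit unsigned integers, that GTIDs in the list are joined by a single comma, and, purely for illustration, a 1024-character column; the thread does not state the column's actual limit, and both helper names are hypothetical.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Length of the longest possible "%u-%u-%llu" GTID text (no terminating NUL).
// Hypothetical helper for illustration only.
inline std::size_t max_gtid_text_length()
{
  // With a null buffer and size 0, snprintf only reports the needed length.
  int n= std::snprintf(nullptr, 0, "%" PRIu32 "-%" PRIu32 "-%" PRIu64,
                       static_cast<std::uint32_t>(UINT32_MAX),
                       static_cast<std::uint32_t>(UINT32_MAX),
                       static_cast<std::uint64_t>(UINT64_MAX));
  return n < 0 ? 0 : static_cast<std::size_t>(n);
}

// How many GTIDs of gtid_len characters fit into column_width characters
// when they are separated by single commas (assumed separator):
// n GTIDs need n * gtid_len + (n - 1) characters.
inline std::size_t gtids_that_fit(std::size_t column_width,
                                  std::size_t gtid_len)
{
  return (column_width + 1) / (gtid_len + 1);
}
```

Under these assumptions, `gtids_that_fit(1024, max_gtid_text_length())` gives 23 and `gtids_that_fit(1024, 20)` gives 48, in line with the "23" and "~50" estimates.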
Column("Last_Errno", SLong(4), NOT_NULL),
Column("Last_Error", Varchar(20), NULLABLE),
Column("Last_Error", Varchar(MAX_SLAVE_ERRMSG), NULLABLE),
Column("Skip_Counter", ULong(10), NOT_NULL),
Relay_log_info::slave_skip_counter is a ulonglong, but
Line 2984 in 74331a4
(*field++)->store((uint32) mi->rli.slave_skip_counter);
The sql_slave_skip_counter system variable is max UINT_MAX:
static Sys_var_multi_source_ulonglong Sys_slave_skip_counter(
"sql_slave_skip_counter", "Skip the next N events from the master log",
SESSION_VAR(slave_skip_counter), NO_CMD_LINE,
&Master_info::get_slave_skip_counter,
VALID_RANGE(0, UINT_MAX), DEFAULT(0), BLOCK_SIZE(1),
ON_UPDATE(update_slave_skip_counter));
Perhaps the variable type itself should be fixed (Relay_log_info::slave_skip_counter and system_variables::slave_skip_counter). I don't think we'd need to do that in any GA release (better not to touch it if we don't need to), but I'd think we can do it here (11.7) or in the next release.
Ah, another one similar to MDEV-35706 Shrink my_thread_id to uint32_t?
U/Long(void) and U/Longlong(void) already exist, but they use MY_INT32_NUM_DECIMAL_DIGITS (11) and MY_INT64_NUM_DECIMAL_DIGITS (21) respectively for both signed and unsigned, which doesn’t match sql/sql_show.cc’s ULong(10) and ULonglong(20).
09876543210987654321 width
18446744073709551616 1<<64
          4294967296 1<<32
So, what exactly are those constant macros?
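The widths in the ruler above can be double-checked with a short sketch. It counts only the decimal digits of the largest 32-bit and 64-bit unsigned values (no sign, no terminator); `decimal_digits` is a hypothetical helper, and the sketch does not settle what the two macros are meant to count.

```cpp
#include <cassert>
#include <cstdint>

// Number of decimal digits needed to print an unsigned value.
// Recursive so that it stays a valid C++11 constexpr function.
constexpr unsigned decimal_digits(std::uint64_t value)
{
  return value < 10 ? 1u : 1u + decimal_digits(value / 10);
}

// Largest uint32 value: 4294967295 (10 digits), matching ULong(10).
static_assert(decimal_digits(UINT32_MAX) == 10, "uint32 needs 10 digits");
// Largest uint64 value: 18446744073709551615 (20 digits), matching ULonglong(20).
static_assert(decimal_digits(UINT64_MAX) == 20, "uint64 needs 20 digits");
```

Both assertions hold at compile time, so 10 and 20 are the digit counts an unsigned column needs; the macro values of 11 and 21 are each one larger.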
Column("Gtid_IO_Pos", Longtext(), NOT_NULL),
Column("Replicate_Do_Domain_Ids", Longtext(), NOT_NULL),
Column("Replicate_Ignore_Domain_Ids", Longtext(), NOT_NULL),
Column("Parallel_Mode", Varchar(sizeof("conservative")-1), NOT_NULL),
Column("SQL_Delay", ULong(10), NOT_NULL),
Hmm?
Line 3106 in 74331a4
(*field++)->store((uint32) mi->rli.get_sql_delay());
Not only does Relay_log_info::get_sql_delay() return a signed int, but the call is also missing , true; this would invoke store(double nr) instead of store(longlong nr, bool unsigned_val).
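The overload selection described above can be reproduced in isolation. This is a minimal sketch with a hypothetical `DemoField` that declares only the two overloads named in the comment; it is not MariaDB's `Field` class, which has more overloads and different bodies.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in, NOT MariaDB's Field: it only records which
// overload the compiler selected.
struct DemoField
{
  std::string last_overload;

  int store(double nr)
  {
    (void) nr;
    last_overload= "store(double)";
    return 0;
  }

  int store(long long nr, bool unsigned_val)
  {
    (void) nr;
    last_overload= unsigned_val ? "store(longlong, true)"
                                : "store(longlong, false)";
    return 0;
  }
};

// One argument: the two-parameter overload is not viable, so the
// uint32 value is implicitly converted to double.
inline std::string overload_without_flag(std::uint32_t value)
{
  DemoField field;
  field.store(value);
  return field.last_overload;
}

// Two arguments: the integer overload is selected.
inline std::string overload_with_flag(std::uint32_t value)
{
  DemoField field;
  field.store(value, true);
  return field.last_overload;
}
```

With only these two overloads, `overload_without_flag(5)` reports `store(double)`, while `overload_with_flag(5)` reports `store(longlong, true)`.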
Added the , true.
According to rpl.rpl_delayed_slave, SQL_Delay’s unsigned in practice.
So changing int to unsigned int is not a bug fix but a clarifying refactor. I won’t include it here.
Same for checking whether each of the (uint32) casts makes sense. We could instead use more Field::store() overloads, for the various integer sizes as well as separate handlers for unsigned ones.
Thanks @ParadoxV5, you have some great finds here! See my notes.
Also, can you add a new test which fails before your patch to show the problems, which then succeeds with your patch?
Then for the git commit message, for the title, SSS is only common for us casually speaking on the replication team. Can you spell it out?
sql/sql_show.cc
Outdated
Column("Retried_transactions", ULong(10), NOT_NULL),
Column("Max_relay_log_size", ULonglong(10), NOT_NULL),
Column("Executed_log_entries", ULong(10), NOT_NULL),
Column("Slave_received_heartbeats", ULong(10), NOT_NULL),
Column("Slave_heartbeat_period", Float(703), NOT_NULL), // 3 decimals
Column("Gtid_Slave_Pos", Varchar(FN_REFLEN), NOT_NULL),
Column("Gtid_Slave_Pos", Longtext(), NOT_NULL),
Same goes here as for Gtid_IO_Pos
> Also, can you add a new test which fails before your patch to show the problems, which then succeeds with your patch?
I wondered the same in “§ How can this PR be tested?”: which fields should we test, and how long should the strings we assert on be?
> Then for the git commit message, for the title, SSS is only common for us casually speaking on the replication team. Can you spell it out?
😄 When I picked up MDEV-35304, I too questioned what its description meant by ‘SSS’.
I thought it was common jargon!
As for the title… With ‘SHOW REPLICA STATUS’ it doesn’t fit the 50-char guideline, but I suppose it’s fine omitting MDEV-35693: from the count.
(It’s time to avoid calling things ‘SLAVE’s when we can.)
// utf8mb3 Varchars longer than MAX_FIELD_VARCHARLENGTH/3 become Longtexts
DBUG_ASSERT(length * 3 <= MAX_FIELD_VARCHARLENGTH);
}
Varchar(): Type(&type_handler_varchar, MAX_FIELD_VARCHARLENGTH/3, false) {}
https://mariadb.com/kb/en/unicode/ says 11.8 wants to change utf8 to utf8mb4.
Assuming the default charset is simply ‘utf8’, I’ll let that work change these 3s to 4s, in the hope that it notices that it creates unexpected #3772 (comment) changes too.
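To make the effect of changing the 3s to 4s concrete, here is a small sketch. It takes the 65532-octet, 21844-character figures that the test result rows in this thread report for the widened columns; the 4-byte case is only an assumption about what the same octet budget would yield under utf8mb4, and `max_chars` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Characters that fit into an octet budget when every character may need
// up to max_bytes_per_char bytes (3 for utf8mb3, 4 for utf8mb4).
constexpr std::size_t max_chars(std::size_t octet_budget,
                                std::size_t max_bytes_per_char)
{
  return octet_budget / max_bytes_per_char;
}

// utf8mb3: matches the varchar(21844) / 65532 octets seen in the results.
static_assert(max_chars(65532, 3) == 21844, "utf8mb3 width");
// utf8mb4 (assumed same octet budget): the character width would shrink.
static_assert(max_chars(65532, 4) == 16383, "utf8mb4 width");
```

If that assumption holds, a switch to utf8mb4 would change the reported character widths, which is the kind of unexpected result change the comment anticipates.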
master_last_event_time_row.test should provide a good example to use on how to query/assert the correctness of
And then you can use it in queries using
Then to
Hmm, I didn't realize we had a 50-char title maximum - I probably break that rule often 😅. I'd say not to omit the MDEV-35693 part, that's the most important part. Maybe try to summarize as best you can within 50 chars, and if you can't, then just use the JIRA title and truncate it after 50. Also, in the git commit message, the high-level description of the problem was dropped since your last commit:
Such descriptions are good to say up-front to summarize the issue for non-programmers, who often read through our commit messages (I also think we should be explicit about the length where the truncation happens, so I'd say to add that they'd truncate at |
So, just testing those was-
The mention is still there but yes, the emphasis on how it’s a problem was lost. 👍 |
def information_schema SLAVE_STATUS Replicate_Do_DB 15 NULL NO varchar 21844 65532 NULL NULL NULL utf8mb3 utf8mb3_general_ci varchar(21844) select NEVER NULL NO NO
def information_schema SLAVE_STATUS Replicate_Do_Domain_Ids 47 NULL NO varchar 21844 65532 NULL NULL NULL utf8mb3 utf8mb3_general_ci varchar(21844) select NEVER NULL NO NO
def information_schema SLAVE_STATUS Replicate_Do_Table 17 NULL NO varchar 21844 65532 NULL NULL NULL utf8mb3 utf8mb3_general_ci varchar(21844) select NEVER NULL NO NO
def information_schema SLAVE_STATUS Replicate_Ignore_DB 16 NULL NO varchar 21844 65532 NULL NULL NULL utf8mb3 utf8mb3_general_ci varchar(21844) select NEVER NULL NO NO
def information_schema SLAVE_STATUS Replicate_Ignore_Domain_Ids 48 NULL NO varchar 21844 65532 NULL NULL NULL utf8mb3 utf8mb3_general_ci varchar(21844) select NEVER NULL NO NO
def information_schema SLAVE_STATUS Replicate_Ignore_Server_Ids 41 NULL NO varchar 21844 65532 NULL NULL NULL utf8mb3 utf8mb3_general_ci varchar(21844) select NEVER NULL NO NO
def information_schema SLAVE_STATUS Replicate_Ignore_Table 18 NULL NO varchar 21844 65532 NULL NULL NULL utf8mb3 utf8mb3_general_ci varchar(21844) select NEVER NULL NO NO
def information_schema SLAVE_STATUS Replicate_Rewrite_DB 56 NULL NO varchar 21844 65532 NULL NULL NULL utf8mb3 utf8mb3_general_ci varchar(21844) select NEVER NULL NO NO
def information_schema SLAVE_STATUS Replicate_Wild_Do_Table 19 NULL NO varchar 21844 65532 NULL NULL NULL utf8mb3 utf8mb3_general_ci varchar(21844) select NEVER NULL NO NO
def information_schema SLAVE_STATUS Replicate_Wild_Ignore_Table 20 NULL NO varchar 21844 65532 NULL NULL NULL utf8mb3 utf8mb3_general_ci varchar(21844) select NEVER NULL NO NO
Actually @bnestere do these count as tests for this issue?
Resize SHOW REPLICA STATUS’s (technically `INFORMATION_SCHEMA.` `SLAVE_STATUS`’s) columns to better match their possible values

In case of intentionally but absurdly long lists, text columns that list an uncapped number of elements have expanded to accept as many bytes as we could support.

Particularly, the first-gen `Replicate_` filters were incorrectly typed as singular `Name()`s during MDEV-33526. Under `Name`s’ 64-char limit, they could overflow (read: truncate) even before their lengths got absurd.

In response to `‘MAX_SLAVE_ERRMSG’ was not declared in this scope` in Embedded builds, a new `#ifdef HAVE_REPLICATION` guard wraps `slave_status_info` to skip this unused data in Replication-less builds.

Reviewed-by: Brandon Nesterenko <[email protected]>
Update all integer columns of SHOW REPLICA STATUS (technically INFORMATION_SCHEMA.SLAVE_STATUS) to unsigned because, well, they are (:. Some `uint32` ones were accidentally using the `Field::store(double nr)` overload because they forgot the `, true` for `Field::store(longlong nr, bool unsigned_val)`. The mistake’s harmless, fortunately, as `double` supports over 15 significant decimal digits, well over `uint32`’s 9-and-a-half.
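The claim that the detour through `double` is harmless for `uint32` values can be sketched as a round-trip check. This assumes `double` is an IEEE-754 binary64 type with a 53-bit significand, which the `static_assert` verifies; `survives_double_round_trip` is a hypothetical helper, not server code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// A double's significand must hold at least 32 bits for every uint32
// value to be represented exactly (53 bits on IEEE-754 binary64).
static_assert(std::numeric_limits<double>::digits >= 32,
              "double cannot represent every uint32 exactly");

// True if converting value to double and back yields the same value.
inline bool survives_double_round_trip(std::uint32_t value)
{
  double as_double= static_cast<double>(value);
  return static_cast<std::uint32_t>(as_double) == value;
}
```

Calling it with `0`, `UINT32_MAX`, or any value in between returns `true` on such platforms, which is why the missing `, true` only changed which overload ran, not the stored number.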
Uhh… I don’t feel right testing only a subset. |
Description
Resize SHOW REPLICA STATUS’s (technically INFORMATION_SCHEMA.SLAVE_STATUS’s) columns to better match their possible values
In case of intentionally but absurdly long lists, text columns that list an uncapped number of elements have expanded to accept as many bytes as we could support.
While here, I found all our integer columns are unsigned, so I updated ones that weren’t configured as unsigned before for both column types and the field storing function (grep \(\*field\+\+\)->store\(.+(?<!false|true|my_charset_bin)\); minus the one (double)).
In response to ‘MAX_SLAVE_ERRMSG’ was not declared in this scope in Embedded builds, a new #ifdef HAVE_REPLICATION guard wraps slave_status_info to skip this unused data in Replication-less builds.
What problem is the patch trying to solve?
Particularly, the first-gen Replicate_ filters were incorrectly typed as singular Name()s during MDEV-33526.
Under Names’ 64-char limit, they could overflow (read: truncate) even before their lengths got absurd.
Do you think this patch might introduce side-effects in other parts of the server?
Let me know if resizing IS column types is a significant backward-incompatibility.
Release Notes
Increased VARCHAR width limits of SHOW SLAVE STATUS so it doesn’t truncate long lists that should be long.
Optionally list any https://mariadb.com/kb/ pages that need changing.
By the way, these three columns are missing on https://mariadb.com/kb/en/show-replica-status/#column-descriptions:
How can this PR be tested?
🤔 How shall we test this? Cook up a replica so its status report has all columns filled to the brim?
Sample manual test from MDEV-35693
Of course, the original issue about Replicate_ fields specifically is reproducible by setting them to anything longer than 64 bytes.
Unexpected:
Replicate_Do_Table: db_to_filter.0123456789_7,db_to_filter.0123456789_1,db_to_filter
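The truncated value above is exactly 64 characters long, which a small sketch can confirm. The input list is hypothetical: its first two entries are taken from the output above and the third is made up, since the full original setting is not shown here. The sketch models the limit as a plain cut of an ASCII string (one byte per character) and does not reproduce the server's actual truncation code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Cut a value to at most max_chars characters, the way a too-narrow
// column would. Assumes ASCII, so bytes and characters coincide.
inline std::string truncate_to_width(const std::string &value,
                                     std::size_t max_chars)
{
  return value.substr(0, max_chars);
}

// Hypothetical filter list; only the first two entries appear in the
// sample output, the third entry is invented for illustration.
inline std::string sample_filter_list()
{
  return "db_to_filter.0123456789_7,"
         "db_to_filter.0123456789_1,"
         "db_to_filter.0123456789_2";
}
```

`truncate_to_width(sample_filter_list(), 64)` yields the same `db_to_filter.0123456789_7,db_to_filter.0123456789_1,db_to_filter` string shown as the unexpected result.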
Basing the PR against the correct MariaDB version
main branch.
PR quality check