oraclelogminer: added in support for LOB replication in the LogToKV layer #1103

Open
wants to merge 1 commit into
base: oracle-source-0826

@ryanluu12345 ryanluu12345 commented Jan 1, 2025

Previously, LOB data caused a crash because of ANTLR parsing oddities. This change does three things:

  1. Rewrite insert and update queries that contain EMPTY_CLOB or EMPTY_BLOB function calls so that they insert empty strings instead
  2. Convert the HEXTORAW(...) data to actual bytea strings that can be interpreted in a CRDB query
  3. Determine which items in the KV maps for the SET and WHERE clauses need to be updated with these parsing rules

Resolves: CC-31048
Release Note: None
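
The first two steps can be sketched roughly as below. This is an illustration under assumptions, not the PR's implementation: `emptyLobPattern` is a hypothetical stand-in for `emptyLobRegex` (its real definition is not shown in this excerpt), `hexToRawToBytea` is an invented helper name, and the `'\x...'` output form assumes the target accepts hex-encoded byte literals like the `\x...` values that appear in the query output later in this thread.

```go
package main

import (
	"encoding/hex"
	"regexp"
	"strings"
)

// emptyLobPattern is a hypothetical stand-in for the PR's emptyLobRegex.
// It matches EMPTY_CLOB() and EMPTY_BLOB(), case-insensitively.
var emptyLobPattern = regexp.MustCompile(`(?i)EMPTY_[BC]LOB\(\)`)

// hexToRawPattern matches HEXTORAW('<hex digits>') and captures the digits.
var hexToRawPattern = regexp.MustCompile(`(?i)HEXTORAW\('([0-9A-F]*)'\)`)

// replaceEmptyLobs swaps every EMPTY_CLOB()/EMPTY_BLOB() call for an
// empty string literal.
func replaceEmptyLobs(input string) string {
	return emptyLobPattern.ReplaceAllString(input, "''")
}

// hexToRawToBytea rewrites each HEXTORAW('..') call as a '\x..' literal.
// Calls whose payload is not valid hex are left untouched and the first
// decoding error is returned.
func hexToRawToBytea(input string) (string, error) {
	var firstErr error
	out := hexToRawPattern.ReplaceAllStringFunc(input, func(m string) string {
		digits := hexToRawPattern.FindStringSubmatch(m)[1]
		if _, err := hex.DecodeString(digits); err != nil {
			if firstErr == nil {
				firstErr = err
			}
			return m
		}
		return `'\x` + strings.ToLower(digits) + `'`
	})
	return out, firstErr
}
```

A plain regex pass like this would also rewrite matching text that happens to sit inside a quoted string value, so it is only suitable where the redo SQL is known not to contain such values.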


func replaceEmptyLobsWithEmptyString(input string) string {
	return emptyLobRegex.ReplaceAllString(input, "''")
}

// LogToKV parses a SQL statement log into a SetKV struct, where the key is the column to be rewritten and
// the value is the value to override or insert. The actual extraction logic can be found in the functions
// related to oracleparser.MockListener.
// Examples can be found in TestLogToKV().
func LogToKV(log string) (oracleparser.SetAndWhereKVStructs, error) {

TODO: take out all the verbose logging

input: `insert into "C##MYADMIN"."LOB_TABLE"
("x","y","BLOB_COL","RAW_COL","LONG_RAW_COL","CLOB_COL","NCLOB_COL")
values
('9','10115','',HEXTORAW('52415731'),NULL,'','');`,

For insert or update, we need to test against the latest LOB_TABLE and see why RAW is not being updated properly in the first INSERT.

@@ -32,6 +33,66 @@ type logToKVTestCase struct {

func TestLogToKV(t *testing.T) {
for i, tc := range []logToKVTestCase{
{
input: `insert into "C##MYADMIN"."LOB_TABLE"

Need to figure out cases that mix LOB and non-LOB columns where the majority are non-LOB.

@ryanluu12345

@ZhouXing19 I also realized that I based this off your latest PR branch, but that has the WIP for the new parsing scheme, right? I'm seeing certain bugs here, which makes sense since it's still a WIP.

By the way, this change works fine with the version of the oracle-source-0826 branch from about two months ago. But we'll need to reevaluate this once the new method is in. It's not properly replicating data over.

@ryanluu12345

ryanluu12345 commented Jan 1, 2025

So I dug into this behavior a bit more closely and noticed a bug in how we handle updates for the LOB case. As previously mentioned, for LOBs:

  1. An insert occurs with the non-LOB columns
  2. An update occurs with the LOB columns for the record from step 1

For example:

insert into "C##MYADMIN"."LOB_EASY"("x","y","RAW_COL","BLOB_COL") values ('169800','1800',HEXTORAW('52415731'),'');

update "C##MYADMIN"."LOB_EASY" set "BLOB_COL" = HEXTORAW('343836353643364336463230344637323631363336433635') where "x" = '169800' and "y" = '1800' and "RAW_COL" = HEXTORAW('52415731') and ROWID = 'AAASMfAAHAAAALOAAX';

What we actually see, however, is that the update seems to clobber what the insert wrote. From the statements above, we expect every field to be filled in the way it is in the source:

        x       y RAW_COL     BLOB_COL                                            
_________ _______ ___________ ___________________________________________________   
   169800    1800 52415731    343836353643364336463230344637323631363336433635 

However, if you look closer, this is what ends up on the target:

root@localhost:26257/defaultdb> SELECT * FROM "LOB_EASY";                     
    x    |  y   |                  RAW_COL                   |                      blob_col
---------+------+--------------------------------------------+-----------------------------------------------------
  169800 | NULL | NULL                                       | \x343836353643364336463230344637323631363336433635

In this case, every non-PK column is set to NULL. This implies that the update behavior is off: instead of hydrating the row with the existing data in the target table, it just upserts NULL.

Looking closer at toDBType, we see that the update does carry entries for the non-LOB columns. However, they are null values:

"x"
169802
"RAW_COL"
<nil>
"blob_col"
\x343836353643364336463230344637323631363336433635
"y"
<nil>

The question here is: how can we differentiate between a value that is truly null and one that is simply not set, where the intention is to update only the named column and not touch anything else?
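
As a general Go illustration of that distinction (not code from this PR), a map lookup can tell "key absent" apart from "key present with a nil value". The sketch below assumes the column values live in a `map[string]any`; the `columnState` type and `classify` function are hypothetical names.

```go
package main

// columnState describes how a column shows up in a parsed statement.
type columnState int

const (
	columnUnset columnState = iota // the statement never mentioned the column
	columnNull                     // the column was explicitly set to NULL
	columnValue                    // the column carries a concrete value
)

// classify uses the comma-ok form of a map lookup: a missing key yields
// (nil, false), while an explicit NULL yields (nil, true).
func classify(cols map[string]any, name string) columnState {
	val, present := cols[name]
	switch {
	case !present:
		return columnUnset
	case val == nil:
		return columnNull
	default:
		return columnValue
	}
}
```

This only helps if the unset columns are left out of the map upstream. Once they have been materialized as nil entries, as in the dump above, the two cases can no longer be told apart.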

After digging into a comment that @ZhouXing19 left about how we need to put the PK from the "WhereKV" into the "SetKV", I realized that a similar principle applies to the updates that come in from the LogMiner redo log: the WHERE clause fully specifies the existing values of the row's columns. The one caveat is that we want to exclude ROWID, because a target CRDB table will most likely not have a ROWID column. So the fix is to copy all other keys from WhereKV into SetKV. We still need to think about cases where this breaks down. (This is a first cut and not good enough; see the final code further down.)

for key, val := range kv.WhereKV {
	// Skip ROWID since that's not relevant on the
	// target schema. If something is of a ROWID
	// type, it would be named differently too technically.
	if key == "ROWID" {
		continue
	}

	kv.SetKV[key] = val
}

Good News

  • Single item updates work fine: UPDATE "LOB_EASY" SET "y" = 11 WHERE "x" = 169803; COMMIT;
  169803 |   11 | NULL                                       | \x343836353643364336463230344637323631363336433635
  169805 |   11 | \x52415731                                 | \x343836353643364336463230344637323631363336433635
  • Multiple item updates seem to be working fine too
SQL> UPDATE "LOB_EASY" SET "y" = 10 WHERE "x" > 1; COMMIT;

28 rows updated.


Commit complete.
...
    x    | y  |                  RAW_COL                   |                      blob_col
---------+----+--------------------------------------------+-----------------------------------------------------
       1 | 11 | \x                                         | NULL
       2 | 11 | \x484558544f5241572827616263642729         | NULL
       3 | 11 | \x484558544f5241572827616263642729         | NULL
       4 | 11 | \x484558544f52415728273061323362642729     | NULL
       8 | 11 | \x484558544f524157282735323431353733312729 | NULL
       9 | 11 | \x484558544f524157282735323431353733312729 | NULL
      10 | 11 | \x484558544f524157282735323431353733312729 | NULL
      11 | 11 | NULL                                       | NULL
      12 | 11 | NULL                                       | NULL
      15 | 11 | NULL                                       | NULL
      16 | 11 | NULL                                       | NULL
     167 | 11 | \x484558544f524157282735323431353733312729 | NULL
     168 | 11 | NULL                                       | NULL
     169 | 11 | \x52415731                                 | NULL
    1690 | 11 | NULL                                       | NULL
    1691 | 11 | NULL                                       | NULL
    1692 | 11 | NULL                                       | NULL
    1694 | 11 | NULL                                       | NULL
    1695 | 11 | NULL                                       | NULL
    1696 | 11 | NULL                                       | NULL
    1697 | 11 | NULL                                       | NULL
    1698 | 11 | \x52415731                                 | NULL
   16967 | 11 | NULL                                       | NULL
  169800 | 11 | NULL                                       | \x343836353643364336463230344637323631363336433635
  169801 | 11 | NULL                                       | \x343836353643364336463230344637323631363336433635
  169802 | 11 | NULL                                       | \x343836353643364336463230344637323631363336433635
  169803 | 11 | NULL                                       | \x343836353643364336463230344637323631363336433635
  169804 | 11 | NULL                                       | NULL
  169805 | 11 | \x52415731                                 | \x343836353643364336463230344637323631363336433635

So this new approach seems to work fine. We should copy everything but ROWID from WhereKV into SetKV.

Actually, there is a caveat: a key from the WHERE clause must not take precedence over the same key in the SET clause. If SetKV already has the key, ignore that key from WhereKV. With that, I think the behavior should be fine. The precedence order when values get reified is: SET value > WHERE value > default value.

Final code that works, in internal/source/oraclelogminer/conn.go inside outputMessage:

// TODO(janexing): consider changefeed log where the PK is updated.
for key, val := range kv.WhereKV {
	// Skip ROWID since that's not relevant on the
	// target schema. If something is of a ROWID
	// type, it would be named differently too
	// technically.

	// Skip items that are already present in the
	// SetKV since that is the latest value we want
	// to switch to.
	if _, ok := kv.SetKV[key]; key == "ROWID" || ok {
		continue
	}

	kv.SetKV[key] = val
}
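
For reference, here is a self-contained sketch of the same merge that can be exercised outside of conn.go. It rests on assumptions: `setAndWhereKV` is a hypothetical stand-in for `oracleparser.SetAndWhereKVStructs` (only the `SetKV` and `WhereKV` field names come from the snippet above, and the `map[string]any` types are guesses), `mergeWhereIntoSet` is an invented name, and the column keys are written without surrounding quotes.

```go
package main

// setAndWhereKV is a hypothetical stand-in for the parser's output struct.
type setAndWhereKV struct {
	SetKV   map[string]any
	WhereKV map[string]any
}

// mergeWhereIntoSet copies WHERE-clause values into SetKV so that an update
// carries the full row. Precedence is SET value over WHERE value; ROWID is
// never copied because the target schema is not expected to have it.
func mergeWhereIntoSet(kv *setAndWhereKV) {
	if kv.SetKV == nil {
		kv.SetKV = make(map[string]any)
	}
	for key, val := range kv.WhereKV {
		if key == "ROWID" {
			continue
		}
		// A key already in SetKV holds the newer value, so keep it.
		if _, ok := kv.SetKV[key]; ok {
			continue
		}
		kv.SetKV[key] = val
	}
}
```

Calling it with SetKV `{"y": "11"}` and WhereKV `{"x": "169803", "y": "1800", "ROWID": ...}` leaves SetKV with `y` still `"11"`, adds `x`, and omits ROWID.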
