From e56d843abbc823f052b9a7dc2f24933e2d6650b7 Mon Sep 17 00:00:00 2001 From: fivetran-abdulsalam Date: Tue, 18 Jun 2024 15:57:26 +0530 Subject: [PATCH 01/13] added history mode guide --- development-guide.md | 6 +- how-to-handle-history-mode-files.md | 220 ++++++++++++++++++++++++++++ 2 files changed, 224 insertions(+), 2 deletions(-) create mode 100644 how-to-handle-history-mode-files.md diff --git a/development-guide.md b/development-guide.md index b5bf80c..10e1606 100644 --- a/development-guide.md +++ b/development-guide.md @@ -105,12 +105,14 @@ This operation should report all columns in the destination table, including Fiv - This operation might be requested for a table that does not exist in the destination. In that case, it should NOT fail, simply ignore the request and return `success = true`. - `utc_delete_before` has millisecond precision. -#### WriteBatchRequest +#### WriteBatchRequest - `replace_files` is for `upsert` operation where the rows should be inserted if they don't exist or updated if they do. Each row will always provide values for all columns. Set the `_fivetran_synced` column in the destination with the values coming in from the csv files. - `update_files` is for `update` operation where modified columns have actual values whereas unmodified columns have the special value `unmodified_string` in `CsvFileParams`. Soft-deleted rows will arrive in here as well. Update the `_fivetran_synced` column in the destination with the values coming in from the csv files. -- `delete_files` is for `hard delete` operation. Use primary key columns (or `_fivetran_id` system column for primary-keyless tables) to perform `DELETE FROM`. +- `delete_files` is for `hard delete` operation. Use primary key columns (or `_fivetran_id` system column for primary-keyless tables) to perform `DELETE FROM`. + +Note: To handle history mode `replace_files` , `update_files` and `delete_files`. 
Follow [How to Handle History Mode Data](how-to-handle-history-mode-files.md) guide Also, Fivetran will deduplicate operations such that each primary key will show up only once in any of the operations diff --git a/how-to-handle-history-mode-files.md b/how-to-handle-history-mode-files.md new file mode 100644 index 0000000..f3f4ce5 --- /dev/null +++ b/how-to-handle-history-mode-files.md @@ -0,0 +1,220 @@ +#### What is History Mode + +History mode allows us to capture every version of each record processed by the fivetran connectors. +In order to keep all versions of the record, we have introduced three new system columns for tables with history mode enabled. + + +Column | Type | Description +--- | --- | --- +_fivetran_active | Boolean | TRUE if it is the currently active record. FALSE if it is a historical version of the record. Only one version of the record can be TRUE. +_fivatran_start | TimeStamp | The time when the record was first created or modified in the source. +_fivetran_end | TimeStamp | The value for this column depends on whether the record is active. If the record is not active(`_fivetran_active`=FALSE), then `_fivetran_end` value will be `_fivetran_start` of the next version of the record minus 1 millisecond. If the record is deleted, then the value will be the same as the timestamp of delete operation. If the record is active(`_fivetran_active`=TRUE), then `_fivetran_end` is the max allowed value that we can set for a TIMESTAMP column. + + +#### Points to remember in history mode + +- In WriterBatchRequest we pass a new optional field HistoryMode which specifies if current connector mode is history mode or not. In this HistoryMode field, we pass `deleted_column` column name which we need to modify only if it is present in destination. +- If the existing table is not empty then in the batch file we also send a boolean column `_fivetran_earliest`. 
Suppose in an `upsert` we got multiple versions of the same record in a flush, then we set the `_fivetran_earliest` to `TRUE` for the record which have the earliest `_fivetran_start` and rest of the versions will have `_fivetran_earliest` as FALSE. +- For each Replace, Update and Delete batch files, DELETE the existing records from destination table if `_fivetran_start` of destination table is greater than or equal to `_fivetran_start` of batch file(Refer Replace example 1 and example 2). + +Note: This `_fivetran_earliest` column should never be added in the destination table. We introduced this column to easily identify the earliest record and can be used to optimize data loads query. +Below is an example of `replaqce_file` + +Id(PK) | COL1 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_earliest +---|---------|---------------------| --- |------------------| --- +1 | abc | T1 | T2-1 | FALSE | TRUE +2 | xyz | T1 | TMAX | TRUE | TRUE +1 | pqr | T2 | T3-1 | FALSE | FALSE +1 | def | T3 | TMAX | TRUE | FALSE + +#### How to Handle Replaces, Updates and Deletes + +##### Replace + +###### Example 1: + +When `_fivetran_start` of destination table is less than `_fivetran_start` of batch file. 
+Suppose the existing Table in destination is as below: + +Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced +--- |------|----| --- | --- | --- | --- +1 | abc | 1 |T1 | T2-1 | FALSE | T100 +1 | pqr | 2 | T2 | TMAX | TRUE | T101 +2 | mno | 3 | T2 | TMAX | TRUE | T103 + +At source new records are added: + +Id(PK) | COL1 | COL2 | Timestamp | Type +--- | --- | --- |-----------| --- +1 | def |1 | T3 | Inserted +1 | ghi | 1 | T4 | Inserted + +Replace batch file will be: + +Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivatran_earliest | _fivetran_synced +--- |------|-------|---------------------| --- | --- | --- | --- +1 | def | 1 | T3 | T4-1 | FALSE | TRUE | T104 +1 | ghi | 1| T4 | TMAX | TRUE | FALSE | T105 + + +Final Destination Table will be: + +Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced +--- |---|--------|---------------------| --- |------------------| --- +1 | abc | 1 | T1 | T2-1 | FALSE | T100 +1 | pqr | 2 | T2 | T3-1 | FALSE | T101 +2 | mno | 3 | T3 | TMAX | TRUE | T103 +1 | def | 1 |T3 | T4-1 | FALSE | T104 +1 | ghi | 1 | T4 | TMAX | TRUE | T105 + +**Explanation:** +- We got new records for id = 1. +- Check for corresponding earliest record(`_fivetran_earliest` as TRUE), DELETE the existing records from destination table if `_fivetran_start` of destination table is greater than or equal to `_fivetran_start` of batch file(In above example no) +- `_fivetran_end` of the active record in destination table is set to `_fivatran_start`-1 of the `_fivatran_earliest` record of batch file. +- Set `_fivetran_active` for above updated record to FALSE and `deleted_column`(if present in destination table) to TRUE +- New records are inserted AS IS excluding `_fivetran_earliest` column in destination table. + +###### Example 2 + +When `_fivetran_start` of destination table is greater than or equal to `_fivetran_start` of batch file. 
+Suppose the existing Table in destination is as below: + +Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced +--- |---|--------|---------------------| --- |------------------| --- +1 | xyz | 4 | T1 | T3-1 | FALSE | T100 +1 | abc | 1 | T3 | T4-1 | FALSE | T100 +1 | pqr | 2 | T4 | TMAX | TRUE | T101 +2 | mno | 3 | T4 | TMAX | TRUE | T103 + +At source new records are added: + +Id(PK) | COL1 | COL2 | Timestamp | Type +--- | --- | --- | --- | --- +1 | ghi | 1 | T2 | Inserted + + + +Replace batch file will be: + +Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivatran_earliest | _fivetran_synced +--- | --- | --- | --- | --- | --- | --- | --- +1 | ghi | 1 | T2 | TMAX | TRUE | TRUE | T104 + +Final Destination table will be: + +Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced +--- | --- | --- | --- | --- | --- | --- +1 | ghi | 1 | T2 | TMAX | TRUE | T104 +1 | xyz | 4 | T1 | T3-1 | FALSE | T100 +2 | mno | 3 | T4 | TMAX | TRUE | T103 + +**Explanation:** +We got new records for id = 1. +- Check for corresponding earliest record(`_fivetran_earliest` TRUE), DELETE the existing records from destination table if `_fivetran_start` of destination table is greater than or equal to `_fivetran_start` of batch file(in above example yes, so deleted id = 1 with _fivetran_start = T3 and T4) +- `_fivetran_end` of the active record in destination table is set to `_fivatran_start`-1 of the `_fivatran_earliest` record of batch file. +- Set `_fivetran_active` for above updated record to FALSE and `deleted_column`(if present in destination table) to TRUE +- New records are inserted AS IS excluding `_fivetran_earliest` column in destination table. 
+ +##### Updates + +Suppose the existing Table in destination is: + +Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced +--- | --- | --- | --- | --- | --- | --- +1 | abc | 1 | T1 | T2-1 | FALSE | T100 +1 | pqr | 2 | T2 | TMAX | TRUE | T101 +2 | mno | 3 | T2 | TMAX | TRUE | T103 + + +At source records with Id = 1 is updated: + +Id(PK) | COL1 | Timestamp | Type +--- | --- | --- | --- +1 | xyz | T3 | Updated + + + +And record with id = 2 is updated as: + +Id(PK) | COL2 | Timestamp | Type +--- | --- | --- | --- +2 | 1000 | T4 | Updated + +And record with Id = 1 is again updated as + +Id(PK) | COL1 | Timestamp | Type +--- | --- | --- | --- +1 | def | T5 | Updated + + + +Update batch file will be: + + +Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivatran_earliest | _fivatran_synced +--- | --- | --- | --- | --- | --- | --- | --- +1 | xyz | | T3| T5-1 | FALSE | TRUE | T107 +2 | | 1000 | T4 | TMAX | TRUE | TRUE | T108 +1 | def | | T5 | TMAX | TRUE | FALSE | T109 + + +Final Destination Table will be: + +Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced +--- | --- | --- | --- | --- | --- | --- +1 | abc | 1 | T1 | T2-1 | FALSE | T100 +1 | pqr | 2 | T2 | T3-1 | FALSE | T101 +2 | mno | 3 | T2 | T4-1 | FALSE | T103 +1 | def | 2 | T5 | TMAX | TRUE | T109 +1 | xyz | 2 | T3 | T5-1 | FALSE | T107 +2 | mno | 1000 | T4 | TMAX | TRUE | T108 + + + +**Explanation:** + - In batch file we got records with id = 1 and id = 2. +- We set other columns(non updated columns) to the values of the active records. 
In above case for id = 2, we didn’t get COL1 value, so we set COL1 to “mno”(COL1 value of the active record) + - _fivetran_end of the active record in destination table is set to _fivatran_start-1 of the _fivatran_earliest record of batch file +- Set _fivetran_active for above updated record to FALSE and deleted_column(if present in destination table) to TRUE +- Other columns are set AS IS from the batch file in the destination table except _fivetran_earliest column. + + +##### Deletes + +Existing Table in destination: + +Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced +--- | --- | --- | --- | --- | --- | --- +1 | abc | 1 | T1 | T2-1 | FALSE | T100 +1 | pqr | 2 | T2 | TMAX | TRUE | T101 +2 | mno | 3 | T2 | TMAX | TRUE | T103 + + + +At source a record is deleted: + + +Id(PK) | Timestamp | Type +--- | --- | --- +1 | T3 | Deleted + + +Delete batch file will be: + +Id(PK) | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_earliest | _fivetran_synced +--- | --- |---------------|------| --- | --- +1 | | T3-1 | | TRUE | T104 + + +Final Destination Table will be: + +Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced +--- | --- | --- | --- | --- |------------------| --- +1 | abc | 1 | T1 | T2-1 | FALSE | T100 +1 | pqr | 2 | T2 | T3-1 | FALSE | T101 +2 | mno | 3 | T2 | TMAX | TRUE | T103 + +**Explanation:** +Set `_fivetran_active` to FALSE for the active record and set `_fivetran_end` = T3-1 and `deleted_column`(if present in destination) to TRUE + + From 34e7146dc45a5530a0b9b253efe19446e84ff4d6 Mon Sep 17 00:00:00 2001 From: fivetran-abdulsalam Date: Tue, 18 Jun 2024 17:07:55 +0530 Subject: [PATCH 02/13] doc grammar update --- how-to-handle-history-mode-files.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/how-to-handle-history-mode-files.md b/how-to-handle-history-mode-files.md index f3f4ce5..329014f 100644 --- 
a/how-to-handle-history-mode-files.md +++ b/how-to-handle-history-mode-files.md @@ -13,12 +13,12 @@ _fivetran_end | TimeStamp | The value for this column depends on whether the rec #### Points to remember in history mode -- In WriterBatchRequest we pass a new optional field HistoryMode which specifies if current connector mode is history mode or not. In this HistoryMode field, we pass `deleted_column` column name which we need to modify only if it is present in destination. +- In WriterBatchRequest we pass a new optional field HistoryMode which indicates connector is in history mode or not. In this HistoryMode field, we pass `deleted_column` column name which we need to modify only if it is present in destination. - If the existing table is not empty then in the batch file we also send a boolean column `_fivetran_earliest`. Suppose in an `upsert` we got multiple versions of the same record in a flush, then we set the `_fivetran_earliest` to `TRUE` for the record which have the earliest `_fivetran_start` and rest of the versions will have `_fivetran_earliest` as FALSE. - For each Replace, Update and Delete batch files, DELETE the existing records from destination table if `_fivetran_start` of destination table is greater than or equal to `_fivetran_start` of batch file(Refer Replace example 1 and example 2). -Note: This `_fivetran_earliest` column should never be added in the destination table. We introduced this column to easily identify the earliest record and can be used to optimize data loads query. -Below is an example of `replaqce_file` +Note: The `_fivetran_earliest` column shouldn't be added in the destination table. It is introduced to easily identify the earliest record and can be used to optimize data loads query. 
+Below is an example of `replace_file` Id(PK) | COL1 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_earliest ---|---------|---------------------| --- |------------------| --- From 1a5084f83ff1181e4ab8f111a0d26490c3d430ce Mon Sep 17 00:00:00 2001 From: Alex Ilyichov Date: Tue, 18 Jun 2024 14:53:39 +0200 Subject: [PATCH 03/13] Update how-to-handle-history-mode-files.md --- how-to-handle-history-mode-files.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/how-to-handle-history-mode-files.md b/how-to-handle-history-mode-files.md index 329014f..b30479f 100644 --- a/how-to-handle-history-mode-files.md +++ b/how-to-handle-history-mode-files.md @@ -7,7 +7,7 @@ In order to keep all versions of the record, we have introduced three new system Column | Type | Description --- | --- | --- _fivetran_active | Boolean | TRUE if it is the currently active record. FALSE if it is a historical version of the record. Only one version of the record can be TRUE. -_fivatran_start | TimeStamp | The time when the record was first created or modified in the source. +_fivetran_start | TimeStamp | The time when the record was first created or modified in the source. _fivetran_end | TimeStamp | The value for this column depends on whether the record is active. If the record is not active(`_fivetran_active`=FALSE), then `_fivetran_end` value will be `_fivetran_start` of the next version of the record minus 1 millisecond. If the record is deleted, then the value will be the same as the timestamp of delete operation. If the record is active(`_fivetran_active`=TRUE), then `_fivetran_end` is the max allowed value that we can set for a TIMESTAMP column. 
@@ -51,7 +51,7 @@ Id(PK) | COL1 | COL2 | Timestamp | Type Replace batch file will be: -Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivatran_earliest | _fivetran_synced +Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_earliest | _fivetran_synced --- |------|-------|---------------------| --- | --- | --- | --- 1 | def | 1 | T3 | T4-1 | FALSE | TRUE | T104 1 | ghi | 1| T4 | TMAX | TRUE | FALSE | T105 @@ -70,7 +70,7 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active **Explanation:** - We got new records for id = 1. - Check for corresponding earliest record(`_fivetran_earliest` as TRUE), DELETE the existing records from destination table if `_fivetran_start` of destination table is greater than or equal to `_fivetran_start` of batch file(In above example no) -- `_fivetran_end` of the active record in destination table is set to `_fivatran_start`-1 of the `_fivatran_earliest` record of batch file. +- `_fivetran_end` of the active record in destination table is set to `_fivetran_start`-1 of the `_fivetran_earliest` record of batch file. - Set `_fivetran_active` for above updated record to FALSE and `deleted_column`(if present in destination table) to TRUE - New records are inserted AS IS excluding `_fivetran_earliest` column in destination table. @@ -96,7 +96,7 @@ Id(PK) | COL1 | COL2 | Timestamp | Type Replace batch file will be: -Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivatran_earliest | _fivetran_synced +Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_earliest | _fivetran_synced --- | --- | --- | --- | --- | --- | --- | --- 1 | ghi | 1 | T2 | TMAX | TRUE | TRUE | T104 @@ -111,7 +111,7 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active **Explanation:** We got new records for id = 1. 
- Check for corresponding earliest record(`_fivetran_earliest` TRUE), DELETE the existing records from destination table if `_fivetran_start` of destination table is greater than or equal to `_fivetran_start` of batch file(in above example yes, so deleted id = 1 with _fivetran_start = T3 and T4) -- `_fivetran_end` of the active record in destination table is set to `_fivatran_start`-1 of the `_fivatran_earliest` record of batch file. +- `_fivetran_end` of the active record in destination table is set to `_fivetran_start`-1 of the `_fivetran_earliest` record of batch file. - Set `_fivetran_active` for above updated record to FALSE and `deleted_column`(if present in destination table) to TRUE - New records are inserted AS IS excluding `_fivetran_earliest` column in destination table. @@ -151,7 +151,7 @@ Id(PK) | COL1 | Timestamp | Type Update batch file will be: -Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivatran_earliest | _fivatran_synced +Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_earliest | _fivetran_synced --- | --- | --- | --- | --- | --- | --- | --- 1 | xyz | | T3| T5-1 | FALSE | TRUE | T107 2 | | 1000 | T4 | TMAX | TRUE | TRUE | T108 @@ -174,7 +174,7 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active **Explanation:** - In batch file we got records with id = 1 and id = 2. - We set other columns(non updated columns) to the values of the active records. 
In above case for id = 2, we didn’t get COL1 value, so we set COL1 to “mno”(COL1 value of the active record) - - _fivetran_end of the active record in destination table is set to _fivatran_start-1 of the _fivatran_earliest record of batch file + - _fivetran_end of the active record in destination table is set to _fivetran_start-1 of the _fivetran_earliest record of batch file - Set _fivetran_active for above updated record to FALSE and deleted_column(if present in destination table) to TRUE - Other columns are set AS IS from the batch file in the destination table except _fivetran_earliest column. From 20139d32cf53791ea85fe830f15308af10d67841 Mon Sep 17 00:00:00 2001 From: fivetran-abdulsalam Date: Tue, 18 Jun 2024 19:21:13 +0530 Subject: [PATCH 04/13] incorporate tech writer changes --- development-guide.md | 2 +- how-to-handle-history-mode-files.md | 68 ++++++++++++++--------------- 2 files changed, 35 insertions(+), 35 deletions(-) diff --git a/development-guide.md b/development-guide.md index 10e1606..32900fb 100644 --- a/development-guide.md +++ b/development-guide.md @@ -112,7 +112,7 @@ This operation should report all columns in the destination table, including Fiv - `delete_files` is for `hard delete` operation. Use primary key columns (or `_fivetran_id` system column for primary-keyless tables) to perform `DELETE FROM`. -Note: To handle history mode `replace_files` , `update_files` and `delete_files`. 
Follow [How to Handle History Mode Data](how-to-handle-history-mode-files.md) guide +> Note: To handle history mode `replace_files`, `update_files` and `delete_files`, follow [How to Handle History Mode Data](how-to-handle-history-mode-files.md) guide Also, Fivetran will deduplicate operations such that each primary key will show up only once in any of the operations diff --git a/how-to-handle-history-mode-files.md b/how-to-handle-history-mode-files.md index b30479f..2e4143b 100644 --- a/how-to-handle-history-mode-files.md +++ b/how-to-handle-history-mode-files.md @@ -1,6 +1,6 @@ #### What is History Mode -History mode allows us to capture every version of each record processed by the fivetran connectors. +History mode allows us to capture every version of each record processed by the Fivetran connectors. In order to keep all versions of the record, we have introduced three new system columns for tables with history mode enabled. @@ -8,16 +8,16 @@ Column | Type | Description --- | --- | --- _fivetran_active | Boolean | TRUE if it is the currently active record. FALSE if it is a historical version of the record. Only one version of the record can be TRUE. _fivetran_start | TimeStamp | The time when the record was first created or modified in the source. -_fivetran_end | TimeStamp | The value for this column depends on whether the record is active. If the record is not active(`_fivetran_active`=FALSE), then `_fivetran_end` value will be `_fivetran_start` of the next version of the record minus 1 millisecond. If the record is deleted, then the value will be the same as the timestamp of delete operation. If the record is active(`_fivetran_active`=TRUE), then `_fivetran_end` is the max allowed value that we can set for a TIMESTAMP column. +_fivetran_end | TimeStamp | The value for this column depends on whether the record is active. 
If the record is not active(`_fivetran_active`=FALSE), then `_fivetran_end` value will be `_fivetran_start` of the next version of the record minus 1 millisecond. If the record is deleted, then the value will be the same as the timestamp of delete operation. If the record is active (`_fivetran_active`=TRUE), then `_fivetran_end` is the maximal allowed value that we can set for a TIMESTAMP column. #### Points to remember in history mode -- In WriterBatchRequest we pass a new optional field HistoryMode which indicates connector is in history mode or not. In this HistoryMode field, we pass `deleted_column` column name which we need to modify only if it is present in destination. -- If the existing table is not empty then in the batch file we also send a boolean column `_fivetran_earliest`. Suppose in an `upsert` we got multiple versions of the same record in a flush, then we set the `_fivetran_earliest` to `TRUE` for the record which have the earliest `_fivetran_start` and rest of the versions will have `_fivetran_earliest` as FALSE. -- For each Replace, Update and Delete batch files, DELETE the existing records from destination table if `_fivetran_start` of destination table is greater than or equal to `_fivetran_start` of batch file(Refer Replace example 1 and example 2). +- In `WriterBatchRequest`, we pass a new optional field HistoryMode which indicates connector is in history mode or not. In this HistoryMode field, we pass `deleted_column` column name which we need to modify only if it is present in destination. +- If the existing table is not empty, then, in the batch file, we also send a boolean column `_fivetran_earliest`. Suppose, in an `upsert`, we got multiple versions of the same record in a flush, then we set the `_fivetran_earliest` column value to `TRUE` for the record which have the earliest `_fivetran_start` and rest of the versions will have `_fivetran_earliest` as FALSE. 
+- For each Replace, Update and Delete batch files, DELETE the existing records from destination table if `_fivetran_start` of destination table is greater than or equal to `_fivetran_start` of batch file(refer to [Example 1](#example1) and [Example 2](#example2)). -Note: The `_fivetran_earliest` column shouldn't be added in the destination table. It is introduced to easily identify the earliest record and can be used to optimize data loads query. +NOTE: The `_fivetran_earliest` column shouldn't be added in the destination table. It is introduced to easily identify the earliest record and can be used to optimize the data loads query. Below is an example of `replace_file` Id(PK) | COL1 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_earliest @@ -33,7 +33,7 @@ Id(PK) | COL1 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fiv ###### Example 1: -When `_fivetran_start` of destination table is less than `_fivetran_start` of batch file. +When the `_fivetran_start` column value of destination table is less than `_fivetran_start` of batch file. 
Suppose the existing Table in destination is as below: Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced @@ -42,14 +42,14 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | 1 | pqr | 2 | T2 | TMAX | TRUE | T101 2 | mno | 3 | T2 | TMAX | TRUE | T103 -At source new records are added: +At source, new records are added: Id(PK) | COL1 | COL2 | Timestamp | Type --- | --- | --- |-----------| --- 1 | def |1 | T3 | Inserted 1 | ghi | 1 | T4 | Inserted -Replace batch file will be: +The Replace batch file will be as follows: Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_earliest | _fivetran_synced --- |------|-------|---------------------| --- | --- | --- | --- @@ -57,7 +57,7 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | 1 | ghi | 1| T4 | TMAX | TRUE | FALSE | T105 -Final Destination Table will be: +The final destination table will be as follows: Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced --- |---|--------|---------------------| --- |------------------| --- @@ -69,15 +69,15 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active **Explanation:** - We got new records for id = 1. -- Check for corresponding earliest record(`_fivetran_earliest` as TRUE), DELETE the existing records from destination table if `_fivetran_start` of destination table is greater than or equal to `_fivetran_start` of batch file(In above example no) -- `_fivetran_end` of the active record in destination table is set to `_fivetran_start`-1 of the `_fivetran_earliest` record of batch file. -- Set `_fivetran_active` for above updated record to FALSE and `deleted_column`(if present in destination table) to TRUE -- New records are inserted AS IS excluding `_fivetran_earliest` column in destination table. 
+- We check for the corresponding earliest record (`_fivetran_earliest` as TRUE), and deleted the existing records from the destination table where the `_fivetran_start` column value in the destination table is greater than or equal to `_fivetran_start` of batch file. +- We set `_fivetran_end` of the active record in destination table to `_fivetran_start`-1 of the `_fivetran_earliest` record of batch file. +- We set the `_fivetran_active` column value for the above updated record to FALSE and `deleted_column` (if present in the destination table) to TRUE. +- We inserted new records _as is_ excluding the `_fivetran_earliest` column in the destination table. ###### Example 2 -When `_fivetran_start` of destination table is greater than or equal to `_fivetran_start` of batch file. -Suppose the existing Table in destination is as below: +When the `_fivetran_start` column value in the destination table is greater than or equal to the `_fivetran_start` in the batch file. +Suppose the existing table in destination is as below: Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced --- |---|--------|---------------------| --- |------------------| --- @@ -86,7 +86,7 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | 1 | pqr | 2 | T4 | TMAX | TRUE | T101 2 | mno | 3 | T4 | TMAX | TRUE | T103 -At source new records are added: +At the source new records are added: Id(PK) | COL1 | COL2 | Timestamp | Type --- | --- | --- | --- | --- @@ -94,13 +94,13 @@ Id(PK) | COL1 | COL2 | Timestamp | Type -Replace batch file will be: +The replace batch file will be as follows: Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_earliest | _fivetran_synced --- | --- | --- | --- | --- | --- | --- | --- 1 | ghi | 1 | T2 | TMAX | TRUE | TRUE | T104 -Final Destination table will be: +Final Destination table will be as follows: Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | 
_fivetran_active | _fivetran_synced --- | --- | --- | --- | --- | --- | --- @@ -109,15 +109,15 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active 2 | mno | 3 | T4 | TMAX | TRUE | T103 **Explanation:** -We got new records for id = 1. -- Check for corresponding earliest record(`_fivetran_earliest` TRUE), DELETE the existing records from destination table if `_fivetran_start` of destination table is greater than or equal to `_fivetran_start` of batch file(in above example yes, so deleted id = 1 with _fivetran_start = T3 and T4) +- We got new records for id = 1. +- We check for the corresponding earliest record(`_fivetran_earliest` as TRUE), and deleted the existing records from the destination table if `_fivetran_start` of destination table is greater than or equal to the `_fivetran_start` of the batch file(in the above example we have such records, so deleted records with id = 1 and _fivetran_start = T3 and T4) - `_fivetran_end` of the active record in destination table is set to `_fivetran_start`-1 of the `_fivetran_earliest` record of batch file. -- Set `_fivetran_active` for above updated record to FALSE and `deleted_column`(if present in destination table) to TRUE -- New records are inserted AS IS excluding `_fivetran_earliest` column in destination table. +- We set the `_fivetran_active` column value for above updated record to FALSE and `deleted_column`(if present in destination table) to TRUE +- We inserted new records _as is_ excluding the `_fivetran_earliest` column in the destination table. 
##### Updates -Suppose the existing Table in destination is: +Suppose the existing Table in destination is as below: Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced --- | --- | --- | --- | --- | --- | --- @@ -148,7 +148,7 @@ Id(PK) | COL1 | Timestamp | Type -Update batch file will be: +Update batch file will be as follows: Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_earliest | _fivetran_synced @@ -158,7 +158,7 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | 1 | def | | T5 | TMAX | TRUE | FALSE | T109 -Final Destination Table will be: +Final Destination Table will be as follows: Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced --- | --- | --- | --- | --- | --- | --- @@ -173,15 +173,15 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active **Explanation:** - In batch file we got records with id = 1 and id = 2. -- We set other columns(non updated columns) to the values of the active records. In above case for id = 2, we didn’t get COL1 value, so we set COL1 to “mno”(COL1 value of the active record) - - _fivetran_end of the active record in destination table is set to _fivetran_start-1 of the _fivetran_earliest record of batch file -- Set _fivetran_active for above updated record to FALSE and deleted_column(if present in destination table) to TRUE -- Other columns are set AS IS from the batch file in the destination table except _fivetran_earliest column. +- We set other columns(non updated columns) to the values of the active records. In above case for id = 2, we didn’t get COL1 value, so we set COL1 to “mno”(COL1 value of the active record). + - We set `_fivetran_end` of the active record in destination table to `_fivetran_start`-1 of the `_fivetran_earliest` record of batch file. 
+- We set `_fivetran_active` column value for above updated record to FALSE and `deleted_column`(if present in destination table) to TRUE +- We set other columns _as is_ from the batch file in the destination table except the `_fivetran_earliest` column. ##### Deletes -Existing Table in destination: +Suppose the existing Table in destination is as below: Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced --- | --- | --- | --- | --- | --- | --- @@ -199,14 +199,14 @@ Id(PK) | Timestamp | Type 1 | T3 | Deleted -Delete batch file will be: +Delete batch file will be as follows: Id(PK) | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_earliest | _fivetran_synced --- | --- |---------------|------| --- | --- 1 | | T3-1 | | TRUE | T104 -Final Destination Table will be: +Final Destination Table will be as follows: Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced --- | --- | --- | --- | --- |------------------| --- @@ -215,6 +215,6 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | 2 | mno | 3 | T2 | TMAX | TRUE | T103 **Explanation:** -Set `_fivetran_active` to FALSE for the active record and set `_fivetran_end` = T3-1 and `deleted_column`(if present in destination) to TRUE +- We set the `_fivetran_active` column value to FALSE for the active record and set the `_fivetran_end` column value to `T3-1` and the `deleted_column`(if present in the destination table) to TRUE From 93746f232b373be30b7700115ef5f7ee895768e0 Mon Sep 17 00:00:00 2001 From: fivetran-abdulsalam Date: Tue, 18 Jun 2024 19:24:03 +0530 Subject: [PATCH 05/13] small refactor: --- development-guide.md | 2 +- how-to-handle-history-mode-files.md | 8 ++++---- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/development-guide.md b/development-guide.md index 32900fb..a9fb5dd 100644 --- a/development-guide.md +++ b/development-guide.md @@ -112,7 +112,7 @@ This 
operation should report all columns in the destination table, including Fiv - `delete_files` is for `hard delete` operation. Use primary key columns (or `_fivetran_id` system column for primary-keyless tables) to perform `DELETE FROM`. -> Note: To handle history mode `replace_files`, `update_files` and `delete_files`, follow [How to Handle History Mode Data](how-to-handle-history-mode-files.md) guide +> Note: To handle history mode `replace_files`, `update_files` and `delete_files`, follow [How to Handle History Mode Data](how-to-handle-history-mode-files.md) guide. Also, Fivetran will deduplicate operations such that each primary key will show up only once in any of the operations diff --git a/how-to-handle-history-mode-files.md b/how-to-handle-history-mode-files.md index 2e4143b..3bd1389 100644 --- a/how-to-handle-history-mode-files.md +++ b/how-to-handle-history-mode-files.md @@ -110,9 +110,9 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active **Explanation:** - We got new records for id = 1. -- We check for the corresponding earliest record(`_fivetran_earliest` as TRUE), and deleted the existing records from the destination table if `_fivetran_start` of destination table is greater than or equal to the `_fivetran_start` of the batch file(in the above example we have such records, so deleted records with id = 1 and _fivetran_start = T3 and T4) +- We check for the corresponding earliest record(`_fivetran_earliest` as TRUE), and deleted the existing records from the destination table if `_fivetran_start` of destination table is greater than or equal to the `_fivetran_start` of the batch file(in the above example we have such records, so we deleted records with id = 1, _fivetran_start = T3 and id = 1, _fivetran_start = T4). - `_fivetran_end` of the active record in destination table is set to `_fivetran_start`-1 of the `_fivetran_earliest` record of batch file. 
-- We set the `_fivetran_active` column value for above updated record to FALSE and `deleted_column`(if present in destination table) to TRUE +- We set the `_fivetran_active` column value for above updated record to FALSE and `deleted_column`(if present in destination table) to TRUE. - We inserted new records _as is_ excluding the `_fivetran_earliest` column in the destination table. ##### Updates @@ -175,7 +175,7 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active - In batch file we got records with id = 1 and id = 2. - We set other columns(non updated columns) to the values of the active records. In above case for id = 2, we didn’t get COL1 value, so we set COL1 to “mno”(COL1 value of the active record). - We set `_fivetran_end` of the active record in destination table to `_fivetran_start`-1 of the `_fivetran_earliest` record of batch file. -- We set `_fivetran_active` column value for above updated record to FALSE and `deleted_column`(if present in destination table) to TRUE +- We set `_fivetran_active` column value for above updated record to FALSE and `deleted_column`(if present in destination table) to TRUE. - We set other columns _as is_ from the batch file in the destination table except the `_fivetran_earliest` column. @@ -215,6 +215,6 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | 2 | mno | 3 | T2 | TMAX | TRUE | T103 **Explanation:** -- We set the `_fivetran_active` column value to FALSE for the active record and set the `_fivetran_end` column value to `T3-1` and the `deleted_column`(if present in the destination table) to TRUE +- We set the `_fivetran_active` column value to FALSE for the active record and set the `_fivetran_end` column value to `T3-1` and the `deleted_column`(if present in the destination table) to TRUE. 
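The delete handling described in the explanation above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the SDK's implementation: the destination table is modeled as an in-memory list of dicts, timestamps are plain integers (milliseconds), and `apply_history_delete` and `TMAX` are hypothetical names introduced only for this example. Handling of `deleted_column` is left out.

```python
TMAX = 10**15  # hypothetical stand-in for the maximum TIMESTAMP value


def apply_history_delete(dest_rows, batch_row, pk="Id"):
    """Close the active version of the record named in a delete batch row."""
    for row in dest_rows:
        if row[pk] == batch_row[pk] and row["_fivetran_active"]:
            # The active version becomes historical and ends at the
            # `_fivetran_end` value supplied by the batch file (T3-1 above).
            row["_fivetran_active"] = False
            row["_fivetran_end"] = batch_row["_fivetran_end"]
    return dest_rows


# Example mirroring the Deletes walkthrough, with T1 = 1000, T2 = 2000, T3 = 3000.
dest = [
    {"Id": 1, "_fivetran_start": 1000, "_fivetran_end": TMAX, "_fivetran_active": True},
    {"Id": 2, "_fivetran_start": 2000, "_fivetran_end": TMAX, "_fivetran_active": True},
]
delete_row = {"Id": 1, "_fivetran_end": 3000 - 1, "_fivetran_earliest": True}
delete_result = apply_history_delete(dest, delete_row)
```

Rows for other primary keys (here `Id = 2`) are left untouched, and no row is physically removed, which is what keeps the history intact.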
From c92095e5fd11c7f31176c7314aa69ab01b4c8f78 Mon Sep 17 00:00:00 2001 From: fivetran-abdulsalam Date: Tue, 18 Jun 2024 19:32:39 +0530 Subject: [PATCH 06/13] more refactor --- how-to-handle-history-mode-files.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/how-to-handle-history-mode-files.md b/how-to-handle-history-mode-files.md index 3bd1389..5149308 100644 --- a/how-to-handle-history-mode-files.md +++ b/how-to-handle-history-mode-files.md @@ -34,7 +34,7 @@ Id(PK) | COL1 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fiv ###### Example 1: When the `_fivetran_start` column value of destination table is less than `_fivetran_start` of batch file. -Suppose the existing Table in destination is as below: +Suppose the existing table in destination is as below: Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced --- |------|----| --- | --- | --- | --- @@ -117,7 +117,7 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active ##### Updates -Suppose the existing Table in destination is as below: +Suppose the existing table in destination is as below: Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced --- | --- | --- | --- | --- | --- | --- @@ -181,7 +181,7 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active ##### Deletes -Suppose the existing Table in destination is as below: +Suppose the existing table in destination is as below: Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced --- | --- | --- | --- | --- | --- | --- From 747a5e4c11c49d5f1b1acf5a0eff16068ec26814 Mon Sep 17 00:00:00 2001 From: fivetran-abdulsalam Date: Tue, 18 Jun 2024 19:33:54 +0530 Subject: [PATCH 07/13] changed grammar --- how-to-handle-history-mode-files.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/how-to-handle-history-mode-files.md 
b/how-to-handle-history-mode-files.md index 5149308..71d6638 100644 --- a/how-to-handle-history-mode-files.md +++ b/how-to-handle-history-mode-files.md @@ -42,7 +42,7 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | 1 | pqr | 2 | T2 | TMAX | TRUE | T101 2 | mno | 3 | T2 | TMAX | TRUE | T103 -At source, new records are added: +At the source, new records are added: Id(PK) | COL1 | COL2 | Timestamp | Type --- | --- | --- |-----------| --- @@ -126,7 +126,7 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | 2 | mno | 3 | T2 | TMAX | TRUE | T103 -At source records with Id = 1 is updated: +At the source records with Id = 1 is updated: Id(PK) | COL1 | Timestamp | Type --- | --- | --- | --- @@ -191,7 +191,7 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | -At source a record is deleted: +At the source a record is deleted: Id(PK) | Timestamp | Type From 62deeb8c4604a0c24d155014c37e19f97f2ef4ac Mon Sep 17 00:00:00 2001 From: fivetran-abdulsalam Date: Tue, 18 Jun 2024 22:11:36 +0530 Subject: [PATCH 08/13] changed insert to upserted --- how-to-handle-history-mode-files.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/how-to-handle-history-mode-files.md b/how-to-handle-history-mode-files.md index 71d6638..d764bbf 100644 --- a/how-to-handle-history-mode-files.md +++ b/how-to-handle-history-mode-files.md @@ -46,10 +46,10 @@ At the source, new records are added: Id(PK) | COL1 | COL2 | Timestamp | Type --- | --- | --- |-----------| --- -1 | def |1 | T3 | Inserted -1 | ghi | 1 | T4 | Inserted +1 | def |1 | T3 | Upserted +1 | ghi | 1 | T4 | Upserted -The Replace batch file will be as follows: +The replace batch file will be as follows: Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_earliest | _fivetran_synced --- |------|-------|---------------------| --- | --- | --- | --- @@ -90,7 +90,7 @@ At the source new 
records are added: Id(PK) | COL1 | COL2 | Timestamp | Type --- | --- | --- | --- | --- -1 | ghi | 1 | T2 | Inserted +1 | ghi | 1 | T2 | Upserted @@ -148,7 +148,7 @@ Id(PK) | COL1 | Timestamp | Type -Update batch file will be as follows: +The update batch file will be as follows: Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_earliest | _fivetran_synced From c33694682f6dc24221d30f469268ac98b7167bc4 Mon Sep 17 00:00:00 2001 From: fivetran-abdulsalam Date: Thu, 20 Jun 2024 12:50:20 +0530 Subject: [PATCH 09/13] incorporate more comments --- development-guide.md | 2 +- ... how-to-handle-history-mode-batch-files.md | 48 +++++++++---------- 2 files changed, 24 insertions(+), 26 deletions(-) rename how-to-handle-history-mode-files.md => how-to-handle-history-mode-batch-files.md (74%) diff --git a/development-guide.md b/development-guide.md index a9fb5dd..d56b833 100644 --- a/development-guide.md +++ b/development-guide.md @@ -112,7 +112,7 @@ This operation should report all columns in the destination table, including Fiv - `delete_files` is for `hard delete` operation. Use primary key columns (or `_fivetran_id` system column for primary-keyless tables) to perform `DELETE FROM`. -> Note: To handle history mode `replace_files`, `update_files` and `delete_files`, follow [How to Handle History Mode Data](how-to-handle-history-mode-files.md) guide. +> Note: To handle history mode `replace_files`, `update_files` and `delete_files`, follow [How to Handle History Mode Batch Files](how-to-handle-history-mode-batch-files.md) guide. 
Also, Fivetran will deduplicate operations such that each primary key will show up only once in any of the operations diff --git a/how-to-handle-history-mode-files.md b/how-to-handle-history-mode-batch-files.md similarity index 74% rename from how-to-handle-history-mode-files.md rename to how-to-handle-history-mode-batch-files.md index d764bbf..a2d5b3c 100644 --- a/how-to-handle-history-mode-files.md +++ b/how-to-handle-history-mode-batch-files.md @@ -1,24 +1,24 @@ -#### What is History Mode +# What is History Mode -History mode allows us to capture every version of each record processed by the Fivetran connectors. -In order to keep all versions of the record, we have introduced three new system columns for tables with history mode enabled. +History mode allows to capture every available version of each record from Fivetran source connectors. +In order to keep all versions of the records, three new system columns are added to tables with history mode enabled. Column | Type | Description --- | --- | --- _fivetran_active | Boolean | TRUE if it is the currently active record. FALSE if it is a historical version of the record. Only one version of the record can be TRUE. _fivetran_start | TimeStamp | The time when the record was first created or modified in the source. -_fivetran_end | TimeStamp | The value for this column depends on whether the record is active. If the record is not active(`_fivetran_active`=FALSE), then `_fivetran_end` value will be `_fivetran_start` of the next version of the record minus 1 millisecond. If the record is deleted, then the value will be the same as the timestamp of delete operation. If the record is active (`_fivetran_active`=TRUE), then `_fivetran_end` is the maximal allowed value that we can set for a TIMESTAMP column. +_fivetran_end | TimeStamp | The value for this column depends on whether the record is active. 
If the record is not active, then `_fivetran_end` value will be `_fivetran_start` of the next version of the record minus 1 millisecond. If the record is deleted, then the value will be the same as the timestamp of delete operation. If the record is active, then `_fivetran_end` is set to maximum TIMESTAMP value. -#### Points to remember in history mode +# Points to remember in history mode - In `WriterBatchRequest`, we pass a new optional field HistoryMode which indicates connector is in history mode or not. In this HistoryMode field, we pass `deleted_column` column name which we need to modify only if it is present in destination. - If the existing table is not empty, then, in the batch file, we also send a boolean column `_fivetran_earliest`. Suppose, in an `upsert`, we got multiple versions of the same record in a flush, then we set the `_fivetran_earliest` column value to `TRUE` for the record which have the earliest `_fivetran_start` and rest of the versions will have `_fivetran_earliest` as FALSE. -- For each Replace, Update and Delete batch files, DELETE the existing records from destination table if `_fivetran_start` of destination table is greater than or equal to `_fivetran_start` of batch file(refer to [Example 1](#example1) and [Example 2](#example2)). +- For each `replace`, `update` and `delete` batch file, DELETE the existing records in the destination table with `_fivetran_start` greater than or equal to `_fivetran_start` of matcing records in batch file (refer to [Example 1](###Example 1) and [Example 2](###Example 2)). -NOTE: The `_fivetran_earliest` column shouldn't be added in the destination table. It is introduced to easily identify the earliest record and can be used to optimize the data loads query. -Below is an example of `replace_file` +Do Not add the `_fivetran_earliest` column to the destination table. It is provided for convenience to easily identify the earliest record and can be used to optimize the data load query. 
+Below is an example of a`replace` batch file in history mode: Id(PK) | COL1 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_earliest ---|---------|---------------------| --- |------------------| --- @@ -27,13 +27,12 @@ Id(PK) | COL1 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fiv 1 | pqr | T2 | T3-1 | FALSE | FALSE 1 | def | T3 | TMAX | TRUE | FALSE -#### How to Handle Replaces, Updates and Deletes +# How to Handle Replaces, Updates and Deletes -##### Replace +## Replace -###### Example 1: +### Example 1: `_fivetran_start` column value of destination row is less than `_fivetran_start` of matching row in batch file: -When the `_fivetran_start` column value of destination table is less than `_fivetran_start` of batch file. Suppose the existing table in destination is as below: Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced @@ -69,14 +68,13 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active **Explanation:** - We got new records for id = 1. -- We check for the corresponding earliest record (`_fivetran_earliest` as TRUE), and deleted the existing records from the destination table where the `_fivetran_start` column value in the destination table is greater than or equal to `_fivetran_start` of batch file. +- We check for the corresponding earliest record (`_fivetran_earliest` = TRUE), and delete the existing records from the destination table where the `_fivetran_start` column value is greater than or equal to `_fivetran_start` of matching rows in batch file. - We set `_fivetran_end` of the active record in destination table to `_fivetran_start`-1 of the `_fivetran_earliest` record of batch file. - We set the `_fivetran_active` column value for the above updated record to FALSE and `deleted_column` (if present in the destination table) to TRUE. -- We inserted new records _as is_ excluding the `_fivetran_earliest` column in the destination table. 
+- We inserted new records in the destination table _as is_, excluding the `_fivetran_earliest` column. -###### Example 2 +### Example 2: `_fivetran_start` column value of destination row is greater than or equal to the `_fivetran_start` of matching row in the batch file. -When the `_fivetran_start` column value in the destination table is greater than or equal to the `_fivetran_start` in the batch file. Suppose the existing table in destination is as below: Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced @@ -109,13 +107,13 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active 2 | mno | 3 | T4 | TMAX | TRUE | T103 **Explanation:** -- We got new records for id = 1. -- We check for the corresponding earliest record(`_fivetran_earliest` as TRUE), and deleted the existing records from the destination table if `_fivetran_start` of destination table is greater than or equal to the `_fivetran_start` of the batch file(in the above example we have such records, so we deleted records with id = 1, _fivetran_start = T3 and id = 1, _fivetran_start = T4). +- We got a new record for id = 1. +- We check for the corresponding earliest record(`_fivetran_earliest` = TRUE), and delete existing records from the destination table where `_fivetran_start` of destination row is greater than or equal to the `_fivetran_start` of the matching row in batch file. In this example we have such records, so we deleted records with id = 1, _fivetran_start = T3 and id = 1, _fivetran_start = T4. - `_fivetran_end` of the active record in destination table is set to `_fivetran_start`-1 of the `_fivetran_earliest` record of batch file. - We set the `_fivetran_active` column value for above updated record to FALSE and `deleted_column`(if present in destination table) to TRUE. -- We inserted new records _as is_ excluding the `_fivetran_earliest` column in the destination table. 
+- We insert new records _as is_ excluding the `_fivetran_earliest` column. -##### Updates +## Updates Suppose the existing table in destination is as below: @@ -126,7 +124,7 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | 2 | mno | 3 | T2 | TMAX | TRUE | T103 -At the source records with Id = 1 is updated: +At the source, record with Id = 1 is updated: Id(PK) | COL1 | Timestamp | Type --- | --- | --- | --- @@ -134,7 +132,7 @@ Id(PK) | COL1 | Timestamp | Type -And record with id = 2 is updated as: +And record with Id = 2 is updated as: Id(PK) | COL2 | Timestamp | Type --- | --- | --- | --- @@ -173,13 +171,13 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active **Explanation:** - In batch file we got records with id = 1 and id = 2. -- We set other columns(non updated columns) to the values of the active records. In above case for id = 2, we didn’t get COL1 value, so we set COL1 to “mno”(COL1 value of the active record). +- We set unmodified columns to the values of the active records. In this example, for id = 2, we didn’t get COL1 value, so we set COL1 to “mno” (COL1 value of the active record). - We set `_fivetran_end` of the active record in destination table to `_fivetran_start`-1 of the `_fivetran_earliest` record of batch file. - We set `_fivetran_active` column value for above updated record to FALSE and `deleted_column`(if present in destination table) to TRUE. - We set other columns _as is_ from the batch file in the destination table except the `_fivetran_earliest` column. 
-##### Deletes +## Deletes Suppose the existing table in destination is as below: @@ -191,7 +189,7 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | -At the source a record is deleted: +At the source, a record is deleted: Id(PK) | Timestamp | Type From fd405e830c4bff789dc598690577c137f9c7f297 Mon Sep 17 00:00:00 2001 From: Alex Ilyichov Date: Thu, 20 Jun 2024 17:35:34 +0200 Subject: [PATCH 10/13] Apply suggestions from code review --- how-to-handle-history-mode-batch-files.md | 52 +++++++++++------------ 1 file changed, 26 insertions(+), 26 deletions(-) diff --git a/how-to-handle-history-mode-batch-files.md b/how-to-handle-history-mode-batch-files.md index a2d5b3c..d0d872e 100644 --- a/how-to-handle-history-mode-batch-files.md +++ b/how-to-handle-history-mode-batch-files.md @@ -13,12 +13,12 @@ _fivetran_end | TimeStamp | The value for this column depends on whether the rec # Points to remember in history mode -- In `WriterBatchRequest`, we pass a new optional field HistoryMode which indicates connector is in history mode or not. In this HistoryMode field, we pass `deleted_column` column name which we need to modify only if it is present in destination. +- In `WriterBatchRequest`, we pass a new optional field, `HistoryMode`, which indicates if the connector is in history mode or not. In this `HistoryMode` field, we pass the `deleted_column` column name, which we need to modify only if it is present in the destination. - If the existing table is not empty, then, in the batch file, we also send a boolean column `_fivetran_earliest`. Suppose, in an `upsert`, we got multiple versions of the same record in a flush, then we set the `_fivetran_earliest` column value to `TRUE` for the record which have the earliest `_fivetran_start` and rest of the versions will have `_fivetran_earliest` as FALSE. 
-- For each `replace`, `update` and `delete` batch file, DELETE the existing records in the destination table with `_fivetran_start` greater than or equal to `_fivetran_start` of matcing records in batch file (refer to [Example 1](###Example 1) and [Example 2](###Example 2)). +- For each `replace`, `update` and `delete` batch file, DELETE the existing records in the destination table with `_fivetran_start` greater than or equal to `_fivetran_start` of matcing records in batch file (refer to [Example 1](#example1) and [Example 2](#example2)). -Do Not add the `_fivetran_earliest` column to the destination table. It is provided for convenience to easily identify the earliest record and can be used to optimize the data load query. -Below is an example of a`replace` batch file in history mode: +> IMPORTANT: Do not add the `_fivetran_earliest` column to the destination table. It is provided for convenience to easily identify the earliest record and can be used to optimize the data load query. +See the following example of a `replace` batch file in history mode: Id(PK) | COL1 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_earliest ---|---------|---------------------| --- |------------------| --- @@ -31,9 +31,9 @@ Id(PK) | COL1 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fiv ## Replace -### Example 1: `_fivetran_start` column value of destination row is less than `_fivetran_start` of matching row in batch file: +### Example 1: `_fivetran_start` column value of destination row is less than `_fivetran_start` of matching row in batch file -Suppose the existing table in destination is as below: +Suppose the existing table in the destination is as follows: Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced --- |------|----| --- | --- | --- | --- @@ -68,14 +68,14 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active **Explanation:** - We got new records for id = 1. 
-- We check for the corresponding earliest record (`_fivetran_earliest` = TRUE), and delete the existing records from the destination table where the `_fivetran_start` column value is greater than or equal to `_fivetran_start` of matching rows in batch file. -- We set `_fivetran_end` of the active record in destination table to `_fivetran_start`-1 of the `_fivetran_earliest` record of batch file. +- We check for the corresponding earliest record (`_fivetran_earliest` = TRUE), and delete the existing records from the destination table where the `_fivetran_start` column value is greater than or equal to the `_fivetran_start` column value of the matching rows in batch file. +- We set the value of `_fivetran_end` of the active record in the destination table to `_fivetran_start`-1 of the `_fivetran_earliest` record of the batch file. - We set the `_fivetran_active` column value for the above updated record to FALSE and `deleted_column` (if present in the destination table) to TRUE. -- We inserted new records in the destination table _as is_, excluding the `_fivetran_earliest` column. +- We insert new records in the destination table _as is_, excluding the `_fivetran_earliest` column. -### Example 2: `_fivetran_start` column value of destination row is greater than or equal to the `_fivetran_start` of matching row in the batch file. 
+### Example 2: `_fivetran_start` column value of destination row is greater than or equal to the `_fivetran_start` of matching row in batch file -Suppose the existing table in destination is as below: +Suppose the existing table in the destination is as follows: Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced --- |---|--------|---------------------| --- |------------------| --- @@ -84,7 +84,7 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | 1 | pqr | 2 | T4 | TMAX | TRUE | T101 2 | mno | 3 | T4 | TMAX | TRUE | T103 -At the source new records are added: +At the source, new records are added: Id(PK) | COL1 | COL2 | Timestamp | Type --- | --- | --- | --- | --- @@ -98,7 +98,7 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | --- | --- | --- | --- | --- | --- | --- | --- 1 | ghi | 1 | T2 | TMAX | TRUE | TRUE | T104 -Final Destination table will be as follows: +The final destination table will be as follows: Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced --- | --- | --- | --- | --- | --- | --- @@ -108,14 +108,14 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active **Explanation:** - We got a new record for id = 1. -- We check for the corresponding earliest record(`_fivetran_earliest` = TRUE), and delete existing records from the destination table where `_fivetran_start` of destination row is greater than or equal to the `_fivetran_start` of the matching row in batch file. In this example we have such records, so we deleted records with id = 1, _fivetran_start = T3 and id = 1, _fivetran_start = T4. -- `_fivetran_end` of the active record in destination table is set to `_fivetran_start`-1 of the `_fivetran_earliest` record of batch file. -- We set the `_fivetran_active` column value for above updated record to FALSE and `deleted_column`(if present in destination table) to TRUE. 
+- We check for the corresponding earliest record(`_fivetran_earliest` = TRUE), and delete existing records from the destination table where `_fivetran_start` of destination row is greater than or equal to the `_fivetran_start` of the matching row in the batch file. In this example, we have such records, so we delete records with id = 1, _fivetran_start = T3 and id = 1, _fivetran_start = T4. +- `_fivetran_end` of the active record in the destination table is set to `_fivetran_start`-1 of the `_fivetran_earliest` record of the batch file. +- We set the `_fivetran_active` column value for the above updated record to FALSE and `deleted_column`(if present in the destination table) to TRUE. - We insert new records _as is_ excluding the `_fivetran_earliest` column. ## Updates -Suppose the existing table in destination is as below: +Suppose the existing table in destination is as follows: Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced --- | --- | --- | --- | --- | --- | --- @@ -132,13 +132,13 @@ Id(PK) | COL1 | Timestamp | Type -And record with Id = 2 is updated as: +and record with Id = 2 is updated: Id(PK) | COL2 | Timestamp | Type --- | --- | --- | --- 2 | 1000 | T4 | Updated -And record with Id = 1 is again updated as +And lastly, record with Id = 1 is again updated: Id(PK) | COL1 | Timestamp | Type --- | --- | --- | --- @@ -170,16 +170,16 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active **Explanation:** - - In batch file we got records with id = 1 and id = 2. -- We set unmodified columns to the values of the active records. In this example, for id = 2, we didn’t get COL1 value, so we set COL1 to “mno” (COL1 value of the active record). - - We set `_fivetran_end` of the active record in destination table to `_fivetran_start`-1 of the `_fivetran_earliest` record of batch file. 
-- We set `_fivetran_active` column value for above updated record to FALSE and `deleted_column`(if present in destination table) to TRUE. + - In the batch file, we got records with id = 1 and id = 2. +- We set unmodified columns' values to the values of the active records. In this example, for id = 2, we didn’t get COL1 value, so we set COL1 to “mno” (COL1 value of the active record). +- We set `_fivetran_end` of the active record in the destination table to `_fivetran_start`-1 of the `_fivetran_earliest` record of the batch file. +- We set the `_fivetran_active` column value for the above updated record to FALSE and `deleted_column`(if present in destination table) to TRUE. - We set other columns _as is_ from the batch file in the destination table except the `_fivetran_earliest` column. ## Deletes -Suppose the existing table in destination is as below: +Suppose the existing table in the destination is as follows: Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced --- | --- | --- | --- | --- | --- | --- @@ -197,14 +197,14 @@ Id(PK) | Timestamp | Type 1 | T3 | Deleted -Delete batch file will be as follows: +The delete batch file will be as follows: Id(PK) | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_earliest | _fivetran_synced --- | --- |---------------|------| --- | --- 1 | | T3-1 | | TRUE | T104 -Final Destination Table will be as follows: +The final destination table will be as follows: Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fivetran_synced --- | --- | --- | --- | --- |------------------| --- From 42112f4bbd665dea6c914d016be67578285ffd12 Mon Sep 17 00:00:00 2001 From: Alex Ilyichov Date: Thu, 20 Jun 2024 17:39:29 +0200 Subject: [PATCH 11/13] Update how-to-handle-history-mode-batch-files.md --- how-to-handle-history-mode-batch-files.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/how-to-handle-history-mode-batch-files.md 
b/how-to-handle-history-mode-batch-files.md index d0d872e..f9472fd 100644 --- a/how-to-handle-history-mode-batch-files.md +++ b/how-to-handle-history-mode-batch-files.md @@ -15,7 +15,7 @@ _fivetran_end | TimeStamp | The value for this column depends on whether the rec - In `WriterBatchRequest`, we pass a new optional field, `HistoryMode`, which indicates if the connector is in history mode or not. In this `HistoryMode` field, we pass the `deleted_column` column name, which we need to modify only if it is present in the destination. - If the existing table is not empty, then, in the batch file, we also send a boolean column `_fivetran_earliest`. Suppose, in an `upsert`, we got multiple versions of the same record in a flush, then we set the `_fivetran_earliest` column value to `TRUE` for the record which have the earliest `_fivetran_start` and rest of the versions will have `_fivetran_earliest` as FALSE. -- For each `replace`, `update` and `delete` batch file, DELETE the existing records in the destination table with `_fivetran_start` greater than or equal to `_fivetran_start` of matcing records in batch file (refer to [Example 1](#example1) and [Example 2](#example2)). +- For each `replace`, `update` and `delete` batch file, DELETE the existing records in the destination table with `_fivetran_start` greater than or equal to `_fivetran_start` of matcing records in batch file (refer to [Example 1](#example-1) and [Example 2](#example-2)). > IMPORTANT: Do not add the `_fivetran_earliest` column to the destination table. It is provided for convenience to easily identify the earliest record and can be used to optimize the data load query. 
See the following example of a `replace` batch file in history mode: From df9968dbde44c3836b8e2b3f12ac59b44702cee4 Mon Sep 17 00:00:00 2001 From: fivetran-abdulsalam Date: Thu, 20 Jun 2024 21:45:17 +0530 Subject: [PATCH 12/13] removed deleted_column --- how-to-handle-history-mode-batch-files.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/how-to-handle-history-mode-batch-files.md b/how-to-handle-history-mode-batch-files.md index f9472fd..6776d2a 100644 --- a/how-to-handle-history-mode-batch-files.md +++ b/how-to-handle-history-mode-batch-files.md @@ -13,7 +13,7 @@ _fivetran_end | TimeStamp | The value for this column depends on whether the rec # Points to remember in history mode -- In `WriterBatchRequest`, we pass a new optional field, `HistoryMode`, which indicates if the connector is in history mode or not. In this `HistoryMode` field, we pass the `deleted_column` column name, which we need to modify only if it is present in the destination. +- In `WriterBatchRequest`, we pass a new boolean field, `history_mode`, which indicates if the connector is in history mode or not. - If the existing table is not empty, then, in the batch file, we also send a boolean column `_fivetran_earliest`. Suppose, in an `upsert`, we got multiple versions of the same record in a flush, then we set the `_fivetran_earliest` column value to `TRUE` for the record which have the earliest `_fivetran_start` and rest of the versions will have `_fivetran_earliest` as FALSE. - For each `replace`, `update` and `delete` batch file, DELETE the existing records in the destination table with `_fivetran_start` greater than or equal to `_fivetran_start` of matcing records in batch file (refer to [Example 1](#example-1) and [Example 2](#example-2)). @@ -70,7 +70,7 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active - We got new records for id = 1. 
- We check for the corresponding earliest record (`_fivetran_earliest` = TRUE), and delete the existing records from the destination table where the `_fivetran_start` column value is greater than or equal to the `_fivetran_start` column value of the matching rows in batch file. - We set the value of `_fivetran_end` of the active record in the destination table to `_fivetran_start`-1 of the `_fivetran_earliest` record of the batch file. -- We set the `_fivetran_active` column value for the above updated record to FALSE and `deleted_column` (if present in the destination table) to TRUE. +- We set the `_fivetran_active` column value for the above updated record to FALSE. - We insert new records in the destination table _as is_, excluding the `_fivetran_earliest` column. ### Example 2: `_fivetran_start` column value of destination row is greater than or equal to the `_fivetran_start` of matching row in batch file @@ -110,7 +110,7 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active - We got a new record for id = 1. - We check for the corresponding earliest record(`_fivetran_earliest` = TRUE), and delete existing records from the destination table where `_fivetran_start` of destination row is greater than or equal to the `_fivetran_start` of the matching row in the batch file. In this example, we have such records, so we delete records with id = 1, _fivetran_start = T3 and id = 1, _fivetran_start = T4. - `_fivetran_end` of the active record in the destination table is set to `_fivetran_start`-1 of the `_fivetran_earliest` record of the batch file. -- We set the `_fivetran_active` column value for the above updated record to FALSE and `deleted_column`(if present in the destination table) to TRUE. +- We set the `_fivetran_active` column value for the above updated record to FALSE. - We insert new records _as is_ excluding the `_fivetran_earliest` column. 
## Updates @@ -173,7 +173,7 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active - In the batch file, we got records with id = 1 and id = 2. - We set unmodified columns' values to the values of the active records. In this example, for id = 2, we didn’t get COL1 value, so we set COL1 to “mno” (COL1 value of the active record). - We set `_fivetran_end` of the active record in the destination table to `_fivetran_start`-1 of the `_fivetran_earliest` record of the batch file. -- We set the `_fivetran_active` column value for the above updated record to FALSE and `deleted_column`(if present in destination table) to TRUE. +- We set the `_fivetran_active` column value for the above updated record to FALSE. - We set other columns _as is_ from the batch file in the destination table except the `_fivetran_earliest` column. @@ -213,6 +213,6 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | 2 | mno | 3 | T2 | TMAX | TRUE | T103 **Explanation:** -- We set the `_fivetran_active` column value to FALSE for the active record and set the `_fivetran_end` column value to `T3-1` and the `deleted_column`(if present in the destination table) to TRUE. +- We set the `_fivetran_active` column value to FALSE for the active record and set the `_fivetran_end` column value to `T3-1`. 
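Review note: the replace and delete steps documented in this patch can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the SDK's implementation. The function names, the list-of-dicts table layout, the `pk` parameter, and the `TMAX` stand-in are all assumptions made for the example. The sketch follows the text literally: it closes out only the row currently flagged `_fivetran_active`, and it never writes `_fivetran_earliest` to the destination.

```python
from datetime import datetime, timedelta

TMAX = datetime.max                  # hypothetical stand-in for the maximum TIMESTAMP value
ONE_MS = timedelta(milliseconds=1)


def apply_history_replace(dest, batch, pk="id"):
    """Apply a history-mode replace batch to `dest` (a list of dict rows).

    Hypothetical helper: rows are plain dicts and `pk` names a single
    primary-key column. Returns a new list; inputs are not mutated.
    """
    # Start time of the earliest batch version for each primary key.
    earliest = {r[pk]: r["_fivetran_start"] for r in batch if r["_fivetran_earliest"]}

    result = []
    for row in dest:
        start = earliest.get(row[pk])
        if start is None:
            result.append(dict(row))          # key not in the batch: keep as is
            continue
        if row["_fivetran_start"] >= start:
            continue                          # delete rows at or after the batch start
        row = dict(row)
        if row["_fivetran_active"]:
            # Close out the previously active version.
            row["_fivetran_end"] = start - ONE_MS
            row["_fivetran_active"] = False
        result.append(row)

    # Insert batch rows as is, excluding the helper column.
    for r in batch:
        result.append({k: v for k, v in r.items() if k != "_fivetran_earliest"})
    return result


def apply_history_delete(dest, batch, pk="id"):
    """Apply a history-mode delete batch: deactivate the active row per key."""
    ends = {r[pk]: r["_fivetran_end"] for r in batch}
    result = []
    for row in dest:
        row = dict(row)
        if row[pk] in ends and row["_fivetran_active"]:
            row["_fivetran_end"] = ends[row[pk]]   # e.g. T3-1 from the batch file
            row["_fivetran_active"] = False
        result.append(row)
    return result
```

A real destination would express the same steps as `DELETE`, `UPDATE`, and `INSERT` statements keyed on the primary key columns. The in-memory form is used here only to keep the order of operations visible.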
From 92deeb61211fb15fb129ceefac11f3c69b51d788 Mon Sep 17 00:00:00 2001
From: Alex Ilyichov
Date: Thu, 20 Jun 2024 19:49:54 +0200
Subject: [PATCH 13/13] Apply suggestions from code review

---
 how-to-handle-history-mode-batch-files.md | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/how-to-handle-history-mode-batch-files.md b/how-to-handle-history-mode-batch-files.md
index 6776d2a..12f8a65 100644
--- a/how-to-handle-history-mode-batch-files.md
+++ b/how-to-handle-history-mode-batch-files.md
@@ -11,7 +11,7 @@ _fivetran_start | TimeStamp | The time when the record was first created or modi
 _fivetran_end | TimeStamp | The value for this column depends on whether the record is active. If the record is not active, then `_fivetran_end` value will be `_fivetran_start` of the next version of the record minus 1 millisecond. If the record is deleted, then the value will be the same as the timestamp of delete operation. If the record is active, then `_fivetran_end` is set to maximum TIMESTAMP value.
 
 
-# Points to remember in history mode
+## Points to remember in history mode
 
 - In `WriterBatchRequest`, we pass a new boolean field, `history_mode`, which indicates if the connector is in history mode or not.
 - If the existing table is not empty, then, in the batch file, we also send a boolean column `_fivetran_earliest`. Suppose, in an `upsert`, we got multiple versions of the same record in a flush, then we set the `_fivetran_earliest` column value to `TRUE` for the record which have the earliest `_fivetran_start` and rest of the versions will have `_fivetran_earliest` as FALSE.
@@ -27,11 +27,12 @@ Id(PK) | COL1 | _fivetran_start(PK) | _fivetran_end | _fivetran_active | _fiv
 1 | pqr | T2 | T3-1 | FALSE | FALSE
 1 | def | T3 | TMAX | TRUE | FALSE
 
-# How to Handle Replaces, Updates and Deletes
+## How to Handle Replaces, Updates and Deletes
 
-## Replace
+### Replace
 
-### Example 1: `_fivetran_start` column value of destination row is less than `_fivetran_start` of matching row in batch file
+#### Example 1
+This example describes a case where the `_fivetran_start` column value of the destination row is less than `_fivetran_start` of the matching row in the batch file.
 
 Suppose the existing table in the destination is as follows:
 
@@ -73,7 +74,9 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active
 - We set the `_fivetran_active` column value for the above updated record to FALSE.
 - We insert new records in the destination table _as is_, excluding the `_fivetran_earliest` column.
 
-### Example 2: `_fivetran_start` column value of destination row is greater than or equal to the `_fivetran_start` of matching row in batch file
+#### Example 2
+
+This example describes a case where the `_fivetran_start` column value of the destination row is greater than or equal to the `_fivetran_start` of the matching row in the batch file.
 
 Suppose the existing table in the destination is as follows:
 
@@ -113,7 +116,7 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active
 - We set the `_fivetran_active` column value for the above updated record to FALSE.
 - We insert new records _as is_ excluding the `_fivetran_earliest` column.
 
-## Updates
+### Updates
 
 Suppose the existing table in destination is as follows:
 
@@ -177,7 +180,7 @@ Id(PK) | COL1 | COL2 | _fivetran_start(PK) | _fivetran_end | _fivetran_active
 - We set other columns _as is_ from the batch file in the destination table except the `_fivetran_earliest` column.
 
 
-## Deletes
+### Deletes
 
 Suppose the existing table in the destination is as follows:
 