Skip to content

Commit

Permalink
**Bug Fixes**
Browse files Browse the repository at this point in the history
- Doubling of mapping locations for partitions in `distcp` for STORAGE_MIGRATION.
- [STORAGE_MIGRATION is setting ACID to ON, regardless.](#158)

**What's New**

- [Validate JDBC Jar Files in config.](#159)
- Ability to turn-on `strict` mode for Storage Migration.  This will cause `distcp` to fail when non-standard locations are used.  To turn off, use the `-sms|--storage-migration-strict` flag via the CLI.

**Behavior Changes**

The default behavior for Storage Migration 'strict' has changed from `true` to `false`. The intent behind the `strict` mode was to ensure `distcp` would fail when non-standard locations are used.  The combination of `metastore_direct` and knowing the partition location details gives us a better chance on making these mappings work for `distcp`.  When the scenario arises, we do **HIGHLY** recommend that you validate the plans created.  The new default behavior will allow `distcp` to continue when non-standard locations are encountered, while throwing a warning.  This will allow the migration to continue, but you should validate the results.
  • Loading branch information
dstreev committed Nov 21, 2024
1 parent 6f80feb commit 1a94e81
Show file tree
Hide file tree
Showing 21 changed files with 925 additions and 86 deletions.
16 changes: 16 additions & 0 deletions Writerside/topics/Release-Notes.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,22 @@ The latest set of enhancement requests can be found [here](https://github.com/cl

If there is something you'd like to see, add a new issue [here](https://github.com/cloudera-labs/hms-mirror/issues)

## 2.2.0.19.0 (pre-release)

**Bug Fixes**

- Doubling of mapping locations for partitions in `distcp` for STORAGE_MIGRATION.
- [STORAGE_MIGRATION is setting ACID to ON, regardless.](https://github.com/cloudera-labs/hms-mirror/issues/158)

**What's New**

- [Validate JDBC Jar Files in config.](https://github.com/cloudera-labs/hms-mirror/issues/159)
- Ability to turn-on `strict` mode for Storage Migration. This will cause `distcp` to fail when non-standard locations are used. To turn off, use the `-sms|--storage-migration-strict` flag via the CLI.

**Behavior Changes**

The default behavior for Storage Migration 'strict' has changed from `true` to `false`. The intent behind the `strict` mode was to ensure `distcp` would fail when non-standard locations are used. The combination of `metastore_direct` and knowing the partition location details gives us a better chance on making these mappings work for `distcp`. When the scenario arises, we do **HIGHLY** recommend that you validate the plans created. The new default behavior will allow `distcp` to continue when non-standard locations are encountered, while throwing a warning. This will allow the migration to continue, but you should validate the results.

## 2.2.0.18.1


Expand Down
10 changes: 10 additions & 0 deletions Writerside/topics/Transfer-Storage-Migration.md
Original file line number Diff line number Diff line change
Expand Up @@ -54,4 +54,14 @@ you expect since there's no guarantee that there isn't a lot of other 'extra' da
transfer:
storageMigration:
consolidateTablesForDistcp: true|false
```

## Strict Mode

When true and using `distcp`, the migration will fail if the table partition locations aren't standard. The default is `false`. The default mode will allow you to move forward with warnings about the non-standard partition locations. It's highly recommended that you validate the location mappings in the `distcp` plan before running the migration.

```yaml
transfer:
storageMigration:
strict: true|false
```
2 changes: 1 addition & 1 deletion pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@

<groupId>com.cloudera.utils.hadoop</groupId>
<artifactId>hms-mirror</artifactId>
<version>2.2.0.18.2</version>
<version>2.2.0.19.0</version>
<packaging>jar</packaging>

<name>hms-mirror</name>
Expand Down
15 changes: 8 additions & 7 deletions src/main/java/com/cloudera/utils/hms/mirror/MessageCode.java
Original file line number Diff line number Diff line change
Expand Up @@ -67,13 +67,11 @@ public enum MessageCode {
"location with SCHEMA_ONLY so you can build out the movement plan."),
DISTCP_VALID_STRATEGY("The `distcp` option is not valid for this strategy ({0}) and configuration."),
DISTCP_WITH_MISMATCHING_LOCATIONS("You''ve specified ''distcp'' with mismatching locations for {0} {1}: " +
"Original Location {2}, Specification {3}. When these don''t match, a valid distcp plan can''t be created to " +
"correctly align the data elements. You''ll need to use SQL to migrate the data and allow Hive to reorganize it " +
"according to your specs."),
"Original Location {2}, Specification {3}. When these don''t match, the 'distcp' plan created might not " +
"correctly align the data elements. You MUST validate the plan before executing. Proceed at your own risk!"),
DISTCP_WITH_MISMATCHING_TABLE_LOCATION("You''ve specified ''distcp'' with mismatching locations for {0} {1}: " +
"Original Location: {2}, Derived table name from directory: {3}. The partition directory doesn''t match the table name and we can''t " +
"correctly align the data elements via distcp. You''ll need to use SQL to migrate the data and allow Hive to reorganize it " +
"according to your specs."),
"Original Location: {2}, Derived table name from directory: {3}. When these don''t match the table name and the plan might not " +
"correctly align the data elements via distcp. You MUST validate the plan before executing. Proceed at your own risk!"),
DISTCP_WO_TABLE_FILTERS("`distcp` workbooks will include the DATABASE base directory, which may include more data than is " +
"expected at the 'db' level, especially if the root directory is used by other processes outside of the tables " +
"defined in the database."),
Expand Down Expand Up @@ -214,11 +212,14 @@ public enum MessageCode {
STORAGE_MIGRATION_REQUIRED_NAMESPACE("STORAGE_MIGRATION requires -smn or -cs to define the new namespace."),
STORAGE_MIGRATION_REQUIRED_STRATEGY("STORAGE_MIGRATION requires -sms to set the Data Strategy. Applicable options " +
"are SCHEMA_ONLY, SQL, EXPORT_IMPORT, or HYBRID"),
STORAGE_MIGRATION_STRICT("Storage Migration is in 'strict' mode. If the table and/or partition locations can't "
STORAGE_MIGRATION_STRICT("Storage Migration is in 'strict' mode. The table and/or partition locations can't "
+ "be mapped to the warehouse locations or there are mismatches in location standards, we can't process the table "
+ "without potential data loss. Add additional 'global-location-map' entries to cover the locations."
+ "Mismatched directories or non-standard partition locations can only be handled through the " +
"SQL dataMovementStategy."),
STORAGE_MIGRATION_NOT_STRICT_ISSUE("Storage Migration is NOT in 'strict' mode. The table and/or partition locations aren't "
+ "standard and could lead to data loss when migrated with 'distcp'. We strongly recommend reviewing the plans created " +
"and ensuring the data is correctly aligned before executing to avoid data loss and/or duplication."),
SYNC_TBL_FILTER("'sync' with 'table filter' will be bi-directional ONLY for tables that meet the table filter '"
+ "' ON BOTH SIDES!!!") // WARNINGS
,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1182,26 +1182,25 @@ CommandLineRunner configStorageMigrationNamespace(HmsMirrorConfig hmsMirrorConfi
@Bean
@Order(1)
@ConditionalOnProperty(
name = "hms-mirror.config.storage-migration-strict",
havingValue = "true")
CommandLineRunner configStorageMigrationStrictTrue(HmsMirrorConfig hmsMirrorConfig) {
name = "hms-mirror.config.storage-migration-strict")
CommandLineRunner configStorageMigrationStrictTrue(HmsMirrorConfig hmsMirrorConfig, @Value("${hms-mirror.config.storage-migration-strict}") boolean value) {
return args -> {
log.info("storage-migration-strict: {}", Boolean.TRUE);
hmsMirrorConfig.getTransfer().getStorageMigration().setStrict(Boolean.TRUE);
log.info("storage-migration-strict: {}", value);
hmsMirrorConfig.getTransfer().getStorageMigration().setStrict(value);
};
}

@Bean
@Order(1)
@ConditionalOnProperty(
name = "hms-mirror.config.storage-migration-strict",
havingValue = "false")
CommandLineRunner configStorageMigrationStrictFalse(HmsMirrorConfig hmsMirrorConfig) {
return args -> {
log.warn("storage-migration-strict: {} is not currently supported to ensure valid migration plans.", Boolean.FALSE);
// @Bean
// @Order(1)
// @ConditionalOnProperty(
// name = "hms-mirror.config.storage-migration-strict",
// havingValue = "false")
// CommandLineRunner configStorageMigrationStrictFalse(HmsMirrorConfig hmsMirrorConfig) {
// return args -> {
// log.warn("storage-migration-strict: {}", Boolean.FALSE);
// hmsMirrorConfig.getTransfer().getStorageMigration().setStrict(Boolean.FALSE);
};
}
// };
// }

@Bean
@Order(1)
Expand Down Expand Up @@ -1873,6 +1872,11 @@ private Options getOptions() {
storageMigrationNamespaceOption.setArgName("namespace");
options.addOption(storageMigrationNamespaceOption);

Option storageMigrationStrictOption = new Option("sms", "storage-migration-strict", false,
"Use 'strict' location translations for storage migration.");
storageMigrationStrictOption.setRequired(Boolean.FALSE);
options.addOption(storageMigrationStrictOption);

Option dbOption = new Option("db", "database", true,
"Comma separated list of Databases (upto 100).");
dbOption.setValueSeparator(',');
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -212,6 +212,9 @@ public Boolean execute(TableMirror tableMirror) {

EnvironmentTable let = getEnvironmentTable(Environment.LEFT, tableMirror);
EnvironmentTable ret = getEnvironmentTable(Environment.RIGHT, tableMirror);

Boolean strictIssues = Boolean.FALSE;

try {
/*
If using distcp, we don't need to go through and rename/recreate the tables. We just need to change the
Expand Down Expand Up @@ -284,6 +287,14 @@ public Boolean execute(TableMirror tableMirror) {
for (Map.Entry<String, String> entry : let.getPartitions().entrySet()) {
String partSpec = entry.getKey();
int level = StringUtils.countMatches(partSpec, "/");
// To ensure the partition location is handled correctly when it's within the table location,
// we need to see if the partition location 'starts with' the table location.
String tableLocation = TableUtils.getLocation(tableMirror.getName(), tableMirror.getTableDefinition(Environment.LEFT));
if (entry.getValue().startsWith(tableLocation)) {
// Increase the pruning back to the table location so we aren't build distcp commands for the partitions
// that would be handled by the table location.
level++;
}
// Translate to 'partition spec'.
partSpec = TableUtils.toPartitionSpec(partSpec);
String partLocation = entry.getValue();
Expand All @@ -297,8 +308,12 @@ public Boolean execute(TableMirror tableMirror) {
String msg = MessageFormat.format(MessageCode.DISTCP_WITH_MISMATCHING_LOCATIONS.getDesc(),
"Partition", partSpec, partLocation, normalizedPartSpecLocation);
tableMirror.addIssue(Environment.LEFT, msg);
noIssues = Boolean.FALSE;
continue;
if (config.getTransfer().getStorageMigration().isStrict()) {
noIssues = Boolean.FALSE;
continue;
} else {
strictIssues = Boolean.TRUE;
}
} else {
// Since the partition location matches the partition spec, we also need
// to verify that the directory the partition is in, matches the table name
Expand All @@ -310,8 +325,12 @@ public Boolean execute(TableMirror tableMirror) {
String msg = MessageFormat.format(MessageCode.DISTCP_WITH_MISMATCHING_TABLE_LOCATION.getDesc(),
"Partition", partSpec, partLocation, tableDirName);
tableMirror.addIssue(Environment.LEFT, msg);
noIssues = Boolean.FALSE;
continue;
if (config.getTransfer().getStorageMigration().isStrict()) {
noIssues = Boolean.FALSE;
continue;
} else {
strictIssues = Boolean.TRUE;
}
}
}
}
Expand Down Expand Up @@ -374,6 +393,13 @@ public Boolean execute(TableMirror tableMirror) {
let.getSql().clear();
rtn = Boolean.FALSE;
}
if (strictIssues) {
// Strict is NOT set, but there were some issues. We need to post a warning here.
log.warn("Storage Migration is NOT strict, but there are concerns with the location mappings for {}",
tableMirror.getName());
let.addIssue(MessageCode.STORAGE_MIGRATION_NOT_STRICT_ISSUE.getDesc());
rtn = Boolean.TRUE;
}
} else {
// Distcp with Archive. The intent is to retain an archive of the table (and data)
// even under the distcp movement strategy. This allows the user access to the original
Expand Down Expand Up @@ -406,8 +432,12 @@ public Boolean execute(TableMirror tableMirror) {
String msg = MessageFormat.format(MessageCode.DISTCP_WITH_MISMATCHING_LOCATIONS.getDesc(),
"Partition", partSpec, partLocation, normalizedPartSpecLocation);
tableMirror.addIssue(Environment.LEFT, msg);
noIssues = Boolean.FALSE;
continue;
if (config.getTransfer().getStorageMigration().isStrict()) {
noIssues = Boolean.FALSE;
continue;
} else {
strictIssues = Boolean.TRUE;
}
} else {
// Since the partition location matches the partition spec, we also need
// to verify that the directory the partition is in, matches the table name
Expand All @@ -419,8 +449,12 @@ public Boolean execute(TableMirror tableMirror) {
String msg = MessageFormat.format(MessageCode.DISTCP_WITH_MISMATCHING_TABLE_LOCATION.getDesc(),
"Partition", partSpec, partLocation, tableDirName);
tableMirror.addIssue(Environment.LEFT, msg);
noIssues = Boolean.FALSE;
continue;
if (config.getTransfer().getStorageMigration().isStrict()) {
noIssues = Boolean.FALSE;
continue;
} else {
strictIssues = Boolean.TRUE;
}
}
}
}
Expand Down Expand Up @@ -468,6 +502,13 @@ public Boolean execute(TableMirror tableMirror) {
let.getSql().clear();
rtn = Boolean.FALSE;
}
if (strictIssues) {
// Strict is NOT set, but there were some issues. We need to post a warning here.
log.warn("Storage Migration is NOT strict, but there are concerns with the location mappings for {}",
tableMirror.getName());
let.addIssue(MessageCode.STORAGE_MIGRATION_NOT_STRICT_ISSUE.getDesc());
rtn = Boolean.TRUE;
}
}

// Build Distcp plan for moving table and partition data.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,7 @@ public class StorageMigration implements Cloneable {
private boolean consolidateTablesForDistcp = Boolean.FALSE;
@Schema(description = "When strict is true, any issues during evaluation will cause the migration to fail. When false, " +
"the migration will continue but the issues will be reported. This can lead to data movement issues.")
private boolean strict = Boolean.TRUE;
private boolean strict = Boolean.FALSE;

@Override
public StorageMigration clone() {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -54,55 +54,26 @@ STORAGE_MIGRATION test. Defining the warehouse directories (-wd and -ewd) along
*/
public class Test_sm_smn_wd_epl_dc extends E2EBaseTest {

@Test
public void issueTest() {
validateTableIssueCount("ext_purge_odd_parts", "web_sales",
Environment.LEFT, 2);
}

@Test
public void phaseTest() {
validatePhase("ext_purge_odd_parts", "web_sales", PhaseState.ERROR);
}

@Test
public void returnCodeTest() {
// Get Runtime Return Code.
long rtn = getReturnCode();
// Verify the return code.

// A few partitions have non-standard locations and can't be migrated without addition GLM entries.
long check = 1L;
long check = 0L;
assertEquals("Return Code Failure: " + rtn, check, rtn);
}

// @Test
// public void sqlTest() {
// Boolean foundAT = Boolean.FALSE;
// Boolean foundOddPart = Boolean.FALSE;
// Boolean foundOddPart2 = Boolean.FALSE;
//
// for (Pair pair : getConversion().getDatabase("ext_purge_odd_parts")
// .getTableMirrors().get("web_sales")
// .getEnvironmentTable(Environment.LEFT).getSql()) {
// if (pair.getDescription().trim().equals("Alter Table Location")) {
// assertEquals("Location doesn't match", "ALTER TABLE web_sales SET LOCATION \"ofs://OHOME90/finance/external-fso/ext_purge_odd_parts.db/web_sales\"", pair.getAction());
// foundAT = Boolean.TRUE;
// }
// if (pair.getDescription().trim().equals("Alter Table Partition Spec `ws_sold_date_sk`='2451180' Location")) {
// assertEquals("Location doesn't match", "ALTER TABLE web_sales PARTITION (`ws_sold_date_sk`='2451180') SET " +
// "LOCATION \"ofs://OHOME90/finance/external-fso/ext_purge_odd_parts.db/web_sales/ws_sold_date_sk=2451180\"", pair.getAction());
// foundOddPart = Boolean.TRUE;
// }
// if (pair.getDescription().trim().equals("Alter Table Partition Spec `ws_sold_date_sk`='2451188' Location")) {
// assertEquals("Location doesn't match", "ALTER TABLE web_sales PARTITION (`ws_sold_date_sk`='2451188') SET " +
// "LOCATION \"ofs://OHOME90/finance/external-fso/ext_purge_odd_parts.db/web_sales/ws_sold_date_sk=2451188\"", pair.getAction());
// foundOddPart2 = Boolean.TRUE;
// }
// }
// assertEquals("Alter Table Location not found", Boolean.TRUE, foundAT);
// assertEquals("Alter Odd Part Location not found", Boolean.TRUE, foundOddPart);
// assertEquals("Alter Odd Part 2 Location not found", Boolean.TRUE, foundOddPart2);
// }
@Test
public void phaseTest() {
validatePhase("ext_purge_odd_parts", "web_sales", PhaseState.SUCCESS);
}

@Test
public void issueTest() {
validateTableIssueCount("ext_purge_odd_parts", "web_sales",
Environment.LEFT, 2);
}

}
Loading

0 comments on commit 1a94e81

Please sign in to comment.