Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[6.0.0] - 2020-05

Migrated

  • Spark 3.0 Migration
    • Migrate to Spark version 3.0.1, Hadoop 3.2.1 and Scala 2.12
    • Spark 3.0 replaced the hybrid Julian + Gregorian calendar used by earlier versions with the Proleptic Gregorian calendar. If data sources contain dates before 1582 or other problematic date/time formats, a quick fix is to set the following Spark properties in the pipelines:
      "spark.sql.legacy.timeParserPolicy": "LEGACY",
      "spark.sql.legacy.parquet.datetimeRebaseModeInWrite": "LEGACY",
      "spark.sql.legacy.parquet.datetimeRebaseModeInRead": "LEGACY"
      
      An example of an exception related to parsing dates and timestamps looks like this:
      SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to parse '00/00/0000' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.
      
      Note 1: we observed two other exceptions, related to reading and writing Parquet files that contain very old dates or timestamps. They look very similar to the SparkUpgradeException above, but point to the spark.sql.legacy.parquet.datetimeRebaseModeInRead or spark.sql.legacy.parquet.datetimeRebaseModeInWrite property instead.

      Note 2: for a given data source, the three settings above should cover all of the exceptions listed here.
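To illustrate why dates before 1582 are affected, here is a minimal sketch that compares the day number each calendar assigns to the same year-month-day label. It uses only JDK classes, not Spark: java.util.GregorianCalendar stands in for the hybrid Julian + Gregorian calendar that Spark 2.x relied on, and java.time.LocalDate for the Proleptic Gregorian calendar used by Spark 3.0. The object name CalendarRebaseDemo is hypothetical and not part of this project, and the code models only the calendar shift, not Spark's actual rebase implementation.

```scala
import java.time.LocalDate
import java.util.{GregorianCalendar, TimeZone}

// Hypothetical illustration, not project code: compares the two calendar
// systems involved in the Spark 3.0 change, using plain JDK classes.
object CalendarRebaseDemo {
  private val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // Proleptic Gregorian (java.time), as used by Spark 3.x for all dates.
  def prolepticEpochDay(year: Int, month: Int, day: Int): Long =
    LocalDate.of(year, month, day).toEpochDay

  // Hybrid calendar (java.util.GregorianCalendar), as used by Spark 2.x:
  // Julian rules before 1582-10-15, Gregorian rules from then on.
  def hybridEpochDay(year: Int, month: Int, day: Int): Long = {
    val cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
    cal.clear()                   // reset all fields, including time of day
    cal.set(year, month - 1, day) // java.util.Calendar months are 0-based
    java.lang.Math.floorDiv(cal.getTimeInMillis, MillisPerDay)
  }

  // Days by which the two calendars disagree for the same date label.
  def shiftInDays(year: Int, month: Int, day: Int): Long =
    hybridEpochDay(year, month, day) - prolepticEpochDay(year, month, day)

  def main(args: Array[String]): Unit =
    for ((y, m, d) <- Seq((1000, 1, 1), (1500, 6, 15), (2020, 5, 1)))
      println(f"$y%04d-$m%02d-$d%02d: shift = ${shiftInDays(y, m, d)} days")
}
```

For dates on or after 1582-10-15 the shift is zero, while earlier labels land several days apart (10 days for 1500-06-15). Roughly speaking, the LEGACY rebase modes have Spark convert between these two calendars when writing or reading Parquet, so files stay compatible with Spark 2.x. In a Scala pipeline, the three properties listed above would typically be passed through SparkSession.builder().config(key, value) or set at runtime with spark.conf.set(key, value).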

[5.8.0] - 2020-04

Added

  • Reduce reconciliation execution time by removing an unneeded caching stage.

[5.7.5] - 2020-04

Added

  • Enable multi-line option for append loads
  • Fix duplicate records introduced by the latest changes to CompetitorDataPreprocessor.

[5.7.2] - 2021-02

Added

  • Make init condensation optional, but true by default.

[5.7.1] - 2020-02

Added

[5.7.0] - 2020-01

Added

  • Support for multiple partition attributes (non-date-derived) and for single non-date-derived partition attributes.