Add SCD Type 2 implementation example #4016

Open
wants to merge 2 commits into
base: master

@ol-s-cloud ol-s-cloud commented Jan 5, 2025

This PR adds an example that implements Slowly Changing Dimension (SCD) Type 2 with Delta Lake. SCD Type 2 preserves history by keeping each past version of a record as its own row instead of overwriting it. It is a common data warehousing pattern, so a reference implementation should be useful to the community.

Features Demonstrated

  • Creation and maintenance of historical records using Delta Lake
  • Proper use of merge operations for SCD Type 2 implementation
  • Best practices for temporal data handling
  • Efficient querying patterns for current and historical data
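
For readers who want a picture of the merge pattern listed above, here is a minimal sketch along the lines of the SCD Type 2 recipe in the Delta Lake documentation. It is not the code from this PR. The table schema (customerId, address, current, effectiveDate, endDate), the updates schema (customerId, address, effectiveDate), the object name Scd2MergeSketch and its method names are all assumptions made for illustration. It needs Spark with the Delta Lake library on the classpath.

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, expr, lit}

// Hypothetical helper object; names and schemas are illustrative only.
object Scd2MergeSketch {

  // Assumed target schema: (customerId, address, current, effectiveDate, endDate).
  // Assumed updates schema: (customerId, address, effectiveDate).
  def applyScd2(spark: SparkSession, tablePath: String, updates: DataFrame): Unit = {
    val customers = DeltaTable.forPath(spark, tablePath)

    // Updates whose address differs from the current row. They get a NULL
    // merge key, so they never match and are inserted as the new version.
    val changed = updates.as("updates")
      .join(customers.toDF.as("customers"),
        expr("updates.customerId = customers.customerId"))
      .where("customers.current = true AND updates.address <> customers.address")
      .selectExpr("NULL AS mergeKey", "updates.*")

    // All updates keyed by customerId. These either close the matching
    // current row or insert the first version of a new customer.
    val staged = changed.union(updates.selectExpr("customerId AS mergeKey", "*"))

    customers.as("customers")
      .merge(staged.as("staged"), "customers.customerId = staged.mergeKey")
      .whenMatched("customers.current = true AND customers.address <> staged.address")
      .updateExpr(Map(
        "current" -> "false",
        "endDate" -> "staged.effectiveDate"))
      .whenNotMatched()
      .insertExpr(Map(
        "customerId" -> "staged.customerId",
        "address" -> "staged.address",
        "current" -> "true",
        "effectiveDate" -> "staged.effectiveDate",
        "endDate" -> "null"))
      .execute()
  }

  // Current view: only the open version of each record.
  def currentRows(spark: SparkSession, tablePath: String): DataFrame =
    DeltaTable.forPath(spark, tablePath).toDF.where(col("current") === true)

  // Historical view: rows valid on the given date, treating
  // [effectiveDate, endDate) as a half-open interval of DATE columns.
  def asOf(spark: SparkSession, tablePath: String, date: java.sql.Date): DataFrame =
    DeltaTable.forPath(spark, tablePath).toDF
      .where(col("effectiveDate") <= lit(date) &&
        (col("endDate").isNull || col("endDate") > lit(date)))
}
```

The NULL merge key is the design choice worth noting: a single merge can both close the old row and insert the new one, because a changed record appears in the staged data twice, once keyed (to match and close) and once unkeyed (to fall through to the insert clause).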

Implementation Details

The example includes:

  • Full Scala implementation with comments
  • Comprehensive README with setup instructions
  • Best practices for Delta Lake operations
  • Example queries for accessing historical data

Testing Done

  • Verified the example runs successfully with Delta Lake 2.4.0
  • Tested with both new records and updates to existing records
  • Validated historical record preservation
  • Confirmed proper handling of current/historical queries
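
The behaviours checked above can be pictured with a small in-memory model that needs neither Spark nor Delta Lake. This is only a sketch of SCD Type 2 semantics, not the test code from this PR; CustomerRow, Update, Scd2Model and all field names are hypothetical.

```scala
import java.time.LocalDate

// Hypothetical dimension row; field names are illustrative, not taken from the PR.
final case class CustomerRow(
    customerId: Int,
    address: String,
    effectiveDate: LocalDate,
    endDate: Option[LocalDate], // None while the row is the current version
    current: Boolean)

// Hypothetical incoming change record.
final case class Update(customerId: Int, address: String, effectiveDate: LocalDate)

object Scd2Model {

  // Applies a batch of updates one by one with SCD Type 2 semantics.
  def applyUpdates(table: Vector[CustomerRow], updates: Seq[Update]): Vector[CustomerRow] =
    updates.foldLeft(table) { (rows, u) =>
      rows.find(r => r.customerId == u.customerId && r.current) match {
        // Same address as the current row: nothing changes.
        case Some(cur) if cur.address == u.address => rows
        // Changed address: close the current row, then append the new version.
        case Some(cur) =>
          rows.map { r =>
            if (r == cur) r.copy(endDate = Some(u.effectiveDate), current = false) else r
          } :+ CustomerRow(u.customerId, u.address, u.effectiveDate, None, current = true)
        // Unknown key: append the first version of a new record.
        case None =>
          rows :+ CustomerRow(u.customerId, u.address, u.effectiveDate, None, current = true)
      }
    }

  // Current view: only open versions.
  def currentRows(table: Vector[CustomerRow]): Vector[CustomerRow] =
    table.filter(_.current)

  // Historical view: rows valid on `date`, with [effectiveDate, endDate) half-open.
  def asOf(table: Vector[CustomerRow], date: LocalDate): Vector[CustomerRow] =
    table.filter(r => !r.effectiveDate.isAfter(date) && r.endDate.forall(_.isAfter(date)))
}
```

Loading one record, then applying a batch that changes it and adds a second record, should leave three rows: the closed original, its replacement, and the new record. Only the last two appear in the current view.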

Documentation

Added complete documentation including:

  • Prerequisites
  • Setup instructions
  • Usage examples
  • Output explanations
  • Best practices

Signed-off-by: Sa'id Olanrewaju [email protected]

This commit adds an example demonstrating how to implement Slowly Changing
Dimension (SCD) Type 2 using Delta Lake. The example shows:
- How to create and maintain historical records
- Proper use of merge operations
- Best practices for temporal data handling
- Querying current and historical data

Signed-off-by: Your Name <[email protected]>

- Add Apache License header
- Organize imports
- Add proper package declaration
- Improve code documentation
- Fix formatting and indentation
- Split code into logical methods
- Remove redundant whitespace

Signed-off-by: Your Name <[email protected]>