generate recon config from table definition #1433

Open: 2 commits to merge into base branch main


sriram251-code
Contributor

This pull request introduces functionality to generate a JSON file from a TableRecon object and includes a unit test for this new feature.

sriram251-code self-assigned this on Jan 28, 2025
sriram251-code added the feat/recon label (making sure that remorphed query produces the same results as original) on Jan 28, 2025
sriram251-code marked this pull request as ready for review on January 28, 2025, 11:00
sriram251-code requested a review from a team as a code owner on January 28, 2025, 11:00
    with open(file_name, 'w') as json_file:
        json_file.write(json_data)
except (TypeError, IOError) as e:
    print(f"Failed to generate JSON file: {e}")
Collaborator

Use a logger here instead of print.
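
A minimal sketch of what this comment seems to ask for, assuming a module-level logger from the standard logging module replaces the print call. TableReconStub is a hypothetical stand-in for TableRecon so the example runs on its own; the try block, the json.dumps call, and the boolean return value are assumptions, since the diff shows only the tail of the function.

```python
import json
import logging
from dataclasses import asdict, dataclass, field

logger = logging.getLogger(__name__)


@dataclass
class TableReconStub:
    """Hypothetical stand-in for TableRecon; the real fields may differ."""
    source_schema: str
    target_schema: str
    tables: list = field(default_factory=list)


def generate_json_file(table_recon: TableReconStub, file_name: str) -> bool:
    """Write table_recon to file_name as JSON; return True on success."""
    try:
        json_data = json.dumps(asdict(table_recon), indent=2)
        with open(file_name, "w", encoding="utf-8") as json_file:
            json_file.write(json_data)
    except (TypeError, OSError) as e:
        # Log the failure instead of printing it (IOError is an alias of OSError).
        logger.error("Failed to generate JSON file: %s", e)
        return False
    return True
```

Passing the exception as a logging argument rather than through an f-string defers formatting until the record is actually emitted.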

from databricks.labs.remorph.config import TableRecon


def generate_json_file(table_recon: TableRecon, file_name: str):
Collaborator

TableRecon is the target. You will need to introduce TableDefinition:

from dataclasses import dataclass, field
from typing import List, Optional

from pyspark.sql.types import StructField

@dataclass
class TableDefinition:
    catalog: str
    schema: str
    table: str
    location: Optional[str] = None
    tableFormat: Optional[str] = None
    viewText: Optional[str] = None
    columns: List[StructField] = field(default_factory=list)
    sizeGb: int = 0
    comment: Optional[str] = None

Contributor

I would suggest adding primary_keys: list[str] | None = None to it as well.

Contributor

We should probably not use StructField for column information in TableDefinition, because its dataType field is of type DataType, which is tied to Spark. Metadata brought in from third-party systems may not have a direct equivalent among the Spark data types. So we may want to use our own dataclass to store column information, loosely modeled on StructField but with a string for dataType. If needed, we can add a function that converts our dataclass to StructField.
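
One way this could look, as a rough sketch only: ColumnDefinition and its field names are hypothetical, and TableDefinition is trimmed to a few fields for brevity. primary_keys is written as Optional[List[str]] to match the typing style of the earlier snippet; it is equivalent to list[str] | None. The type strings in the sample data are illustrative, not taken from any real system.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ColumnDefinition:
    """Hypothetical Spark-independent column description."""
    name: str
    data_type: str  # the source system's type name, kept as a plain string
    nullable: bool = True
    comment: Optional[str] = None


@dataclass
class TableDefinition:
    catalog: str
    schema: str
    table: str
    columns: List[ColumnDefinition] = field(default_factory=list)
    primary_keys: Optional[List[str]] = None


table = TableDefinition(
    catalog="sales",
    schema="public",
    table="orders",
    columns=[
        ColumnDefinition("order_id", "NUMBER(38,0)", nullable=False),
        ColumnDefinition("note", "VARCHAR2(200)"),
    ],
    primary_keys=["order_id"],
)

# asdict recurses into nested dataclasses, so the result is JSON-serializable
# without any Spark dependency.
table_json = json.dumps(asdict(table), indent=2)
```

Because data_type is a plain string, types with no Spark equivalent survive unchanged; a separate conversion to StructField could map the strings it recognizes and decide what to do with the rest.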

3 participants