generate recon config from table definition #1433
base: main
        with open(file_name, 'w') as json_file:
            json_file.write(json_data)
    except (TypeError, IOError) as e:
        print(f"Failed to generate JSON file: {e}")
Use a logger here instead of print.
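A minimal sketch of what that could look like, assuming a module-level logger from the standard library's logging module. The function name write_json_file and its boolean return value are hypothetical, used only for illustration; they are not part of this PR.

```python
import logging

# Module-level logger; handlers and levels are configured by the application.
logger = logging.getLogger(__name__)


def write_json_file(json_data: str, file_name: str) -> bool:
    """Write json_data to file_name and log (rather than print) any failure.

    Hypothetical helper: returns True on success, False on failure.
    """
    try:
        with open(file_name, "w", encoding="utf-8") as json_file:
            json_file.write(json_data)
    except (TypeError, OSError) as e:
        # IOError is an alias of OSError in Python 3.
        logger.error("Failed to generate JSON file: %s", e)
        return False
    return True
```

Passing the exception as a logging argument instead of formatting it with an f-string defers the string formatting until a handler actually emits the record.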
from databricks.labs.remorph.config import TableRecon


def generate_json_file(table_recon: TableRecon, file_name: str):
TableRecon is the target. You will need to introduce TableDefinition as the source:
from dataclasses import dataclass, field
from typing import List, Optional

from pyspark.sql.types import StructField  # required by the columns field


@dataclass
class TableDefinition:
    catalog: str
    schema: str
    table: str
    location: Optional[str] = None
    tableFormat: Optional[str] = None
    viewText: Optional[str] = None
    columns: List[StructField] = field(default_factory=list)
    sizeGb: int = 0
    comment: Optional[str] = None
I would suggest adding primary_keys: list[str] | None = None to it too.
We may not want to use StructField for column information in TableDefinition, because its dataType field is of type DataType, which is tied to Spark. When we bring in metadata from third-party systems, their types may not have a direct Spark equivalent. So we may want our own dataclass for column information, loosely modeled on StructField but with dataType stored as a string. If needed, we can add a function that transforms our dataclass into a StructField.
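A rough sketch of that idea, to make the discussion concrete. Everything here is an assumption for illustration: the ColumnDefinition name, its fields, the trimmed-down TableDefinition, and the small type-mapping table are hypothetical and not part of this PR. pyspark is imported lazily so the dataclasses themselves stay Spark-independent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ColumnDefinition:
    """Hypothetical Spark-independent column description."""
    name: str
    data_type: str  # source-system type kept as a plain string, e.g. "varchar(20)"
    nullable: bool = True
    comment: Optional[str] = None


@dataclass
class TableDefinition:
    """Trimmed-down variant of the dataclass suggested above."""
    catalog: str
    schema: str
    table: str
    columns: List[ColumnDefinition] = field(default_factory=list)
    primary_keys: Optional[List[str]] = None


def to_struct_field(column: ColumnDefinition):
    """Convert a ColumnDefinition to a Spark StructField.

    pyspark is imported here, not at module level, so the dataclasses
    above can be used without Spark installed.
    """
    from pyspark.sql.types import (
        BooleanType, DoubleType, IntegerType, LongType, StringType, StructField,
    )

    # Illustrative mapping only; a real one would cover far more types.
    spark_types = {
        "boolean": BooleanType(),
        "int": IntegerType(),
        "bigint": LongType(),
        "double": DoubleType(),
        "string": StringType(),
    }
    # Unknown types fall back to StringType, an arbitrary choice for this sketch.
    spark_type = spark_types.get(column.data_type.lower(), StringType())
    return StructField(column.name, spark_type, column.nullable)
```

With this shape, metadata from a non-Spark source can be stored as-is (for example data_type="varchar(20)"), and the decision about how to map it to Spark is deferred to the conversion step.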
This pull request introduces functionality to generate a JSON file from a TableRecon object and includes a unit test for this new feature.
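For readers unfamiliar with the approach, the serialization step can be sketched roughly as follows. The fields of TableRecon are not shown in this conversation, so the sketch uses hypothetical stand-in dataclasses (ExampleRecon, ExampleTable) and a hypothetical helper to_json; it only assumes that the config object is a dataclass.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ExampleTable:
    """Hypothetical stand-in for a per-table recon entry."""
    source_name: str
    target_name: str


@dataclass
class ExampleRecon:
    """Hypothetical stand-in for TableRecon; field names are illustrative."""
    source_schema: str
    target_schema: str
    tables: List[ExampleTable] = field(default_factory=list)


def to_json(recon) -> str:
    """Serialize a dataclass instance (including nested dataclasses) to JSON."""
    # asdict() recursively converts nested dataclasses, lists, and dicts.
    return json.dumps(asdict(recon), indent=4)
```

A unit test for this step can parse the output back with json.loads and compare it to the expected dictionary, which avoids depending on whitespace or key formatting.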