Mechanism for reducing dump size by limiting rows per table? #52

Open
mateodelnorte opened this issue Sep 10, 2019 · 3 comments
Labels
enhancement (New feature or request), good first issue (Good for newcomers)

Comments

@mateodelnorte

Thanks for this tool. It looks pretty great.

I'd like to both anonymize my data and reduce the overall size of the database. Is there a mechanism that would let me specify a maximum number of records for a particular table, and delete any records prior to that maximum set?

@junkert
Collaborator

junkert commented Sep 10, 2019

Thanks @mateodelnorte for the question and request.

Unfortunately, there is no way to do this currently, but it would be rather easy to modify the generator to handle row counts when processing the dump file. I'll see if I can find the time in the next couple of weeks to add this feature. I know we will need this eventually here at SmithRx as well.
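
To make the idea concrete, here is a minimal sketch of capping rows while streaming a plain-text dump. It is not the tool's generator code. It assumes the dump uses the `COPY ... FROM stdin;` blocks that plain-format pg_dump output emits, each terminated by a `\.` line, and the name `limit_rows` and the `max_rows` mapping are hypothetical. It keeps the first N rows in dump order, so keeping only the newest records would need a different selection rule.

```python
import re
from typing import Dict, Iterable, Iterator

# Matches the start of a data block, e.g. "COPY public.users (id, name) FROM stdin;"
COPY_START = re.compile(r"^COPY\s+(\S+)\s")
COPY_END = "\\."  # a line containing only backslash-dot closes the block


def limit_rows(lines: Iterable[str], max_rows: Dict[str, int]) -> Iterator[str]:
    """Yield dump lines, keeping at most max_rows[table] data rows per COPY block.

    Tables that are not listed in max_rows pass through unchanged.
    """
    table = None  # table whose COPY block we are currently inside, if any
    kept = 0      # data rows emitted so far for that table
    for line in lines:
        stripped = line.rstrip("\n")
        if table is None:
            match = COPY_START.match(stripped)
            if match and stripped.endswith("FROM stdin;"):
                table = match.group(1)
                kept = 0
            yield line
        elif stripped == COPY_END:
            table = None
            yield line
        else:
            limit = max_rows.get(table)
            if limit is None or kept < limit:
                kept += 1
                yield line
```

Used on a file, this could look like `out.writelines(limit_rows(src, {"public.users": 1000}))`, where both file handles and the table name are placeholders.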

On another note, you can also minimize the size of your database by using the --exclude-table and --exclude-table-data options, which let you exclude full tables (DDL not included) or just the table data (DDL kept, but the data in the table ignored) from the dump process.

For example, we have a table that is denormalized when first added to our database. This table is very large and sparse until we process it and normalize the records. We choose to ignore this table's data during the dump process since we only care about the normalized data when testing.
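
As a rough illustration, pg_dump itself has options with these same names, so the sketch below builds a pg_dump argument list that skips one table entirely and keeps only the DDL of another. Whether this tool's flags map onto pg_dump in exactly this way is an assumption, and the helper name, database name, and table names are all hypothetical.

```python
from typing import List, Sequence


def build_pg_dump_args(database: str,
                       exclude_tables: Sequence[str] = (),
                       exclude_table_data: Sequence[str] = ()) -> List[str]:
    """Build a pg_dump command line as a list suitable for subprocess.run."""
    args = ["pg_dump", "--format=plain"]
    for pattern in exclude_tables:
        # Drops the table completely: neither DDL nor data is dumped.
        args.append(f"--exclude-table={pattern}")
    for pattern in exclude_table_data:
        # Keeps the CREATE TABLE statement but dumps no rows.
        args.append(f"--exclude-table-data={pattern}")
    args.append(database)
    return args


# Hypothetical names: skip an audit table, keep only the schema of a raw import table.
command = build_pg_dump_args(
    "app_db",
    exclude_tables=["public.audit_log"],
    exclude_table_data=["public.raw_import"],
)
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where pg_dump is installed and the database is reachable.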

junkert added the enhancement and good first issue labels on Sep 10, 2019
junkert self-assigned this on Nov 26, 2019
@junkert
Collaborator

junkert commented Nov 26, 2019

@mateodelnorte I'm looking into implementing this soon. Does the solution described above work for your use case?

If we limit tables by size, we will not be able to keep foreign keys consistent between tables. If you do not care whether foreign key references still resolve, the solution above should work.

If we want to ensure foreign key consistency with size limiting, we will need to rewrite a large portion of the generator, which may take a lot of time and would probably require a new product version release.
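
A small sketch of why independent per-table limits break foreign keys, and of the kind of check a consistency-aware version would need. The tables, column names, and helper functions here are hypothetical and use in-memory rows rather than a real dump.

```python
from typing import Dict, List

Row = Dict[str, int]


def limit(rows: List[Row], max_rows: int) -> List[Row]:
    """Naive size limit: keep the first max_rows rows of a table."""
    return rows[:max_rows]


def orphaned(children: List[Row], parents: List[Row],
             fk: str, pk: str = "id") -> List[Row]:
    """Return child rows whose foreign key points at a parent row that is missing."""
    parent_keys = {row[pk] for row in parents}
    return [row for row in children if row[fk] not in parent_keys]


users = [{"id": 1}, {"id": 2}, {"id": 3}]
orders = [
    {"id": 10, "user_id": 3},
    {"id": 11, "user_id": 1},
    {"id": 12, "user_id": 2},
]

# Limiting each table on its own keeps users 1 and 2, and orders 10 and 11.
kept_users = limit(users, 2)
kept_orders = limit(orders, 2)

# Order 10 references user 3, which was trimmed, so it would violate the constraint.
dangling = orphaned(kept_orders, kept_users, fk="user_id")

# One possible repair: drop child rows whose parent did not survive the limit.
consistent_orders = [row for row in kept_orders if row not in dangling]
```

Dropping orphaned children is only one option. Pulling in the missing parent rows instead would preserve more data but makes the final table sizes harder to bound, which is part of the added complexity.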

@mateodelnorte
Author

Thanks @junkert. I actually created a simple db-trim tool to do the same as is suggested above. I'd be happy to use it as a part of this tool and not have to maintain mine. Overall, something that checks referential integrity would also be great. But I'm sure you're thinking of that as well and recognize the increased complexity.
