-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adding a mechanism for cleaning the database #25
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't want to be too picky since this isn't something that's going to be kept around, but did have a few comments.
In the end, I want to
- make sure the filtering logic is good (from 2020 with multiple sessions and updated in 2022).
- I think it would be helpful to have the option to just list the sessions that would be deleted.
Before anything, I would recommend just getting Eva's input on this, Shak. Since this is just a one-time tool to clean up some specific set of bad data, maybe it's fine to just merge this in and run it? 🤷 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs a bit of work -- I think a "safer" solution would be to iter through all sessions, check if the video uri is accessible and then if not, delete it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks much cleaner! Just a few more comments.
if sys.argv[-1] == "real": | ||
for session in sessions_for_delete: | ||
db_models.Session.collection.delete(session.key) | ||
elif sys.argv[-1] == "dry_run": | ||
print(f"This is a dry run, but if it were real, {len(sessions_for_delete)} duplicates would be deleted.") | ||
else: | ||
print("Please pass a valid parameter: \"real\" or \"dry_run.\"") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works but better practice would be to use purposeful module like argparse
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also typically, programs will accept just a 'dry run' switch. And if not given, run in 'real' mode. i.e. The 'real' option here is sort of redundant I think.
I do agree running in 'dry run' by default isn't a bad idea here, for safety. And run in 'real' mode only if argument is set.
sessions_for_delete = [] | ||
sessions = db_models.Session.collection.fetch(1000) | ||
for session in sessions: | ||
response = "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this variable (response
) being used anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the file to run on the server. Before running, I recommend deleting just one record rather than all the records for testing.