Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding a mechanism for cleaning the database #25

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Shak2000
Copy link
Contributor

This is the file to run on the server. Before running, I recommend deleting just one record rather than all the records for testing.

Copy link

@dphoria dphoria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't want to be too picky since this isn't something that's going to be kept around, but did have a few comments.

In the end, I want to

  1. make sure the filtering logic is good (from 2020 with multiple sessions and updated in 2022).
  2. I think it would be helpful to have the option to just list the sessions that would be deleted.

python/cdp_san_jose_backend/clean_database.py Outdated Show resolved Hide resolved
python/cdp_san_jose_backend/clean_database.py Outdated Show resolved Hide resolved
python/cdp_san_jose_backend/clean_database.py Outdated Show resolved Hide resolved
python/cdp_san_jose_backend/clean_database.py Outdated Show resolved Hide resolved
python/cdp_san_jose_backend/clean_database.py Outdated Show resolved Hide resolved
python/cdp_san_jose_backend/clean_database.py Outdated Show resolved Hide resolved
@dphoria
Copy link

dphoria commented Apr 19, 2023

Before anything, I would recommend just getting Eva's input on this, Shak. Since this is just a one-time tool to clean up some specific set of bad data, maybe it's fine to just merge this in and run it? 🤷

Copy link
Member

@evamaxfield evamaxfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs a bit of work -- I think a "safer" solution would be to iter through all sessions, check if the video uri is accessible and then if not, delete it.

python/cdp_san_jose_backend/clean_database.py Outdated Show resolved Hide resolved
python/cdp_san_jose_backend/clean_database.py Outdated Show resolved Hide resolved
python/cdp_san_jose_backend/clean_database.py Outdated Show resolved Hide resolved
python/cdp_san_jose_backend/clean_database.py Outdated Show resolved Hide resolved
python/cdp_san_jose_backend/clean_database.py Outdated Show resolved Hide resolved
python/cdp_san_jose_backend/clean_database.py Outdated Show resolved Hide resolved
python/cdp_san_jose_backend/clean_database.py Outdated Show resolved Hide resolved
@dphoria dphoria self-requested a review May 8, 2023 14:48
Copy link

@dphoria dphoria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks much cleaner! Just a few more comments.

Comment on lines +27 to +33
if sys.argv[-1] == "real":
for session in sessions_for_delete:
db_models.Session.collection.delete(session.key)
elif sys.argv[-1] == "dry_run":
print(f"This is a dry run, but if it were real, {len(sessions_for_delete)} duplicates would be deleted.")
else:
print("Please pass a valid parameter: \"real\" or \"dry_run.\"")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works but better practice would be to use purposeful module like argparse.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also typically, programs will accept just a 'dry run' switch. And if not given, run in 'real' mode. i.e. The 'real' option here is sort of redundant I think.

I do agree running in 'dry run' by default isn't a bad idea here, for safety. And run in 'real' mode only if argument is set.

sessions_for_delete = []
sessions = db_models.Session.collection.fetch(1000)
for session in sessions:
response = ""
Copy link

@dphoria dphoria May 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this variable (response) being used anywhere?

python/clean_database.py Show resolved Hide resolved
Copy link
Member

@evamaxfield evamaxfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @Shak2000, I prefer #26 over this one. I won't close this just yet but I have already approved #26. Let me know what you think @dphoria

@dphoria
Copy link

dphoria commented Jun 28, 2023

I agree with @Shak2000, I prefer #26 over this one. I won't close this just yet but I have already approved #26. Let me know what you think @dphoria

Yup, same.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants