-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(clp-package): Rework del_archives script to also find and delete archives by ID. #665
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThe pull request introduces a new Changes
Sequence DiagramsequenceDiagram
participant CLI as Command Line Interface
participant AM as Archive Manager
participant DB as Database
participant FS as File System
CLI->>AM: Invoke with arguments
AM->>AM: Validate configuration
AM->>AM: Parse timestamps
AM->>DB: Query archives
DB-->>AM: Return archive IDs
alt Delete by ID
AM->>DB: Delete archive records
AM->>FS: Remove archive files
else Delete by Time Range
AM->>DB: Delete records in time range
AM->>FS: Remove corresponding files
end
AM-->>CLI: Return operation status
Possibly related PRs
Suggested reviewers
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (9)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (5)
23-30
: LGTM! Consider enhancing error messages.The timestamp validation logic is correct and handles both ordering and non-negativity checks appropriately.
Consider adding the actual timestamp values to the error messages for better debugging:
- logger.error("begin-ts must be <= end-ts") + logger.error(f"begin-ts ({begin_ts}) must be <= end-ts ({end_ts})")
37-116
: LGTM! Consider clarifying dry-run behaviour.The command-line interface is well-structured with clear subcommands and appropriate argument validation.
Consider enhancing the dry-run help message to specify the exact output format:
- help="Preview delete without making changes. Lists errors and files to be deleted.", + help="Preview delete without making changes. Lists archive IDs and paths that would be deleted.",
138-151
: LGTM! Consider refactoring for better readability.The validation logic is correct but could be structured more clearly.
Consider extracting the validation into a separate function:
def validate_filter_timestamps(args): if hasattr(args, "end_ts") and args.end_ts is not None: return validate_timestamps(args.begin_ts, args.end_ts) return True
164-185
: LGTM! Consider using list extension for cleaner code.The command construction correctly handles all arguments and subcommands.
Consider using list extension for cleaner command construction:
base_cmd = [ "python3", "-m", "clp_package_utils.scripts.native.archive_manager", "--config", str(generated_config_path_on_container), str(parsed_args.subcommand), ] if parsed_args.subcommand == "del": cmd_args = ["--dry-run"] if parsed_args.dry_run else [] # ... rest of the command construction
187-194
: LGTM! Consider enhancing error handling.The command execution and cleanup are properly implemented.
Consider adding try-except for subprocess execution and cleanup:
try: subprocess.run(cmd, check=True) except subprocess.CalledProcessError as e: logger.error(f"Command failed with exit code {e.returncode}") return e.returncode finally: if generated_config_path_on_host.exists(): generated_config_path_on_host.unlink()components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (4)
82-82
: Typo in help text for 'by-filter' subcommandThere's a typo in the help text for the
'by-filter'
subcommand:'delte archives within time frame.'
It should be'delete archives within time frame.'
Apply this diff to fix the typo:
-help="delte archives within time frame.", +help="delete archives within time frame.",
156-158
: Correct parameter names in the docstringIn the docstring of the
_find_archives
function, the parameters'begin_its'
and'end_its'
should be corrected to'begin_ts'
and'end_ts'
to match the actual parameter names.Apply this diff to fix the typos:
- Lists all archive IDs, if begin_its and end_its are provided, + Lists all archive IDs, if begin_ts and end_ts are provided,
246-247
: Log deletion messages after committing the transactionThe log message indicating that an archive was deleted from the database is currently logged before the transaction is committed. To ensure accurate logging, consider moving this message after the successful commit of the transaction.
Apply this diff to adjust the logging:
- for archive_id in archive_ids: - logger.info(f"Deleted archive {archive_id} from the database.") if "ids" == criteria: not_found_ids = set(params) - set(archive_ids) if not_found_ids: logger.warning( f""" Failed to find archives with the following IDs: {', '.join(not_found_ids)} """ ) db_cursor.execute( f""" DELETE FROM `{table_prefix}files` WHERE archive_id in ({', '.join(['%s'] * len(archive_ids))}) """, archive_ids, ) db_cursor.execute( f""" DELETE FROM `{table_prefix}archive_tags` WHERE archive_id in ({', '.join(['%s'] * len(archive_ids))}) """, archive_ids, ) if dry_run: logger.info("Dry-run finished.") db_conn.rollback() return 0 db_conn.commit() + for archive_id in archive_ids: + logger.info(f"Deleted archive {archive_id} from the database.")
289-290
: Improve handling of non-directory archive pathsWhen the archive path exists but is not a directory, it might indicate an unexpected file or link. Consider checking if the path exists at all and handle each case accordingly.
Apply this diff to enhance the error handling:
archive_path = archives_dir / archive_id + if not archive_path.exists(): + logger.warning(f"Archive {archive_id} does not exist on disk. Skipping deletion.") + continue if not archive_path.is_dir(): logger.warning(f"Archive {archive_id} is not a directory. Skipping deletion.") continue
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
(1 hunks)components/clp-package-utils/clp_package_utils/scripts/del_archives.py
(0 hunks)components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
(1 hunks)components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py
(0 hunks)components/package-template/src/sbin/admin-tools/archive-manager.sh
(1 hunks)
💤 Files with no reviewable changes (2)
- components/clp-package-utils/clp_package_utils/scripts/del_archives.py
- components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py
🧰 Additional context used
📓 Learnings (1)
components/package-template/src/sbin/admin-tools/archive-manager.sh (1)
Learnt from: haiqi96
PR: y-scope/clp#594
File: components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py:104-110
Timestamp: 2024-11-15T16:21:52.122Z
Learning: In `clp_package_utils/scripts/native/del_archives.py`, when deleting archives, the `archive` variable retrieved from the database is controlled and is always a single string without path components. Therefore, it's acceptable to skip additional validation checks for directory traversal in this context.
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build (ubuntu-latest)
- GitHub Check: lint-check (ubuntu-latest)
🔇 Additional comments (4)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (2)
132-135
: LGTM! Clear storage type validation.The storage type validation is appropriate with a clear error message.
152-162
: LGTM! Container configuration is secure.The container configuration properly sets up the necessary mounts and uses a unique container name.
components/package-template/src/sbin/admin-tools/archive-manager.sh (1)
8-8
: LGTM! Module path updated correctly.The script correctly invokes the new archive manager module while maintaining proper Python path configuration.
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (1)
321-325
: Verify compatibility of 'RETURNING' clause in DELETE statementsThe
'RETURNING'
clause used in DELETE statements at lines 324 and 352 may not be supported by all SQL databases. For example, MySQL versions prior to 8.0.19 do not support'RETURNING'
in DELETE statements, whereas PostgreSQL does.Please ensure that the target database supports the
'RETURNING'
clause with DELETE statements. If not, you may need to adjust the SQL queries accordingly.Also applies to: 349-353
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
Outdated
Show resolved
Hide resolved
help="Delete archives.", | ||
) | ||
|
||
# Find options |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally would prefer putting the options right after the subcommand subparsers. @kirkrodrigues any preference?
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1)
129-139
:⚠️ Potential issueEnhance error handling specificity
The bare except clause could mask important errors and make debugging more difficult.
Consider catching specific exceptions:
- except: + except (FileNotFoundError, ValueError, KeyError) as e: - logger.exception("Failed to load config.") + logger.exception(f"Failed to load config: {str(e)}") return -1
🧹 Nitpick comments (9)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (5)
20-27
: Consider grouping related constants into enums or dataclassesThe command and argument constants could be better organized using Python's enum or dataclass to provide better type safety and grouping of related values.
Example refactor:
from enum import Enum, auto from dataclasses import dataclass class Command(Enum): FIND = "find" DELETE = "del" class DeleteSubCommand(Enum): BY_IDS = "by-ids" BY_FILTER = "by-filter" @dataclass(frozen=True) class Arguments: BEGIN_TS: str = "--begin-ts" END_TS: str = "--end-ts" DRY_RUN: str = "--dry-run"
32-40
: Enhance timestamp validation robustnessThe validation function could be more comprehensive and provide more specific error messages.
Consider this enhanced implementation:
def _validate_timestamps(begin_ts, end_ts): + if not isinstance(begin_ts, int) or not isinstance(end_ts, int): + logger.error("Timestamps must be integers") + return False if begin_ts > end_ts: - logger.error("begin-ts must be <= end-ts") + logger.error(f"begin-ts ({begin_ts}) must be <= end-ts ({end_ts})") return False if end_ts < 0 or begin_ts < 0: - logger.error("begin-ts and end-ts must be non-negative.") + logger.error(f"Invalid negative timestamp(s): begin-ts={begin_ts}, end-ts={end_ts}") return False return True
173-180
: Fix formatting directive alignmentThe fmt directives are misaligned, which could cause inconsistent formatting.
- # fmt: off +# fmt: off archive_manager_cmd = [ "python3", "-m", "clp_package_utils.scripts.native.archive_manager", "--config", str(generated_config_path_on_container), str(parsed_args.subcommand), ] - # fmt : on +# fmt: on
183-184
: Use Pythonic boolean comparisonThe boolean comparison is not following Python conventions.
- if True == parsed_args.dry_run: + if parsed_args.dry_run: archive_manager_cmd.append(DRY_RUN_ARG)
174-194
: Refactor command construction for better maintainabilityThe command construction logic is spread across multiple conditionals and uses a mix of append/extend calls. Consider using a command builder pattern for better organization.
Example refactor:
class ArchiveManagerCommandBuilder: def __init__(self, config_path, subcommand): self.cmd = [ "python3", "-m", "clp_package_utils.scripts.native.archive_manager", "--config", str(config_path), str(subcommand) ] def add_dry_run(self, enabled): if enabled: self.cmd.append(DRY_RUN_ARG) return self def add_ids(self, ids): self.cmd.append(BY_IDS_COMMAND) self.cmd.extend(ids) return self def add_filter(self, begin_ts, end_ts): self.cmd.extend([BY_FILTER_COMMAND, str(begin_ts), str(end_ts)]) return self def add_time_range(self, begin_ts, end_ts=None): self.cmd.extend([BEGIN_TS_ARG, str(begin_ts)]) if end_ts is not None: self.cmd.extend([END_TS_ARG, str(end_ts)]) return self def build(self): return self.cmdcomponents/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (4)
91-91
: Fix typo in help text"delte" should be "delete".
- help="delte archives within time frame.", + help="delete archives within time frame.",
119-119
: Add archives directory validation during config loadingConsider adding
validate_archive_output_dir()
alongsidevalidate_logs_dir()
to ensure the archives directory exists during config validation, rather than checking it later.
161-169
: Improve function documentationThe docstring has the following issues:
- Contains a typo: "begin_its" should be "begin_ts"
- Missing return value documentation
- Missing parameter type hints in the documentation
""" - Lists all archive IDs, if begin_its and end_its are provided, + Lists all archive IDs. If begin_ts and end_ts are provided, only lists archives where `begin_ts <= archive.begin_timestamp` and `archive.end_timestamp <= end_ts`. - :param archives_dir: - :param database_config: - :param begin_ts: - :param end_ts: + :param archives_dir: Path to the archives directory + :param database_config: Database configuration object + :param begin_ts: Start timestamp in milliseconds from epoch + :param end_ts: End timestamp in milliseconds from epoch + :return: 0 on success, -1 on failure """
200-200
: Fix misleading error messageThe error message mentions "Aborting deletion" in the find function, which doesn't perform any deletions.
- logger.exception("Failed to find archives from the database. Aborting deletion.") + logger.exception("Failed to find archives from the database.")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
(1 hunks)components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: lint-check (macos-latest)
- GitHub Check: build (macos-latest)
🔇 Additional comments (2)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (2)
364-365
: LGTM!The main guard is properly implemented.
268-278
: Use parameterized queries for table namesThe current implementation uses string formatting for table names, which could potentially lead to SQL injection if the table prefix comes from user input. Consider using a safer approach by validating the table prefix format or using your SQL adapter's built-in table name quoting mechanism.
Run this script to check if table prefix validation exists:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (4)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (4)
91-91
: Fix typo in help text"delte" should be "delete"
- help="delte archives within time frame.", + help="delete archives within time frame.",
161-169
: Complete the docstring with parameter descriptionsThe docstring is missing descriptions for all parameters and the return value.
""" Lists all archive IDs, if begin_its and end_its are provided, only lists archives where `begin_ts <= archive.begin_timestamp` and `archive.end_timestamp <= end_ts`. - :param archives_dir: - :param database_config: - :param begin_ts: - :param end_ts: + :param archives_dir: Path to the archives directory + :param database_config: Database configuration object + :param begin_ts: Start timestamp in milliseconds since epoch + :param end_ts: Optional end timestamp in milliseconds since epoch + :return: 0 on success, -1 on failure """
216-224
: Update docstring to match function behaviourThe docstring seems to be copied from
delete_archives_by_filter
and doesn't accurately describe this function's purpose.""" - Deletes all archives where `begin_ts <= archive.begin_timestamp` and - `archive.end_timestamp <= end_ts` from both the metadata database and disk. + Deletes archives from both the metadata database and disk based on the provided SQL query. :param archives_dir: :param database_config: :param query: SQL query to select archives to delete. :param params: List of parameters for SQL query. + :param criteria: Type of deletion ('filter' or 'ids') for logging purposes + :param dry_run: If True, simulates deletion without making changes :return: 0 on success, -1 otherwise. """
355-360
: Factor out SQL placeholder generationThe SQL placeholder generation could be factored out into a variable for better readability.
+ placeholders = ', '.join(['%s'] * len(archive_ids)) query = f""" DELETE FROM `{table_prefix}archives` - WHERE id in ({', '.join(['%s'] * len(archive_ids))}) + WHERE id in ({placeholders}) RETURNING id """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: lint-check (macos-latest)
🔇 Additional comments (2)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (2)
306-336
: Implementation looks good!The function is well-structured with proper SQL parameter binding and clear error handling.
364-365
: Main entry point looks good!The implementation follows the standard Python idiom for script execution.
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
Outdated
Show resolved
Hide resolved
database_config: Database, | ||
query: str, | ||
params: List[str], | ||
criteria: str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can name "criteria" as "command" instead, and refer to the defined constant?
We should also check whether the input command to this method is one of the expected value, and throw an error if it is not.
For params. My personal feeling is that it's tied to criteria, and passing it in as a separate argument could be confusing. What I can think of is to introduce a class that wraps around the param and criteria, but it might be an overkill.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First point done.
I'm not sure what you mean by the second point. If you mean to validate the commands/subcommands, I think the argparser should handle it already?
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (2)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (2)
122-127
:⚠️ Potential issueSpecify exception types instead of using bare except.
Using a bare except clause can mask important exceptions like KeyboardInterrupt. Consider catching specific exceptions that might occur during config loading.
try: clp_config = load_config_file(config_file_path, default_config_file_path, clp_home) clp_config.validate_logs_dir() - except: + except (FileNotFoundError, ValueError, KeyError) as e: logger.exception("Failed to load config.") return -1
295-297
:⚠️ Potential issueMove logging after successful deletion.
The function logs that it deleted an archive from the database before the deletion is actually committed. This could be misleading if the operation fails.
- for archive_id in archive_ids: - logger.info(f"Deleted archive {archive_id} from the database.") if dry_run: logger.info("Dry-run finished.") db_conn.rollback() return 0 db_conn.commit() + for archive_id in archive_ids: + logger.info(f"Deleted archive {archive_id} from the database.")
🧹 Nitpick comments (2)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (2)
18-26
: Add docstring to describe the module's purpose and constants.Consider adding a module-level docstring to describe the purpose of this module and document the command/argument constants. This will improve code maintainability and help other developers understand the module's functionality.
Add this docstring at the beginning of the file:
+""" +Archive management utility that provides functionality to find and delete archives. + +Commands: + find: List archive IDs with optional time-range filters + del: Delete archives by ID or time-range filter + +Constants: + FIND_COMMAND: Command to list archive IDs + DEL_COMMAND: Command to delete archives + BY_IDS_COMMAND: Subcommand to delete by IDs + BY_FILTER_COMMAND: Subcommand to delete by time filter + BEGIN_TS_ARG: Argument for begin timestamp + END_TS_ARG: Argument for end timestamp + DRY_RUN_ARG: Argument for dry-run mode +"""
161-175
: Improve function documentation with complete type hints and return value description.The docstring is missing parameter types, return value description, and has a typo in the parameter name.
def _find_archives( archives_dir: Path, database_config: Database, begin_ts: int, end_ts: int = None, ) -> int: """ - Lists all archive IDs, if begin_its and end_its are provided, + Lists all archive IDs. If begin_ts and end_ts are provided, only lists archives where `begin_ts <= archive.begin_timestamp` and `archive.end_timestamp <= end_ts`. - :param archives_dir: - :param database_config: - :param begin_ts: - :param end_ts: + :param archives_dir: Path to the archives directory + :param database_config: Database configuration object + :param begin_ts: Start timestamp in milliseconds from epoch + :param end_ts: End timestamp in milliseconds from epoch + :return: 0 on success, -1 on failure """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
(1 hunks)components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: lint-check (macos-latest)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
Outdated
Show resolved
Hide resolved
""" | ||
Deletes archives from both metadata database and disk based on provided SQL query. | ||
:param archives_dir: | ||
:param database_config: | ||
:param query: SQL query to select archives to delete. | ||
:param params: List of parameters for SQL query. | ||
:param command: Delete subcommand. Either "filter" or "ids". | ||
:param dry_run: If True, no changes will be made to the database or disk. | ||
:return: 0 on success, -1 otherwise. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix docstring parameter list and add command validation.
The docstring has an incorrect parameter name and is missing command validation.
"""
Deletes archives from both metadata database and disk based on provided SQL query.
:param archives_dir:
:param database_config:
- :param query: SQL query to select archives to delete.
+ :param params: List of parameters for SQL query
+ :param command: Delete subcommand (must be either BY_IDS_COMMAND or BY_FILTER_COMMAND)
:param dry_run: If True, no changes will be made to the database or disk.
:return: 0 on success, -1 otherwise.
"""
+ if command not in [BY_IDS_COMMAND, BY_FILTER_COMMAND]:
+ logger.error(f"Invalid command: {command}")
+ return -1
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
""" | |
Deletes archives from both metadata database and disk based on provided SQL query. | |
:param archives_dir: | |
:param database_config: | |
:param query: SQL query to select archives to delete. | |
:param params: List of parameters for SQL query. | |
:param command: Delete subcommand. Either "filter" or "ids". | |
:param dry_run: If True, no changes will be made to the database or disk. | |
:return: 0 on success, -1 otherwise. | |
""" | |
""" | |
Deletes archives from both metadata database and disk based on provided SQL query. | |
:param archives_dir: | |
:param database_config: | |
:param params: List of parameters for SQL query | |
:param command: Delete subcommand (must be either BY_IDS_COMMAND or BY_FILTER_COMMAND) | |
:param dry_run: If True, no changes will be made to the database or disk. | |
:return: 0 on success, -1 otherwise. | |
""" | |
if command not in [BY_IDS_COMMAND, BY_FILTER_COMMAND]: | |
logger.error(f"Invalid command: {command}") | |
return -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (2)
103-103
: Capitalize help text for consistency.The help text should start with a capital letter to maintain consistency with other help messages.
- help="delete archives within time frame.", + help="Delete archives within time frame.",
168-170
: Fix typo in docstring parameter name.The parameter name in the docstring has a typo: "begin_its" should be "begin_ts".
- Lists all archive IDs, if begin_its and end_its are provided, + Lists all archive IDs, if begin_ts and end_ts are provided,
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
(1 hunks)components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: lint-check (macos-latest)
- GitHub Check: build (macos-latest)
🔇 Additional comments (7)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (7)
1-28
: LGTM! Well-organized imports and constants.The imports are logically organized, and the constants follow Python naming conventions. As discussed in previous reviews, having the constants defined in this native script is the right approach.
326-327
: LGTM! Proper main guard implementation.The main guard is correctly implemented following Python best practices.
271-271
:⚠️ Potential issueUse constant for command comparison.
The command comparison should use the defined constant instead of a string literal.
- if "ids" == command: + if command == BY_IDS_COMMAND:Likely invalid or redundant comment.
122-127
:⚠️ Potential issueSpecify exception types instead of using bare except.
Using a bare except clause can mask important exceptions like KeyboardInterrupt.
try: clp_config = load_config_file(config_file_path, default_config_file_path, clp_home) clp_config.validate_logs_dir() - except: + except (FileNotFoundError, ValueError, KeyError) as e: logger.exception("Failed to load config.") return -1Likely invalid or redundant comment.
296-298
: 🛠️ Refactor suggestionMove logging after successful deletion.
The function logs that it deleted archives from the database before the deletion is actually committed.
- for archive_id in archive_ids: - logger.info(f"Deleted archive {archive_id} from the database.") + if not dry_run: + db_conn.commit() + for archive_id in archive_ids: + logger.info(f"Deleted archive {archive_id} from the database.")Likely invalid or redundant comment.
224-233
:⚠️ Potential issueFix docstring parameter list and add command validation.
The docstring has incorrect parameter descriptions and is missing command validation.
""" Deletes archives from both metadata database and disk based on provided SQL query. :param archives_dir: :param database_config: - :param query: SQL query to select archives to delete. + :param params: List of parameters for SQL query + :param command: Delete subcommand (must be either BY_IDS_COMMAND or BY_FILTER_COMMAND) :param dry_run: If True, no changes will be made to the database or disk. :return: 0 on success, -1 otherwise. """ + if command not in [BY_IDS_COMMAND, BY_FILTER_COMMAND]: + logger.error(f"Invalid command: {command}") + return -1Likely invalid or redundant comment.
135-142
: 🛠️ Refactor suggestionUse constants for command comparison.
Use the defined constants instead of string literals for command comparison.
- if FIND_COMMAND == parsed_args.subcommand: + if parsed_args.subcommand == FIND_COMMAND:Likely invalid or redundant comment.
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
Outdated
Show resolved
Hide resolved
ids_string = ", ".join(f"'{archive_id}'" for archive_id in archive_ids) | ||
|
||
db_cursor.execute( | ||
f""" | ||
DELETE FROM `{table_prefix}files` | ||
WHERE archive_id in ({ids_string}) | ||
""" | ||
) | ||
|
||
db_cursor.execute( | ||
f""" | ||
DELETE FROM `{table_prefix}archive_tags` | ||
WHERE archive_id in ({ids_string}) | ||
""" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use parameterized queries to prevent SQL injection.
The current implementation concatenates archive IDs directly into the SQL query, which could be vulnerable to SQL injection. Use parameterized queries instead.
- ids_string = ", ".join(f"'{archive_id}'" for archive_id in archive_ids)
+ placeholders = ", ".join(["%s"] * len(archive_ids))
db_cursor.execute(
f"""
DELETE FROM `{table_prefix}files`
- WHERE archive_id in ({ids_string})
+ WHERE archive_id in ({placeholders})
- """
+ """,
+ archive_ids,
)
db_cursor.execute(
f"""
DELETE FROM `{table_prefix}archive_tags`
- WHERE archive_id in ({ids_string})
+ WHERE archive_id in ({placeholders})
- """
+ """,
+ archive_ids,
)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
ids_string = ", ".join(f"'{archive_id}'" for archive_id in archive_ids) | |
db_cursor.execute( | |
f""" | |
DELETE FROM `{table_prefix}files` | |
WHERE archive_id in ({ids_string}) | |
""" | |
) | |
db_cursor.execute( | |
f""" | |
DELETE FROM `{table_prefix}archive_tags` | |
WHERE archive_id in ({ids_string}) | |
""" | |
) | |
placeholders = ", ".join(["%s"] * len(archive_ids)) | |
db_cursor.execute( | |
f""" | |
DELETE FROM `{table_prefix}files` | |
WHERE archive_id in ({placeholders}) | |
""", | |
archive_ids, | |
) | |
db_cursor.execute( | |
f""" | |
DELETE FROM `{table_prefix}archive_tags` | |
WHERE archive_id in ({placeholders}) | |
""", | |
archive_ids, | |
) |
begin_timestamp = None | ||
end_timestamp = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
begin_timestamp = None | |
end_timestamp = None | |
begin_timestamp: int | |
end_timestamp: int |
They should get properly None initialized.
begin_timestamp = parsed_args.begin_ts | ||
end_timestamp = parsed_args.end_ts | ||
|
||
if begin_timestamp is not None or end_timestamp is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the code again, begin_timestamp
has a default value, so it should never gets a None value?
the _validate_timestamps
is also written to handle None input, in this case, do we still need this if begin_timestamp is not None or end_timestamp is not None:
statement?
If we can remove this if statement, maybe we can also combine the two if-clause for ts validation
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
Outdated
Show resolved
Hide resolved
# Delete by filter subcommand | ||
del_filter_parser = del_subparsers.add_parser( | ||
BY_FILTER_COMMAND, | ||
help="delete archives within time frame.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe use the text in the previous script
help="delete archives within time frame.", | |
help="Deletes archives that fall within the specified time range.", |
archive_path = archives_dir / archive_id | ||
if not archive_path.is_dir(): | ||
logger.warning(f"Archive {archive_id} in database not found on disk.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about putting this under line 200's for loop?
logger.info("Running in dry-run mode.") | ||
|
||
criteria: str | ||
if BY_FILTER_COMMAND == command: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As suggested in my previous comment, we should add a guard else
case if command is neither two.
Edited: this should no longer be an issue if we use the handler strategy
return 0 | ||
|
||
|
||
def _delete_archives( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for keep brining the same point up. When I read the code, I realized that we have several places where we have this if-else to check the command and that's kind of annoying.
I actually ran into a similar situation a while ago when I implemented another change, and finally came up with a relatively neat solution by creating custom handler class for both cases.
you can find it here. Basically, I created a handler class with method that can be overloaded.
let me know if you would like to simplify the code using this approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I'm okay with how it is right now but obviously I'm not the best authority on what's up to standard. I'm not entirely sure that adding the class would make things simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on yesterday's meeting discussion, let's use the class method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, can you explain it a bit more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (2)
48-57
: Consider adding validation for timestamp parameters.The FilterDeleteHandler's validate_results method is empty. Consider adding validation to ensure begin_timestamp is less than or equal to end_timestamp.
def validate_results(self, archive_ids: List[str]) -> None: - pass + begin_ts, end_ts = self._params + if int(begin_ts) > int(end_ts): + raise ValueError("begin_timestamp must be less than or equal to end_timestamp")
215-223
: Fix typo in docstring parameter name.The parameter name in the docstring uses "begin_its" instead of "begin_ts".
Lists all archive IDs, if begin_its and end_its are provided, - only lists archives where `begin_ts <= archive.begin_timestamp` and + only lists archives where `begin_ts <= archive.begin_timestamp` and `archive.end_timestamp <= end_ts`. :param archives_dir: :param database_config: :param begin_ts: :param end_ts:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
(1 hunks)components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: lint-check (macos-latest)
- GitHub Check: build (macos-latest)
🔇 Additional comments (7)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (7)
1-28
: LGTM! Well-organized imports and constants.The imports are logically grouped and the constants follow proper naming conventions.
31-46
: LGTM! Well-designed abstract base class.The DeleteHandler abstract base class provides a clean interface for different deletion strategies.
354-355
: LGTM! Proper main guard implementation.The main guard follows Python best practices.
169-174
: 🛠️ Refactor suggestionSpecify exception types instead of using bare except.
Using a bare except clause can mask important exceptions like KeyboardInterrupt.
try: clp_config = load_config_file(config_file_path, default_config_file_path, clp_home) clp_config.validate_logs_dir() - except: + except (FileNotFoundError, ValueError, KeyError) as e: logger.exception("Failed to load config.") return -1Likely invalid or redundant comment.
250-252
: 🛠️ Refactor suggestionAdd error handling for filesystem operations.
The filesystem check should handle potential permission issues when accessing archive directories.
- archive_path = archives_dir / archive_id - if not archive_path.is_dir(): - logger.warning(f"Archive {archive_id} in database not found on disk.") + try: + archive_path = archives_dir / archive_id + if not archive_path.is_dir(): + logger.warning(f"Archive {archive_id} in database not found on disk.") + except PermissionError: + logger.warning(f"Permission denied when checking archive {archive_id} on disk.")Likely invalid or redundant comment.
59-75
: 🛠️ Refactor suggestionUse parameterized queries to prevent SQL injection.
The current implementation in get_criteria concatenates values directly into the SQL query. While the current usage is safe, it's better to use parameterized queries consistently.
def get_criteria(self) -> str: - return f"id in ({','.join(['%s'] * len(self._params))})" + placeholders = ','.join(['%s'] * len(self._params)) + return f"id in ({placeholders})"Likely invalid or redundant comment.
309-323
:⚠️ Potential issueUse parameterized queries for archive_id deletion.
Direct string concatenation of archive_ids in SQL queries is vulnerable to SQL injection.
- ids_string = ", ".join(f"'{archive_id}'" for archive_id in archive_ids) + placeholders = ", ".join(["%s"] * len(archive_ids)) db_cursor.execute( f""" DELETE FROM `{table_prefix}files` - WHERE archive_id in ({ids_string}) + WHERE archive_id in ({placeholders}) - """ + """, + archive_ids ) db_cursor.execute( f""" DELETE FROM `{table_prefix}archive_tags` - WHERE archive_id in ({ids_string}) + WHERE archive_id in ({placeholders}) - """ + """, + archive_ids )Likely invalid or redundant comment.
logger.info(f"Deleting archive {archive_id} from disk.") | ||
shutil.rmtree(archive_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add error handling for filesystem operations.
The rmtree operation should handle potential filesystem errors.
logger.info(f"Deleting archive {archive_id} from disk.")
- shutil.rmtree(archive_path)
+ try:
+ shutil.rmtree(archive_path)
+ except (PermissionError, OSError) as e:
+ logger.error(f"Failed to delete archive {archive_id} from disk: {e}")
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
logger.info(f"Deleting archive {archive_id} from disk.") | |
shutil.rmtree(archive_path) | |
logger.info(f"Deleting archive {archive_id} from disk.") | |
try: | |
shutil.rmtree(archive_path) | |
except (PermissionError, OSError) as e: | |
logger.error(f"Failed to delete archive {archive_id} from disk: {e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (2)
55-57
: Document the empty validate_results method.Add a docstring explaining why this method is intentionally empty.
def validate_results(self, archive_ids: List[str]) -> None: + """ + No validation needed for filter-based deletion as all matching archives are valid targets. + """ pass
215-223
: Complete the docstring parameters.The docstring is missing parameter descriptions and return value documentation.
""" Lists all archive IDs, if begin_ts and end_ts are provided, only lists archives where `begin_ts <= archive.begin_timestamp` and `archive.end_timestamp <= end_ts`. - :param archives_dir: - :param database_config: - :param begin_ts: - :param end_ts: + :param archives_dir: Path to the archives directory + :param database_config: Database configuration object + :param begin_ts: Start timestamp in milliseconds since epoch + :param end_ts: Optional end timestamp in milliseconds since epoch + :return: 0 on success, -1 on failure """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: build (macos-latest)
- GitHub Check: lint-check (macos-latest)
🔇 Additional comments (6)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (6)
1-28
: Well-organized imports and constants!The code demonstrates good practices with properly organized imports and well-defined constants for commands and arguments.
354-355
: LGTM!Proper implementation of the Python main guard.
169-174
:⚠️ Potential issueSpecify exception types instead of using bare except.
Using a bare except clause can mask important exceptions like KeyboardInterrupt.
try: clp_config = load_config_file(config_file_path, default_config_file_path, clp_home) clp_config.validate_logs_dir() - except: + except (FileNotFoundError, ValueError, KeyError) as e: logger.exception("Failed to load config.") return -1Likely invalid or redundant comment.
250-252
:⚠️ Potential issueAdd error handling for filesystem operations.
The filesystem check should handle potential permission issues.
- archive_path = archives_dir / archive_id - if not archive_path.is_dir(): - logger.warning(f"Archive {archive_id} in database not found on disk.") + try: + archive_path = archives_dir / archive_id + if not archive_path.is_dir(): + logger.warning(f"Archive {archive_id} in database not found on disk.") + except PermissionError: + logger.warning(f"Permission denied when checking archive {archive_id} on disk.")Likely invalid or redundant comment.
346-347
:⚠️ Potential issueAdd error handling for filesystem operations.
The rmtree operation should handle potential filesystem errors.
logger.info(f"Deleting archive {archive_id} from disk.") - shutil.rmtree(archive_path) + try: + shutil.rmtree(archive_path) + except (PermissionError, OSError) as e: + logger.error(f"Failed to delete archive {archive_id} from disk: {e}")Likely invalid or redundant comment.
309-323
:⚠️ Potential issueUse parameterized queries to prevent SQL injection.
The current implementation concatenates archive IDs directly into the SQL query.
- ids_string = ", ".join(f"'{archive_id}'" for archive_id in archive_ids) + placeholders = ", ".join(["%s"] * len(archive_ids)) db_cursor.execute( f""" DELETE FROM `{table_prefix}files` - WHERE archive_id in ({ids_string}) + WHERE archive_id in ({placeholders}) - """ + """, + archive_ids, ) db_cursor.execute( f""" DELETE FROM `{table_prefix}archive_tags` - WHERE archive_id in ({ids_string}) + WHERE archive_id in ({placeholders}) - """ + """, + archive_ids, )Likely invalid or redundant comment.
|
||
class IdDeleteHandler(DeleteHandler): | ||
def get_criteria(self) -> str: | ||
return f"id in ({','.join(['%s'] * len(self._params))})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use parameterized query placeholders to prevent SQL injection.
The string formatting in get_criteria() could be vulnerable to SQL injection if the params contain malicious input.
- return f"id in ({','.join(['%s'] * len(self._params))})"
+ placeholders = ','.join(['%s'] * len(self._params))
+ return f"id in ({placeholders})"
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
return f"id in ({','.join(['%s'] * len(self._params))})" | |
placeholders = ','.join(['%s'] * len(self._params)) | |
return f"id in ({placeholders})" |
Description
This PR builds on the tools added in #594 to delete archives from both disk storage and CLP's metadatabase, addressing issue #597. The del_archives script has been modified to give the user the ability to view archive IDs with the option to filter based on timestamps, as well as the ability to delete specific archives based on their IDs. These options add onto the capability introduced in #594 to delete archives that fall within a desired time range. The tool was renamed to archive_manager to better reflect its increased scope.
The script has the following options:
The delete command has the following options:
Usage of the script may look like the following:
./archive-manager.sh find
./archive-manager.sh find --begin-ts <time_1> --end-ts <time_2>
./archive-manager.sh del by-filter --begin-ts <time_1> --end-ts <time_2>
./archive-manager.sh del by-ids <id_1> <id_2> <id_3> ...
Validation performed
Summary by CodeRabbit
New Features
Bug Fixes
Refactor