Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(clp-package): Rework del_archives script to also find and delete archives by ID. #665

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

Eden-D-Zhang
Copy link
Contributor

@Eden-D-Zhang Eden-D-Zhang commented Jan 15, 2025

Description

This PR builds on the tools added in #594 to delete archives from both disk storage and CLP's metadatabase, addressing issue #597. The del_archives script has been modified to give the user the ability to view archive IDs with the option to filter based on timestamps, as well as the ability to delete specific archives based on their IDs. These options add onto the capability introduced in #594 to delete archives that fall within a desired time range. The tool was renamed to archive_manager to better reflect its increased scope.

The script has the following options:

Command Arguments Subcommands Behaviour
find --begin-ts, --end-ts List archives within time range (all if none provided).
del --dry-run by-ids, by-filter Delete archives matching arguments. If dry-run is specified, the process is followed but no modifications are made.

The delete command has the following options:

Subcommand Arguments Behaviour
by-ids ids (List) Deletes listed archives, invalid IDs are skipped with a warning.
by-filter --begin-ts, --end-ts Deletes archives that fall between provided timestamps.

Usage of the script may look like the following:

  1. ./archive-manager.sh find
  2. ./archive-manager.sh find --begin-ts <time_1> --end-ts <time_2>
  3. ./archive-manager.sh del by-filter --begin-ts <time_1> --end-ts <time_2>
  4. ./archive-manager.sh del by-ids <id_1> <id_2> <id_3> ...

Validation performed

  • Find and delete operations were tested on manually compressed archives.
  • Delete by-ids operations were tested on multiple IDs at once, including invalid IDs.
  • Find operations were tested with and without different begin-ts and end-ts passed.

Summary by CodeRabbit

  • New Features

    • Introduced a comprehensive archive management utility with capabilities to find and delete archives.
    • Added command-line interface for archive management with timestamp-based filtering.
    • Enhanced archive deletion functionality with support for multiple deletion methods.
  • Bug Fixes

    • Improved error handling and logging for archive management operations.
  • Refactor

    • Replaced existing archive deletion script with a more robust archive management solution.
    • Updated admin tools to use the new archive management script.

Copy link
Contributor

coderabbitai bot commented Jan 15, 2025

Walkthrough

The pull request introduces a new archive_manager.py script for managing archives, replacing the previous del_archives.py script. This new utility allows users to find and delete archives via command-line arguments, incorporating features such as timestamp validation and error handling. The native implementation of the archive manager supports database querying and deletion capabilities. Additionally, the shell script for archive management has been updated to invoke the new module.

Changes

File Change Summary
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py New script for managing archives with command-line utility for finding and deleting archives. Includes timestamp validation and error handling.
components/clp-package-utils/clp_package_utils/scripts/del_archives.py Removed script; functionality replaced by new archive_manager.py.
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py New native implementation of archive management with database querying and deletion capabilities.
components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py Removed script; functionality replaced by new archive_manager.py.
components/package-template/src/sbin/admin-tools/archive-manager.sh Updated to invoke new archive_manager module instead of del_archives.

Sequence Diagram

sequenceDiagram
    participant CLI as Command Line Interface
    participant AM as Archive Manager
    participant DB as Database
    participant FS as File System

    CLI->>AM: Invoke with arguments
    AM->>AM: Validate configuration
    AM->>AM: Parse timestamps
    AM->>DB: Query archives
    DB-->>AM: Return archive IDs
    alt Delete by ID
        AM->>DB: Delete archive records
        AM->>FS: Remove archive files
    else Delete by Time Range
        AM->>DB: Delete records in time range
        AM->>FS: Remove corresponding files
    end
    AM-->>CLI: Return operation status
Loading

Possibly related PRs

Suggested reviewers

  • kirkrodrigues
✨ Finishing Touches
  • 📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@kirkrodrigues kirkrodrigues requested a review from haiqi96 January 15, 2025 21:49
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (9)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (5)

23-30: LGTM! Consider enhancing error messages.

The timestamp validation logic is correct and handles both ordering and non-negativity checks appropriately.

Consider adding the actual timestamp values to the error messages for better debugging:

-        logger.error("begin-ts must be <= end-ts")
+        logger.error(f"begin-ts ({begin_ts}) must be <= end-ts ({end_ts})")

37-116: LGTM! Consider clarifying dry-run behaviour.

The command-line interface is well-structured with clear subcommands and appropriate argument validation.

Consider enhancing the dry-run help message to specify the exact output format:

-        help="Preview delete without making changes. Lists errors and files to be deleted.",
+        help="Preview delete without making changes. Lists archive IDs and paths that would be deleted.",

138-151: LGTM! Consider refactoring for better readability.

The validation logic is correct but could be structured more clearly.

Consider extracting the validation into a separate function:

def validate_filter_timestamps(args):
    if hasattr(args, "end_ts") and args.end_ts is not None:
        return validate_timestamps(args.begin_ts, args.end_ts)
    return True

164-185: LGTM! Consider using list extension for cleaner code.

The command construction correctly handles all arguments and subcommands.

Consider using list extension for cleaner command construction:

base_cmd = [
    "python3",
    "-m", "clp_package_utils.scripts.native.archive_manager",
    "--config", str(generated_config_path_on_container),
    str(parsed_args.subcommand),
]

if parsed_args.subcommand == "del":
    cmd_args = ["--dry-run"] if parsed_args.dry_run else []
    # ... rest of the command construction

187-194: LGTM! Consider enhancing error handling.

The command execution and cleanup are properly implemented.

Consider adding try-except for subprocess execution and cleanup:

try:
    subprocess.run(cmd, check=True)
except subprocess.CalledProcessError as e:
    logger.error(f"Command failed with exit code {e.returncode}")
    return e.returncode
finally:
    if generated_config_path_on_host.exists():
        generated_config_path_on_host.unlink()
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (4)

82-82: Typo in help text for 'by-filter' subcommand

There's a typo in the help text for the 'by-filter' subcommand: 'delte archives within time frame.' It should be 'delete archives within time frame.'

Apply this diff to fix the typo:

-help="delte archives within time frame.",
+help="delete archives within time frame.",

156-158: Correct parameter names in the docstring

In the docstring of the _find_archives function, the parameters 'begin_its' and 'end_its' should be corrected to 'begin_ts' and 'end_ts' to match the actual parameter names.

Apply this diff to fix the typos:

-        Lists all archive IDs, if begin_its and end_its are provided,
+        Lists all archive IDs, if begin_ts and end_ts are provided,

246-247: Log deletion messages after committing the transaction

The log message indicating that an archive was deleted from the database is currently logged before the transaction is committed. To ensure accurate logging, consider moving this message after the successful commit of the transaction.

Apply this diff to adjust the logging:

-                for archive_id in archive_ids:
-                    logger.info(f"Deleted archive {archive_id} from the database.")

                 if "ids" == criteria:
                     not_found_ids = set(params) - set(archive_ids)
                     if not_found_ids:
                         logger.warning(
                             f"""
                             Failed to find archives with the following IDs:
                             {', '.join(not_found_ids)}
                             """
                         )

                 db_cursor.execute(
                     f"""
                     DELETE FROM `{table_prefix}files`
                     WHERE archive_id in ({', '.join(['%s'] * len(archive_ids))})
                     """,
                     archive_ids,
                 )

                 db_cursor.execute(
                     f"""
                     DELETE FROM `{table_prefix}archive_tags`
                     WHERE archive_id in ({', '.join(['%s'] * len(archive_ids))})
                     """,
                     archive_ids,
                 )

                 if dry_run:
                     logger.info("Dry-run finished.")
                     db_conn.rollback()
                     return 0

                 db_conn.commit()

+                for archive_id in archive_ids:
+                    logger.info(f"Deleted archive {archive_id} from the database.")

289-290: Improve handling of non-directory archive paths

When the archive path exists but is not a directory, it might indicate an unexpected file or link. Consider checking if the path exists at all and handle each case accordingly.

Apply this diff to enhance the error handling:

     archive_path = archives_dir / archive_id
+    if not archive_path.exists():
+        logger.warning(f"Archive {archive_id} does not exist on disk. Skipping deletion.")
+        continue
     if not archive_path.is_dir():
         logger.warning(f"Archive {archive_id} is not a directory. Skipping deletion.")
         continue
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ecd9c7 and c210992.

📒 Files selected for processing (5)
  • components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1 hunks)
  • components/clp-package-utils/clp_package_utils/scripts/del_archives.py (0 hunks)
  • components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (1 hunks)
  • components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py (0 hunks)
  • components/package-template/src/sbin/admin-tools/archive-manager.sh (1 hunks)
💤 Files with no reviewable changes (2)
  • components/clp-package-utils/clp_package_utils/scripts/del_archives.py
  • components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py
🧰 Additional context used
📓 Learnings (1)
components/package-template/src/sbin/admin-tools/archive-manager.sh (1)
Learnt from: haiqi96
PR: y-scope/clp#594
File: components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py:104-110
Timestamp: 2024-11-15T16:21:52.122Z
Learning: In `clp_package_utils/scripts/native/del_archives.py`, when deleting archives, the `archive` variable retrieved from the database is controlled and is always a single string without path components. Therefore, it's acceptable to skip additional validation checks for directory traversal in this context.
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build (ubuntu-latest)
  • GitHub Check: lint-check (ubuntu-latest)
🔇 Additional comments (4)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (2)

132-135: LGTM! Clear storage type validation.

The storage type validation is appropriate with a clear error message.


152-162: LGTM! Container configuration is secure.

The container configuration properly sets up the necessary mounts and uses a unique container name.

components/package-template/src/sbin/admin-tools/archive-manager.sh (1)

8-8: LGTM! Module path updated correctly.

The script correctly invokes the new archive manager module while maintaining proper Python path configuration.

components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (1)

321-325: Verify compatibility of 'RETURNING' clause in DELETE statements

The 'RETURNING' clause used in DELETE statements at lines 324 and 352 may not be supported by all SQL databases. For example, MySQL versions prior to 8.0.19 do not support 'RETURNING' in DELETE statements, whereas PostgreSQL does.

Please ensure that the target database supports the 'RETURNING' clause with DELETE statements. If not, you may need to adjust the SQL queries accordingly.

Also applies to: 349-353

help="Delete archives.",
)

# Find options
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally would prefer putting the options right after the subcommand subparsers. @kirkrodrigues any preference?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1)

129-139: ⚠️ Potential issue

Enhance error handling specificity

The bare except clause could mask important errors and make debugging more difficult.

Consider catching specific exceptions:

-    except:
+    except (FileNotFoundError, ValueError, KeyError) as e:
-        logger.exception("Failed to load config.")
+        logger.exception(f"Failed to load config: {str(e)}")
         return -1
🧹 Nitpick comments (9)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (5)

20-27: Consider grouping related constants into enums or dataclasses

The command and argument constants could be better organized using Python's enum or dataclass to provide better type safety and grouping of related values.

Example refactor:

from enum import Enum, auto
from dataclasses import dataclass

class Command(Enum):
    FIND = "find"
    DELETE = "del"

class DeleteSubCommand(Enum):
    BY_IDS = "by-ids"
    BY_FILTER = "by-filter"

@dataclass(frozen=True)
class Arguments:
    BEGIN_TS: str = "--begin-ts"
    END_TS: str = "--end-ts"
    DRY_RUN: str = "--dry-run"

32-40: Enhance timestamp validation robustness

The validation function could be more comprehensive and provide more specific error messages.

Consider this enhanced implementation:

 def _validate_timestamps(begin_ts, end_ts):
+    if not isinstance(begin_ts, int) or not isinstance(end_ts, int):
+        logger.error("Timestamps must be integers")
+        return False
     if begin_ts > end_ts:
-        logger.error("begin-ts must be <= end-ts")
+        logger.error(f"begin-ts ({begin_ts}) must be <= end-ts ({end_ts})")
         return False
     if end_ts < 0 or begin_ts < 0:
-        logger.error("begin-ts and end-ts must be non-negative.")
+        logger.error(f"Invalid negative timestamp(s): begin-ts={begin_ts}, end-ts={end_ts}")
         return False
     return True

173-180: Fix formatting directive alignment

The fmt directives are misaligned, which could cause inconsistent formatting.

-    # fmt: off
+# fmt: off
     archive_manager_cmd = [
         "python3",
         "-m", "clp_package_utils.scripts.native.archive_manager",
         "--config", str(generated_config_path_on_container),
         str(parsed_args.subcommand),
     ]
-    # fmt : on
+# fmt: on

183-184: Use Pythonic boolean comparison

The boolean comparison is not following Python conventions.

-        if True == parsed_args.dry_run:
+        if parsed_args.dry_run:
             archive_manager_cmd.append(DRY_RUN_ARG)

174-194: Refactor command construction for better maintainability

The command construction logic is spread across multiple conditionals and uses a mix of append/extend calls. Consider using a command builder pattern for better organization.

Example refactor:

class ArchiveManagerCommandBuilder:
    def __init__(self, config_path, subcommand):
        self.cmd = [
            "python3",
            "-m", "clp_package_utils.scripts.native.archive_manager",
            "--config", str(config_path),
            str(subcommand)
        ]

    def add_dry_run(self, enabled):
        if enabled:
            self.cmd.append(DRY_RUN_ARG)
        return self

    def add_ids(self, ids):
        self.cmd.append(BY_IDS_COMMAND)
        self.cmd.extend(ids)
        return self

    def add_filter(self, begin_ts, end_ts):
        self.cmd.extend([BY_FILTER_COMMAND, str(begin_ts), str(end_ts)])
        return self

    def add_time_range(self, begin_ts, end_ts=None):
        self.cmd.extend([BEGIN_TS_ARG, str(begin_ts)])
        if end_ts is not None:
            self.cmd.extend([END_TS_ARG, str(end_ts)])
        return self

    def build(self):
        return self.cmd
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (4)

91-91: Fix typo in help text

"delte" should be "delete".

-        help="delte archives within time frame.",
+        help="delete archives within time frame.",

119-119: Add archives directory validation during config loading

Consider adding validate_archive_output_dir() alongside validate_logs_dir() to ensure the archives directory exists during config validation, rather than checking it later.


161-169: Improve function documentation

The docstring has the following issues:

  1. Contains a typo: "begin_its" should be "begin_ts"
  2. Missing return value documentation
  3. Missing parameter type hints in the documentation
    """
-   Lists all archive IDs, if begin_its and end_its are provided,
+   Lists all archive IDs. If begin_ts and end_ts are provided,
    only lists archives where `begin_ts <= archive.begin_timestamp` and
    `archive.end_timestamp <= end_ts`.
-   :param archives_dir:
-   :param database_config:
-   :param begin_ts:
-   :param end_ts:
+   :param archives_dir: Path to the archives directory
+   :param database_config: Database configuration object
+   :param begin_ts: Start timestamp in milliseconds from epoch
+   :param end_ts: End timestamp in milliseconds from epoch
+   :return: 0 on success, -1 on failure
    """

200-200: Fix misleading error message

The error message mentions "Aborting deletion" in the find function, which doesn't perform any deletions.

-        logger.exception("Failed to find archives from the database. Aborting deletion.")
+        logger.exception("Failed to find archives from the database.")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c210992 and 812ce45.

📒 Files selected for processing (2)
  • components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1 hunks)
  • components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: lint-check (macos-latest)
  • GitHub Check: build (macos-latest)
🔇 Additional comments (2)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (2)

364-365: LGTM!

The main guard is properly implemented.


268-278: Use parameterized queries for table names

The current implementation uses string formatting for table names, which could potentially lead to SQL injection if the table prefix comes from user input. Consider using a safer approach by validating the table prefix format or using your SQL adapter's built-in table name quoting mechanism.

Run this script to check if table prefix validation exists:

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (4)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (4)

91-91: Fix typo in help text

"delte" should be "delete"

-        help="delte archives within time frame.",
+        help="delete archives within time frame.",

161-169: Complete the docstring with parameter descriptions

The docstring is missing descriptions for all parameters and the return value.

     """
     Lists all archive IDs, if begin_its and end_its are provided,
     only lists archives where `begin_ts <= archive.begin_timestamp` and
     `archive.end_timestamp <= end_ts`.
-    :param archives_dir:
-    :param database_config:
-    :param begin_ts:
-    :param end_ts:
+    :param archives_dir: Path to the archives directory
+    :param database_config: Database configuration object
+    :param begin_ts: Start timestamp in milliseconds since epoch
+    :param end_ts: Optional end timestamp in milliseconds since epoch
+    :return: 0 on success, -1 on failure
     """

216-224: Update docstring to match function behaviour

The docstring seems to be copied from delete_archives_by_filter and doesn't accurately describe this function's purpose.

     """
-    Deletes all archives where `begin_ts <= archive.begin_timestamp` and
-    `archive.end_timestamp <= end_ts` from both the metadata database and disk.
+    Deletes archives from both the metadata database and disk based on the provided SQL query.
     :param archives_dir:
     :param database_config:
     :param query: SQL query to select archives to delete.
     :param params: List of parameters for SQL query.
+    :param criteria: Type of deletion ('filter' or 'ids') for logging purposes
+    :param dry_run: If True, simulates deletion without making changes
     :return: 0 on success, -1 otherwise.
     """

355-360: Factor out SQL placeholder generation

The SQL placeholder generation could be factored out into a variable for better readability.

+    placeholders = ', '.join(['%s'] * len(archive_ids))
     query = f"""
         DELETE FROM `{table_prefix}archives`
-        WHERE id in ({', '.join(['%s'] * len(archive_ids))})
+        WHERE id in ({placeholders})
         RETURNING id
         """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 812ce45 and 08b8f80.

📒 Files selected for processing (1)
  • components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: lint-check (macos-latest)
🔇 Additional comments (2)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (2)

306-336: Implementation looks good!

The function is well-structured with proper SQL parameter binding and clear error handling.


364-365: Main entry point looks good!

The implementation follows the standard Python idiom for script execution.

@Eden-D-Zhang Eden-D-Zhang requested a review from haiqi96 January 16, 2025 19:35
database_config: Database,
query: str,
params: List[str],
criteria: str,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can name "criteria" as "command" instead, and refer to the defined constant?

We should also check whether the input command to this method is one of the expected value, and throw an error if it is not.

For params. My personal feeling is that it's tied to criteria, and passing it in as a separate argument could be confusing. What I can think of is to introduce a class that wraps around the param and criteria, but it might be an overkill.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First point done.
I'm not sure what you mean by the second point. If you mean to validate the commands/subcommands, I think the argparser should handle it already?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (2)

122-127: ⚠️ Potential issue

Specify exception types instead of using bare except.

Using a bare except clause can mask important exceptions like KeyboardInterrupt. Consider catching specific exceptions that might occur during config loading.

     try:
         clp_config = load_config_file(config_file_path, default_config_file_path, clp_home)
         clp_config.validate_logs_dir()
-    except:
+    except (FileNotFoundError, ValueError, KeyError) as e:
         logger.exception("Failed to load config.")
         return -1

295-297: ⚠️ Potential issue

Move logging after successful deletion.

The function logs that it deleted an archive from the database before the deletion is actually committed. This could be misleading if the operation fails.

-            for archive_id in archive_ids:
-                logger.info(f"Deleted archive {archive_id} from the database.")

             if dry_run:
                 logger.info("Dry-run finished.")
                 db_conn.rollback()
                 return 0

             db_conn.commit()
+            for archive_id in archive_ids:
+                logger.info(f"Deleted archive {archive_id} from the database.")
🧹 Nitpick comments (2)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (2)

18-26: Add docstring to describe the module's purpose and constants.

Consider adding a module-level docstring to describe the purpose of this module and document the command/argument constants. This will improve code maintainability and help other developers understand the module's functionality.

Add this docstring at the beginning of the file:

+"""
+Archive management utility that provides functionality to find and delete archives.
+
+Commands:
+    find: List archive IDs with optional time-range filters
+    del: Delete archives by ID or time-range filter
+
+Constants:
+    FIND_COMMAND: Command to list archive IDs
+    DEL_COMMAND: Command to delete archives
+    BY_IDS_COMMAND: Subcommand to delete by IDs
+    BY_FILTER_COMMAND: Subcommand to delete by time filter
+    BEGIN_TS_ARG: Argument for begin timestamp
+    END_TS_ARG: Argument for end timestamp
+    DRY_RUN_ARG: Argument for dry-run mode
+"""

161-175: Improve function documentation with complete type hints and return value description.

The docstring is missing parameter types, return value description, and has a typo in the parameter name.

 def _find_archives(
     archives_dir: Path,
     database_config: Database,
     begin_ts: int,
     end_ts: int = None,
 ) -> int:
     """
-    Lists all archive IDs, if begin_its and end_its are provided,
+    Lists all archive IDs. If begin_ts and end_ts are provided,
     only lists archives where `begin_ts <= archive.begin_timestamp` and
     `archive.end_timestamp <= end_ts`.
-    :param archives_dir:
-    :param database_config:
-    :param begin_ts:
-    :param end_ts:
+    :param archives_dir: Path to the archives directory
+    :param database_config: Database configuration object
+    :param begin_ts: Start timestamp in milliseconds from epoch
+    :param end_ts: End timestamp in milliseconds from epoch
+    :return: 0 on success, -1 on failure
     """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08b8f80 and 2b841fa.

📒 Files selected for processing (2)
  • components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1 hunks)
  • components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: lint-check (macos-latest)

Comment on lines 224 to 233
"""
Deletes archives from both metadata database and disk based on provided SQL query.
:param archives_dir:
:param database_config:
:param query: SQL query to select archives to delete.
:param params: List of parameters for SQL query.
:param command: Delete subcommand. Either "filter" or "ids".
:param dry_run: If True, no changes will be made to the database or disk.
:return: 0 on success, -1 otherwise.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix docstring parameter list and add command validation.

The docstring has an incorrect parameter name and is missing command validation.

     """
     Deletes archives from both metadata database and disk based on provided SQL query.
     :param archives_dir:
     :param database_config:
-    :param query: SQL query to select archives to delete.
+    :param params: List of parameters for SQL query
+    :param command: Delete subcommand (must be either BY_IDS_COMMAND or BY_FILTER_COMMAND)
     :param dry_run: If True, no changes will be made to the database or disk.
     :return: 0 on success, -1 otherwise.
     """
+    if command not in [BY_IDS_COMMAND, BY_FILTER_COMMAND]:
+        logger.error(f"Invalid command: {command}")
+        return -1
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"""
Deletes archives from both metadata database and disk based on provided SQL query.
:param archives_dir:
:param database_config:
:param query: SQL query to select archives to delete.
:param params: List of parameters for SQL query.
:param command: Delete subcommand. Either "filter" or "ids".
:param dry_run: If True, no changes will be made to the database or disk.
:return: 0 on success, -1 otherwise.
"""
"""
Deletes archives from both metadata database and disk based on provided SQL query.
:param archives_dir:
:param database_config:
:param params: List of parameters for SQL query
:param command: Delete subcommand (must be either BY_IDS_COMMAND or BY_FILTER_COMMAND)
:param dry_run: If True, no changes will be made to the database or disk.
:return: 0 on success, -1 otherwise.
"""
if command not in [BY_IDS_COMMAND, BY_FILTER_COMMAND]:
logger.error(f"Invalid command: {command}")
return -1

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (2)

103-103: Capitalize help text for consistency.

The help text should start with a capital letter to maintain consistency with other help messages.

-        help="delete archives within time frame.",
+        help="Delete archives within time frame.",

168-170: Fix typo in docstring parameter name.

The parameter name in the docstring has a typo: "begin_its" should be "begin_ts".

-    Lists all archive IDs, if begin_its and end_its are provided,
+    Lists all archive IDs, if begin_ts and end_ts are provided,
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2b841fa and 89299ca.

📒 Files selected for processing (2)
  • components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1 hunks)
  • components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: lint-check (macos-latest)
  • GitHub Check: build (macos-latest)
🔇 Additional comments (7)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (7)

1-28: LGTM! Well-organized imports and constants.

The imports are logically organized, and the constants follow Python naming conventions. As discussed in previous reviews, having the constants defined in this native script is the right approach.


326-327: LGTM! Proper main guard implementation.

The main guard is correctly implemented following Python best practices.


271-271: ⚠️ Potential issue

Use constant for command comparison.

The command comparison should use the defined constant instead of a string literal.

-            if "ids" == command:
+            if command == BY_IDS_COMMAND:

Likely invalid or redundant comment.


122-127: ⚠️ Potential issue

Specify exception types instead of using bare except.

Using a bare except clause can mask important exceptions like KeyboardInterrupt.

     try:
         clp_config = load_config_file(config_file_path, default_config_file_path, clp_home)
         clp_config.validate_logs_dir()
-    except:
+    except (FileNotFoundError, ValueError, KeyError) as e:
         logger.exception("Failed to load config.")
         return -1

Likely invalid or redundant comment.


296-298: 🛠️ Refactor suggestion

Move logging after successful deletion.

The function logs that it deleted archives from the database before the deletion is actually committed.

-            for archive_id in archive_ids:
-                logger.info(f"Deleted archive {archive_id} from the database.")
+            if not dry_run:
+                db_conn.commit()
+                for archive_id in archive_ids:
+                    logger.info(f"Deleted archive {archive_id} from the database.")

Likely invalid or redundant comment.


224-233: ⚠️ Potential issue

Fix docstring parameter list and add command validation.

The docstring has incorrect parameter descriptions and is missing command validation.

     """
     Deletes archives from both metadata database and disk based on provided SQL query.
     :param archives_dir:
     :param database_config:
-    :param query: SQL query to select archives to delete.
+    :param params: List of parameters for SQL query
+    :param command: Delete subcommand (must be either BY_IDS_COMMAND or BY_FILTER_COMMAND)
     :param dry_run: If True, no changes will be made to the database or disk.
     :return: 0 on success, -1 otherwise.
     """
+    if command not in [BY_IDS_COMMAND, BY_FILTER_COMMAND]:
+        logger.error(f"Invalid command: {command}")
+        return -1

Likely invalid or redundant comment.


135-142: 🛠️ Refactor suggestion

Use constants for command comparison.

Use the defined constants instead of string literals for command comparison.

-    if FIND_COMMAND == parsed_args.subcommand:
+    if parsed_args.subcommand == FIND_COMMAND:

Likely invalid or redundant comment.

Comment on lines +281 to +295
ids_string = ", ".join(f"'{archive_id}'" for archive_id in archive_ids)

db_cursor.execute(
f"""
DELETE FROM `{table_prefix}files`
WHERE archive_id in ({ids_string})
"""
)

db_cursor.execute(
f"""
DELETE FROM `{table_prefix}archive_tags`
WHERE archive_id in ({ids_string})
"""
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Use parameterized queries to prevent SQL injection.

The current implementation concatenates archive IDs directly into the SQL query, which could be vulnerable to SQL injection. Use parameterized queries instead.

-            ids_string = ", ".join(f"'{archive_id}'" for archive_id in archive_ids)
+            placeholders = ", ".join(["%s"] * len(archive_ids))
 
             db_cursor.execute(
                 f"""
                 DELETE FROM `{table_prefix}files`
-                WHERE archive_id in ({ids_string})
+                WHERE archive_id in ({placeholders})
-                """
+                """,
+                archive_ids,
             )
 
             db_cursor.execute(
                 f"""
                 DELETE FROM `{table_prefix}archive_tags`
-                WHERE archive_id in ({ids_string})
+                WHERE archive_id in ({placeholders})
-                """
+                """,
+                archive_ids,
             )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
ids_string = ", ".join(f"'{archive_id}'" for archive_id in archive_ids)
db_cursor.execute(
f"""
DELETE FROM `{table_prefix}files`
WHERE archive_id in ({ids_string})
"""
)
db_cursor.execute(
f"""
DELETE FROM `{table_prefix}archive_tags`
WHERE archive_id in ({ids_string})
"""
)
placeholders = ", ".join(["%s"] * len(archive_ids))
db_cursor.execute(
f"""
DELETE FROM `{table_prefix}files`
WHERE archive_id in ({placeholders})
""",
archive_ids,
)
db_cursor.execute(
f"""
DELETE FROM `{table_prefix}archive_tags`
WHERE archive_id in ({placeholders})
""",
archive_ids,
)

@Eden-D-Zhang Eden-D-Zhang requested a review from haiqi96 January 22, 2025 18:23
Comment on lines 138 to 139
begin_timestamp = None
end_timestamp = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
begin_timestamp = None
end_timestamp = None
begin_timestamp: int
end_timestamp: int

They should get properly None initialized.

begin_timestamp = parsed_args.begin_ts
end_timestamp = parsed_args.end_ts

if begin_timestamp is not None or end_timestamp is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the code again, begin_timestamp has a default value, so it should never gets a None value?

the _validate_timestamps is also written to handle None input, in this case, do we still need this if begin_timestamp is not None or end_timestamp is not None: statement?

If we can remove this if statement, maybe we can also combine the two if-clause for ts validation

# Delete by filter subcommand
del_filter_parser = del_subparsers.add_parser(
BY_FILTER_COMMAND,
help="delete archives within time frame.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use the text in the previous script

Suggested change
help="delete archives within time frame.",
help="Deletes archives that fall within the specified time range.",

Comment on lines 210 to 212
archive_path = archives_dir / archive_id
if not archive_path.is_dir():
logger.warning(f"Archive {archive_id} in database not found on disk.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about putting this under line 200's for loop?

logger.info("Running in dry-run mode.")

criteria: str
if BY_FILTER_COMMAND == command:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As suggested in my previous comment, we should add a guard else case if command is neither two.

Edited: this should no longer be an issue if we use the handler strategy

return 0


def _delete_archives(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for keep brining the same point up. When I read the code, I realized that we have several places where we have this if-else to check the command and that's kind of annoying.

I actually ran into a similar situation a while ago when I implemented another change, and finally came up with a relatively neat solution by creating custom handler class for both cases.

you can find it here. Basically, I created a handler class with method that can be overloaded.

let me know if you would like to simplify the code using this approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'm okay with how it is right now but obviously I'm not the best authority on what's up to standard. I'm not entirely sure that adding the class would make things simpler.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on yesterday's meeting discussion, let's use the class method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, can you explain it a bit more?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (2)

48-57: Consider adding validation for timestamp parameters.

The FilterDeleteHandler's validate_results method is empty. Consider adding validation to ensure begin_timestamp is less than or equal to end_timestamp.

     def validate_results(self, archive_ids: List[str]) -> None:
-        pass
+        begin_ts, end_ts = self._params
+        if int(begin_ts) > int(end_ts):
+            raise ValueError("begin_timestamp must be less than or equal to end_timestamp")

215-223: Fix typo in docstring parameter name.

The parameter name in the docstring uses "begin_its" instead of "begin_ts".

     Lists all archive IDs, if begin_its and end_its are provided,
-    only lists archives where `begin_ts <= archive.begin_timestamp` and
+    only lists archives where `begin_ts <= archive.begin_timestamp` and
     `archive.end_timestamp <= end_ts`.
     :param archives_dir:
     :param database_config:
     :param begin_ts:
     :param end_ts:
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 89299ca and f9f4f8e.

📒 Files selected for processing (2)
  • components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1 hunks)
  • components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: lint-check (macos-latest)
  • GitHub Check: build (macos-latest)
🔇 Additional comments (7)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (7)

1-28: LGTM! Well-organized imports and constants.

The imports are logically grouped and the constants follow proper naming conventions.


31-46: LGTM! Well-designed abstract base class.

The DeleteHandler abstract base class provides a clean interface for different deletion strategies.


354-355: LGTM! Proper main guard implementation.

The main guard follows Python best practices.


169-174: 🛠️ Refactor suggestion

Specify exception types instead of using bare except.

Using a bare except clause can mask important exceptions like KeyboardInterrupt.

     try:
         clp_config = load_config_file(config_file_path, default_config_file_path, clp_home)
         clp_config.validate_logs_dir()
-    except:
+    except (FileNotFoundError, ValueError, KeyError) as e:
         logger.exception("Failed to load config.")
         return -1

Likely invalid or redundant comment.


250-252: 🛠️ Refactor suggestion

Add error handling for filesystem operations.

The filesystem check should handle potential permission issues when accessing archive directories.

-                archive_path = archives_dir / archive_id
-                if not archive_path.is_dir():
-                    logger.warning(f"Archive {archive_id} in database not found on disk.")
+                try:
+                    archive_path = archives_dir / archive_id
+                    if not archive_path.is_dir():
+                        logger.warning(f"Archive {archive_id} in database not found on disk.")
+                except PermissionError:
+                    logger.warning(f"Permission denied when checking archive {archive_id} on disk.")

Likely invalid or redundant comment.


59-75: 🛠️ Refactor suggestion

Use parameterized queries to prevent SQL injection.

The current implementation in get_criteria concatenates values directly into the SQL query. While the current usage is safe, it's better to use parameterized queries consistently.

     def get_criteria(self) -> str:
-        return f"id in ({','.join(['%s'] * len(self._params))})"
+        placeholders = ','.join(['%s'] * len(self._params))
+        return f"id in ({placeholders})"

Likely invalid or redundant comment.


309-323: ⚠️ Potential issue

Use parameterized queries for archive_id deletion.

Direct string concatenation of archive_ids in SQL queries is vulnerable to SQL injection.

-            ids_string = ", ".join(f"'{archive_id}'" for archive_id in archive_ids)
+            placeholders = ", ".join(["%s"] * len(archive_ids))
 
             db_cursor.execute(
                 f"""
                 DELETE FROM `{table_prefix}files`
-                WHERE archive_id in ({ids_string})
+                WHERE archive_id in ({placeholders})
-                """
+                """,
+                archive_ids
             )
 
             db_cursor.execute(
                 f"""
                 DELETE FROM `{table_prefix}archive_tags`
-                WHERE archive_id in ({ids_string})
+                WHERE archive_id in ({placeholders})
-                """
+                """,
+                archive_ids
             )

Likely invalid or redundant comment.

Comment on lines +346 to +347
logger.info(f"Deleting archive {archive_id} from disk.")
shutil.rmtree(archive_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add error handling for filesystem operations.

The rmtree operation should handle potential filesystem errors.

         logger.info(f"Deleting archive {archive_id} from disk.")
-        shutil.rmtree(archive_path)
+        try:
+            shutil.rmtree(archive_path)
+        except (PermissionError, OSError) as e:
+            logger.error(f"Failed to delete archive {archive_id} from disk: {e}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
logger.info(f"Deleting archive {archive_id} from disk.")
shutil.rmtree(archive_path)
logger.info(f"Deleting archive {archive_id} from disk.")
try:
shutil.rmtree(archive_path)
except (PermissionError, OSError) as e:
logger.error(f"Failed to delete archive {archive_id} from disk: {e}")

@Eden-D-Zhang Eden-D-Zhang requested a review from haiqi96 January 23, 2025 18:30
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (2)

55-57: Document the empty validate_results method.

Add a docstring explaining why this method is intentionally empty.

     def validate_results(self, archive_ids: List[str]) -> None:
+        """
+        No validation needed for filter-based deletion as all matching archives are valid targets.
+        """
         pass

215-223: Complete the docstring parameters.

The docstring is missing parameter descriptions and return value documentation.

     """
     Lists all archive IDs, if begin_ts and end_ts are provided,
     only lists archives where `begin_ts <= archive.begin_timestamp` and
     `archive.end_timestamp <= end_ts`.
-    :param archives_dir:
-    :param database_config:
-    :param begin_ts:
-    :param end_ts:
+    :param archives_dir: Path to the archives directory
+    :param database_config: Database configuration object
+    :param begin_ts: Start timestamp in milliseconds since epoch
+    :param end_ts: Optional end timestamp in milliseconds since epoch
+    :return: 0 on success, -1 on failure
     """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f9f4f8e and 63775f5.

📒 Files selected for processing (1)
  • components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: build (macos-latest)
  • GitHub Check: lint-check (macos-latest)
🔇 Additional comments (6)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (6)

1-28: Well-organized imports and constants!

The code demonstrates good practices with properly organized imports and well-defined constants for commands and arguments.


354-355: LGTM!

Proper implementation of the Python main guard.


169-174: ⚠️ Potential issue

Specify exception types instead of using bare except.

Using a bare except clause can mask important exceptions like KeyboardInterrupt.

     try:
         clp_config = load_config_file(config_file_path, default_config_file_path, clp_home)
         clp_config.validate_logs_dir()
-    except:
+    except (FileNotFoundError, ValueError, KeyError) as e:
         logger.exception("Failed to load config.")
         return -1

Likely invalid or redundant comment.


250-252: ⚠️ Potential issue

Add error handling for filesystem operations.

The filesystem check should handle potential permission issues.

-                archive_path = archives_dir / archive_id
-                if not archive_path.is_dir():
-                    logger.warning(f"Archive {archive_id} in database not found on disk.")
+                try:
+                    archive_path = archives_dir / archive_id
+                    if not archive_path.is_dir():
+                        logger.warning(f"Archive {archive_id} in database not found on disk.")
+                except PermissionError:
+                    logger.warning(f"Permission denied when checking archive {archive_id} on disk.")

Likely invalid or redundant comment.


346-347: ⚠️ Potential issue

Add error handling for filesystem operations.

The rmtree operation should handle potential filesystem errors.

         logger.info(f"Deleting archive {archive_id} from disk.")
-        shutil.rmtree(archive_path)
+        try:
+            shutil.rmtree(archive_path)
+        except (PermissionError, OSError) as e:
+            logger.error(f"Failed to delete archive {archive_id} from disk: {e}")

Likely invalid or redundant comment.


309-323: ⚠️ Potential issue

Use parameterized queries to prevent SQL injection.

The current implementation concatenates archive IDs directly into the SQL query.

-            ids_string = ", ".join(f"'{archive_id}'" for archive_id in archive_ids)
+            placeholders = ", ".join(["%s"] * len(archive_ids))
 
             db_cursor.execute(
                 f"""
                 DELETE FROM `{table_prefix}files`
-                WHERE archive_id in ({ids_string})
+                WHERE archive_id in ({placeholders})
-                """
+                """,
+                archive_ids,
             )
 
             db_cursor.execute(
                 f"""
                 DELETE FROM `{table_prefix}archive_tags`
-                WHERE archive_id in ({ids_string})
+                WHERE archive_id in ({placeholders})
-                """
+                """,
+                archive_ids,
             )

Likely invalid or redundant comment.


class IdDeleteHandler(DeleteHandler):
def get_criteria(self) -> str:
return f"id in ({','.join(['%s'] * len(self._params))})"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Use parameterized query placeholders to prevent SQL injection.

The string formatting in get_criteria() could be vulnerable to SQL injection if the params contain malicious input.

-        return f"id in ({','.join(['%s'] * len(self._params))})"
+        placeholders = ','.join(['%s'] * len(self._params))
+        return f"id in ({placeholders})"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
return f"id in ({','.join(['%s'] * len(self._params))})"
placeholders = ','.join(['%s'] * len(self._params))
return f"id in ({placeholders})"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants