Add Option to Skip or Prioritize Objects During Backup #1047

Open
DroneArd opened this issue Jan 8, 2025 · 1 comment

Comments

DroneArd commented Jan 8, 2025

Summary

Introduce functionality to define a starting point for processing objects, enabling the script to either skip a specified number of objects or prioritize processing from the oldest or newest items. This would significantly enhance efficiency for scenarios with large datasets.

Context

I manage a dataset of over 200,000 images and videos totaling approximately 1.5 TB. While attempting to back up this data, the process often stalls on large video files. Restarting the script partially resolves the issue, since objects that were already backed up are skipped. However, verifying and re-checking the first 120,000 objects takes several hours on every restart, so the backup remains stuck at around 80% completion.

To address this:

  1. Current workaround: Skipping videos temporarily, which allowed smoother backups of smaller files.
  2. Goal: Resume the backup for videos only, but efficiently bypass previously processed objects to save time.

Adding an option to do either of the following would help:
• Skip a specified number of objects (e.g., 120,000).
• Start processing from the oldest or newest objects based on timestamps or order.

Proposed Features

  1. Skip Objects by Count
    • Allow users to input a number (--skip N) to bypass the first N objects before starting backup operations.
  2. Process in Reverse Order
    • An option to start from the oldest items instead of the newest (--start-oldest). This would prioritize objects based on creation date or a similar attribute.
  3. Combination of Skipping and Order
    • Enable combining options, such as skipping N objects and then starting from either the oldest or newest remaining items.
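
To make the intended semantics of combining the two options concrete, here is a minimal sketch. It assumes the order is applied first and the skip count second, which is one reading of the proposal. `MediaObject` and `select_objects` are hypothetical names used only for illustration and are not part of the existing script.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import islice
from typing import Iterable, Iterator


@dataclass(frozen=True)
class MediaObject:
    # Hypothetical record; the real tool's object type will differ.
    name: str
    created: datetime


def select_objects(
    objects: Iterable[MediaObject],
    skip: int = 0,
    oldest_first: bool = False,
) -> Iterator[MediaObject]:
    """Order objects by creation time, then drop the first `skip` of them."""
    if skip < 0:
        raise ValueError("skip must be non-negative")
    # Assumption: newest-first is the default order, as stated in the proposal.
    ordered = sorted(objects, key=lambda o: o.created, reverse=not oldest_first)
    # islice drops the first `skip` items without copying the rest of the list.
    return islice(ordered, skip, None)
```

Sorting in memory requires the metadata of every object up front. If the backend can already list objects in a given order, the skip could be applied there instead.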

Potential Implementation

• Add arguments to the script:
  • --skip <number>: Skip the first <number> objects before processing.
  • --start-oldest: Process objects starting from the oldest.
  • --start-newest: Process objects starting from the newest (the default if neither flag is specified).
• Use object metadata (e.g., timestamps) to define order when using --start-oldest.
• Ensure these options work independently or in combination.
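
The arguments listed above could be declared roughly as follows. This is only a sketch using Python's standard `argparse`; the project's actual CLI framework may differ. `build_parser`, `non_negative_int`, and the `oldest_first` destination are hypothetical names. The two ordering flags are placed in a mutually exclusive group on the assumption that passing both should be an error.

```python
import argparse


def non_negative_int(value: str) -> int:
    """Parse a count for --skip and reject negative numbers."""
    number = int(value)
    if number < 0:
        raise argparse.ArgumentTypeError("must be zero or greater")
    return number


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Proposed ordering options (illustrative only)"
    )
    parser.add_argument(
        "--skip",
        type=non_negative_int,
        default=0,
        metavar="N",
        help="skip the first N objects before processing",
    )
    # Assumption: the two ordering flags cannot be combined with each other.
    order = parser.add_mutually_exclusive_group()
    order.add_argument(
        "--start-oldest",
        dest="oldest_first",
        action="store_true",
        default=False,
        help="process objects starting from the oldest",
    )
    order.add_argument(
        "--start-newest",
        dest="oldest_first",
        action="store_false",
        default=False,
        help="process objects starting from the newest (default)",
    )
    return parser
```

For example, `build_parser().parse_args(["--skip", "120000", "--start-oldest"])` yields `skip=120000` and `oldest_first=True`, while an empty argument list keeps the newest-first default with nothing skipped.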

AndreyNikiforov (Collaborator) commented

Finding the root cause of the stalls and fixing it would be my preferred way of resolving your issue. If you see the stalls consistently, you may have a rare opportunity to find the root cause.

Also, search the issues and discussions. I remember seeing reports of stalls on large files. The hypothesis is that tokens expire.
