Summary
Introduce functionality to define a starting point for processing objects, enabling the script to either skip a specified number of objects or prioritize processing from the oldest or newest items. This would significantly enhance efficiency for scenarios with large datasets.
Context
I manage a dataset of over 200,000 images and videos totaling approximately 1.5 TB. While attempting to back up this data, the process often stalls on large video files. Restarting the script resolves the issue partially, as successfully backed-up objects are skipped. However, verifying and re-checking the first 120,000 objects consumes significant time (several hours), leaving the backup stuck at around 80% completion.
To address this:
Current workaround: Skipping videos temporarily, which allowed smoother backups of smaller files.
Goal: Resume the backup for videos only, but efficiently bypass previously processed objects to save time.
Proposed solution: add an option to:
• Skip a specified number of objects (e.g., 120,000).
• Start processing from the oldest or newest objects based on timestamps or order.
Proposed Features
Skip Objects by Count
• Allow users to input a number (--skip N) to bypass the first N objects before starting backup operations.
Process in Reverse Order
• An option to start from the oldest items instead of the newest (--start-oldest). This would prioritize objects based on creation date or a similar attribute.
Combination of Skipping and Order
• Enable combining options, such as skipping N objects and then starting from either the oldest or newest remaining items.
Potential Implementation
• Add arguments to the script:
• --skip <number>: Skip the first <number> objects before processing.
• --start-oldest: Process objects starting from the oldest.
• --start-newest: Default behavior (if not specified).
• Use object metadata (e.g., timestamps) to define order when using --start-oldest.
• Ensure these options work independently or in combination.
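The argument handling above could be sketched as follows. This is a minimal, hypothetical illustration, not the script's actual code: the names `select_objects` and the `created_at` metadata field are assumptions standing in for whatever the script really uses.

```python
import argparse

def parse_args(argv=None):
    # Hypothetical CLI matching the proposed flags.
    parser = argparse.ArgumentParser(
        description="Backup with a configurable starting point")
    parser.add_argument("--skip", type=int, default=0, metavar="N",
                        help="skip the first N objects before processing")
    order = parser.add_mutually_exclusive_group()
    order.add_argument("--start-oldest", action="store_true",
                       help="process objects starting from the oldest")
    order.add_argument("--start-newest", action="store_true",
                       help="process objects starting from the newest (default)")
    return parser.parse_args(argv)

def select_objects(objects, skip=0, start_oldest=False):
    # Order first, then skip, so --skip counts objects in the chosen
    # order; "created_at" is a placeholder for the real timestamp field.
    ordered = sorted(objects, key=lambda o: o["created_at"],
                     reverse=not start_oldest)
    return ordered[skip:]
```

With something like this in place, resuming the stalled video backup in the scenario above would reduce to e.g. `--skip 120000 --start-oldest`, combining both options as proposed.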
Finding the root cause of the stalls and fixing it would be my preferred way of resolving your issue. If you see it consistently, you may have a rare opportunity to find the root cause.
Also, search the issues and discussions; I remember seeing reports of stalls on large files. One hypothesis is that tokens expire.