Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support cloudpathlib S3Path for product downloads #102

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

laxmibalami
Copy link
Contributor

The feature to download multiple products at once with download_products is awesome. However since it only supports local paths currently, it limits its usage and adoption in the team. Hence, it would be great if we have a native support here for S3Path to stream data directly to S3 and avoid disk writes and uploads.
Open to all suggestion/feedback to iterate on the proposed changes included in this PR.

@coveralls
Copy link

coveralls commented Jan 28, 2025

Coverage Status

coverage: 95.527% (-0.5%) from 95.983%
when pulling 9457dac on laxmibalami:support-s3path-for-product-downloads
into a503a96 on capellaspace:main.

pyproject.toml Outdated
@@ -36,6 +36,8 @@ httpx = "^0.27.0"
retrying = "^1.3.3"
rich = "^13.7.1"
geojson = "^3.0.1"
botocore = "^1.35.0"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please also declare this as an optional dependency ( {version = "^1.36.11", extras = ["s3"]})

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated :) Both are now optional dependencies.

@@ -27,7 +28,7 @@
@dataclass
class DownloadRequest:
url: str
local_path: Path
local_path: Union[Path, S3Path]
Copy link
Collaborator

@grmpflh27 grmpflh27 Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the approach but prefer to make cloudpathlib[s3] (or any other future storage destination) a true opt in (as declared in the dependencies) in order to not bloat the package.

The current implementation would unfortunately 💥 for that case (pip install capella-console-client).

Probably abstracting that (cloudpathlib.S3Path) into a separate module (s3.py) with some safety wiring on ImportError at Runtime (not import time) would do the trick.

something like

# s3.py
try:
    from cloudpathlib import S3Path
except ImportError:
    # raise an error when accessed
    class S3Path:
        def __init__(self, *args, **kwargs):
            raise ImportError(
                ""S3Path requires the 'cloudpathlib' package. Install it with 'pip install capella-console-client[s3]'.""
            )
            
# e.g. assets.py
from s3 import S3Path
...

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes totally agree and I like the idea. Have updated it with the suggested changes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great implementation!

Copy link
Collaborator

@grmpflh27 grmpflh27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recommendation to author a unit test covering client.download_products(local_dir="s3://x/y/", ...) with some mocks in place.

feel free to go ahead either way

@@ -27,7 +28,7 @@
@dataclass
class DownloadRequest:
url: str
local_path: Path
local_path: Union[Path, S3Path]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great implementation!

@@ -532,7 +533,7 @@ def download_products(
order_id: Optional[str] = None,
tasking_request_id: Optional[str] = None,
collect_id: Optional[str] = None,
local_dir: Union[Path, str] = Path(tempfile.gettempdir()),
local_dir: Union[Path, S3Path, str] = Path(tempfile.gettempdir()),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate you making this backwards compatible.

I'd create a follow up PR introducing a new option alongside local_dir (e.g. location) that captures non-local (s3) and marking local_dir as to be deprecated for backwards compatibility - you can also do this on this PR if you choose to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants