-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support cloudpathlib S3Path for product downloads #102
base: main
Are you sure you want to change the base?
Support cloudpathlib S3Path for product downloads #102
Conversation
pyproject.toml
Outdated
@@ -36,6 +36,8 @@ httpx = "^0.27.0" | |||
retrying = "^1.3.3" | |||
rich = "^13.7.1" | |||
geojson = "^3.0.1" | |||
botocore = "^1.35.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please also declare this as an optional dependency ( {version = "^1.36.11", extras = ["s3"]}
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated :) Both are now optional dependencies.
@@ -27,7 +28,7 @@ | |||
@dataclass | |||
class DownloadRequest: | |||
url: str | |||
local_path: Path | |||
local_path: Union[Path, S3Path] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the approach but prefer to make cloudpathlib[s3]
(or any other future storage destination) a true opt in (as declared in the dependencies) in order to not bloat the package.
The current implementation would unfortunately 💥 for that case (pip install capella-console-client
).
Probably abstracting that (cloudpathlib.S3Path
) into a separate module (s3.py
) with some safety wiring on ImportError
at Runtime (not import time) would do the trick.
something like
# s3.py
try:
from cloudpathlib import S3Path
except ImportError:
# raise an error when accessed
class S3Path:
def __init__(self, *args, **kwargs):
raise ImportError(
""S3Path requires the 'cloudpathlib' package. Install it with 'pip install capella-console-client[s3]'.""
)
# e.g. assets.py
from s3 import S3Path
...
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes totally agree and I like the idea. Have updated it with the suggested changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great implementation!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
recommendation to author a unit test covering client.download_products(local_dir="s3://x/y/", ...)
with some mocks in place.
feel free to go ahead either way
@@ -27,7 +28,7 @@ | |||
@dataclass | |||
class DownloadRequest: | |||
url: str | |||
local_path: Path | |||
local_path: Union[Path, S3Path] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great implementation!
@@ -532,7 +533,7 @@ def download_products( | |||
order_id: Optional[str] = None, | |||
tasking_request_id: Optional[str] = None, | |||
collect_id: Optional[str] = None, | |||
local_dir: Union[Path, str] = Path(tempfile.gettempdir()), | |||
local_dir: Union[Path, S3Path, str] = Path(tempfile.gettempdir()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate you making this backwards compatible.
I'd create a follow up PR introducing a new option alongside local_dir
(e.g. location
) that captures non-local (s3) and marking local_dir
as to be deprecated for backwards compatibility - you can also do this on this PR if you choose to.
The feature to download multiple products at once with
download_products
is awesome. However since it only supports local paths currently, it limits its usage and adoption in the team. Hence, it would be great if we have a native support here for S3Path to stream data directly to S3 and avoid disk writes and uploads.Open to all suggestion/feedback to iterate on the proposed changes included in this PR.