Welcome to AnyPathLib, a Python library designed to allow hassle-free file operations across different cloud and local storage.
With AnyPathLib you can write the same code to handle files across different storage systems, without worrying about the underlying details.
Operations can be optimized per backend, and the library is easily extendable to support additional cloud storage providers.
from anypathlib import AnyPath
# Create an AnyPath instance for a local file
local_file = AnyPath("/path/to/local/file.txt")
# Create an AnyPath instance for an S3 object
s3_file = AnyPath("s3://bucket/path/to/object.txt")
# Copy a file from local to S3
local_file.copy(s3_file)
# Copy a directory from S3 to Azure
s3_dir = AnyPath("s3://bucket/path/to/dir")
azure_dir = AnyPath("https://account_name.blob.core.windows.net/container_name/path")
s3_dir.copy(azure_dir)
Use "copy" without a target to get a local copy of the file, which is stored in a local cache.
Use force_overwrite=False to prevent repeated downloads of the same file.
my_dir = AnyPath("https://account_name.blob.core.windows.net/container_name/path/to/dir")
local_dir_path = my_dir.copy()
my_file = AnyPath("s3://bucket/path/to/file.txt")
local_file_path = my_file.copy()
local_file_path = my_file.copy(force_overwrite=False) # Returns the path of the previously downloaded file
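The caching behavior described above can be pictured with a small stand-alone sketch. This is not AnyPathLib's implementation: `cached_copy`, the `fetch` callback, the hash-based cache layout, and the default value of `force_overwrite` are all assumptions made for illustration. It only shows the idea that `force_overwrite=False` reuses a previously downloaded file instead of fetching it again.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, List


def cached_copy(uri: str, cache_dir: Path, fetch: Callable[[str, Path], None],
                force_overwrite: bool = True) -> Path:
    """Return a local path for `uri`, downloading only when needed.

    `fetch(uri, target)` is a hypothetical callback that writes the remote
    object to `target`.
    """
    # Derive a stable cache location from the URI (an assumed layout).
    key = hashlib.sha256(uri.encode("utf-8")).hexdigest()[:16]
    target = cache_dir / key / uri.rsplit("/", 1)[-1]
    if target.exists() and not force_overwrite:
        return target  # Reuse the previously downloaded file.
    target.parent.mkdir(parents=True, exist_ok=True)
    fetch(uri, target)
    return target


# Demo with a fake fetcher that records every call instead of hitting a cloud.
calls: List[str] = []


def fake_fetch(uri: str, target: Path) -> None:
    calls.append(uri)
    target.write_text(f"contents of {uri}")


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    first = cached_copy("s3://bucket/path/to/file.txt", cache, fake_fetch)
    second = cached_copy("s3://bucket/path/to/file.txt", cache, fake_fetch,
                         force_overwrite=False)
    same_path = first == second
    download_count = len(calls)
```

In this sketch the second call returns the cached path without invoking the fetcher, which mirrors the "previously downloaded file" behavior noted in the example above.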
from typing import List
my_dir = AnyPath("https://account_name.blob.core.windows.net/container_name/path/to/dir")
my_dir.exists()  # True if my_dir exists, otherwise False
parent, name, stem = my_dir.parent, my_dir.name, my_dir.stem
files_in_dir: List[AnyPath] = my_dir.rglob('*')  # List of AnyPath instances for files in the directory
my_file = AnyPath("s3://bucket/path/to/file.txt")
my_file.is_file()  # True if my_file exists and is a file, otherwise False
my_file.is_dir() # False
my_file.remove()
AnyPathLib also comes with a CLI tool that allows you to perform file operations from the command line.
You can run anypathlib --help to get a list of available commands and options.
Here are some examples:
Copy:
anypathlib copy -i /path/to/source -o /path/to/destination
Remove a file or directory:
anypathlib remove -p /path/to/file_or_directory
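If you want to drive the CLI from a Python script, one possible approach is to build the argument lists explicitly and hand them to `subprocess`. The sketch below uses only the subcommands and flags shown above; the helper names `build_copy_command`, `build_remove_command`, and `run_if_available` are hypothetical and not part of AnyPathLib.

```python
import shutil
import subprocess
from typing import List, Optional


def build_copy_command(source: str, destination: str) -> List[str]:
    """Argument list for `anypathlib copy -i <source> -o <destination>`."""
    return ["anypathlib", "copy", "-i", source, "-o", destination]


def build_remove_command(path: str) -> List[str]:
    """Argument list for `anypathlib remove -p <path>`."""
    return ["anypathlib", "remove", "-p", path]


def run_if_available(command: List[str]) -> Optional[int]:
    """Run the command and return its exit code, or None if the executable is not on PATH."""
    if shutil.which(command[0]) is None:
        return None
    return subprocess.run(command, check=False).returncode


# Build the commands without executing them.
copy_cmd = build_copy_command("/path/to/source", "/path/to/destination")
remove_cmd = build_remove_command("/path/to/file_or_directory")
```

Passing a list rather than a single shell string avoids quoting problems when paths contain spaces.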
- Unified, Cloud-Agnostic API: Perform file operations across different storage backends using the same set of methods.
- Path-like Operations: Supports common path operations like joining paths, listing directories, checking file existence, etc.
- Performance: Local caching for repeated downloads across different sessions, multithreading, and more.
- Extensibility: Easily extendable to support additional cloud storage providers.
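To make the "unified API" and "extensibility" points more concrete, here is a minimal sketch of how a path string could be routed to a storage backend. It is an illustration under stated assumptions, not a description of AnyPathLib's internals: the registry, `register_backend`, and `detect_backend` are hypothetical names, and the `gs://` registration is a made-up example of adding a provider. The S3 and Azure path shapes are taken from the examples above.

```python
from typing import Callable, Dict
from urllib.parse import urlparse

# Hypothetical registry: backend name -> predicate that decides whether it handles a path.
_BACKENDS: Dict[str, Callable[[str], bool]] = {}


def register_backend(name: str, matches: Callable[[str], bool]) -> None:
    """Add a backend; supporting a new provider is one more registration."""
    _BACKENDS[name] = matches


def detect_backend(path: str) -> str:
    """Return the first registered backend that matches, falling back to local storage."""
    for name, matches in _BACKENDS.items():
        if matches(path):
            return name
    return "local"


register_backend("s3", lambda p: urlparse(p).scheme == "s3")
register_backend(
    "azure",
    lambda p: (urlparse(p).hostname or "").endswith(".blob.core.windows.net"),
)
# Hypothetical extra provider, shown only to illustrate extensibility.
register_backend("gcs", lambda p: urlparse(p).scheme == "gs")

s3_backend = detect_backend("s3://bucket/path/to/object.txt")
azure_backend = detect_backend(
    "https://account_name.blob.core.windows.net/container_name/path"
)
local_backend = detect_backend("/path/to/local/file.txt")
```

With this kind of dispatch, calling code never branches on the storage type itself; it only sees the common set of methods.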
AnyPath does not store any credentials. To access cloud storage, you need to have the necessary environment variables defined.
Azure:
export AZURE_SUBSCRIPTION_ID="your-subscription-id"
export AZURE_RESOURCE_GROUP_NAME="your-resource-group-name"
AWS (same as Boto3):
export AWS_DEFAULT_REGION="your-region"
export AWS_SECRET_ACCESS_KEY="your-secret"
export AWS_ACCESS_KEY_ID="your-key"
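Before touching cloud paths, it can help to check that these variables are actually set. The following is a small sketch of such a check; `REQUIRED_ENV_VARS` and `missing_env_vars` are hypothetical helpers, not part of AnyPathLib, and the check covers only the variable names listed above.

```python
import os
from typing import Dict, List, Mapping

# Variable names copied from the export lines above, grouped by provider.
REQUIRED_ENV_VARS: Dict[str, List[str]] = {
    "azure": ["AZURE_SUBSCRIPTION_ID", "AZURE_RESOURCE_GROUP_NAME"],
    "aws": ["AWS_DEFAULT_REGION", "AWS_SECRET_ACCESS_KEY", "AWS_ACCESS_KEY_ID"],
}


def missing_env_vars(provider: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty for `provider`."""
    return [name for name in REQUIRED_ENV_VARS[provider] if not env.get(name)]


# Example with an explicit mapping so the result does not depend on the real environment.
example_env = {"AWS_DEFAULT_REGION": "your-region", "AWS_ACCESS_KEY_ID": "your-key"}
missing_aws = missing_env_vars("aws", example_env)
missing_azure = missing_env_vars("azure", example_env)
```

Calling `missing_env_vars("aws")` with no second argument checks the current process environment instead.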
- Add support for additional cloud storage providers.
  - GCP
- Improve the API.
  - Add an open method for reading files, etc.
- Implement cloud-to-cloud operations more efficiently.
  - Cache Azure credentials to avoid repeated logins.
- Improve logging and add a verbose mode.
  - Progress bar, etc.
Thanks go to these wonderful people:
Yuval Shomer 🎨 🤔
Jeremy Levy 🎨 🤔