Upgrade Azure Blob Storage SDK to v12 #2573
There's an admin test that is failing in CI but not locally, so I'm going to add some debugging in the next commits to try to figure out what's happening.
FYI, it is failing for me locally too 👀 Not that it helps much ... but I can possibly investigate a bit myself and see if I find anything. |
If it helps:
Thanks! Might be a statefulness thing, I'll start a fresh env. |
...Wouldn't it be better to include actual information in that The check would work the same, and the error could be logged properly before redirect (...as it probably should be, anyway.) |
@samuelhwilliams It was an error due to the tests using an older Azure storage emulator, I've updated them to the official Microsoft hosted emulator now, and they're passing fine. |
@@ -0,0 +1,14 @@
// For format details, see https://aka.ms/devcontainer.json.
{
  "name": "flask-admin (Python + Azurite)",
In the future, I could add Postgres and Mongo to this dev container too, to have a single container that can run all the tests. It should be fairly easy given tests.yaml has the services setup, just copying that to docker-compose.yaml.
Hm, it appears I still don't have an approach for handling unimportable Azure modules that makes mypy happy. Let me know if you have any thoughts about the best practice for importing extra modules in tests (and skipping tests if they don't exist). I'll take another look Monday otherwise. |
I don't think we need to support tests running without azure installed - I'd probably be fine with you removing the try/except around that test import. |
Okay, I've made it so that the tests assume you've got azure-blob-storage installed. I've also made the tests devcontainer bring in Postgres and Mongo too, so I was able to get all the tests passing in my dev container environment, without additional setup. |
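For reference, the skip-if-missing pattern discussed above can be sketched with the standard library alone. This is only an illustration: `module_available` and `AzureStorageTests` are hypothetical names, not code from this PR, and the PR ended up assuming the Azure package is installed instead. It also makes no claim about satisfying mypy.

```python
import importlib.util
import unittest


def module_available(name: str) -> bool:
    """Return True if `name` can be located (parent packages get imported)."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package (e.g. "azure") is itself missing.
        return False


HAS_AZURE = module_available("azure.storage.blob")


@unittest.skipUnless(HAS_AZURE, "azure-storage-blob is not installed")
class AzureStorageTests(unittest.TestCase):
    def test_sdk_importable(self) -> None:
        # Import inside the test so that loading this module never fails
        # when the optional package is absent.
        from azure.storage.blob import BlobServiceClient

        self.assertTrue(callable(BlobServiceClient))
```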
self._container_name = container_name
self._connection_string = connection_string
One of the changes I made on the s3 admin side when bringing it up to date was to have __init__ take a client instance rather than parameters that get passed to the client. Do you think we should do something similar here and accept an instance of BlobServiceClient, or is it still fine to just use the connection string?
Oh, I think that's nice, as I personally don't typically use connection strings (this was my first time using from_connection_string), so that gives developers more flexibility as to how they connect. I can make that change. That'd be breaking, right?
It would be, but this is scheduled to go out for the v2 release where we're making a bunch of breaking changes, so I'm ok with it.
If you'd be happy to, feel free :) 🙏
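A minimal sketch of the constructor shape proposed in this thread, where the storage class receives an already-built client instead of a connection string. `AzureStorageSketch` and the two fake classes are hypothetical stand-ins so the snippet runs without the SDK; the only SDK method assumed is `get_container_client`, which `BlobServiceClient` provides.

```python
from typing import Any


class FakeContainerClient:
    """Stand-in for the SDK's container client, for illustration only."""

    def __init__(self, name: str) -> None:
        self.container_name = name


class FakeBlobServiceClient:
    """Stand-in exposing the one method the sketch relies on."""

    def get_container_client(self, container: str) -> FakeContainerClient:
        return FakeContainerClient(container)


class AzureStorageSketch:
    """Hypothetical storage class that accepts a client instance."""

    def __init__(self, blob_service_client: Any, container_name: str) -> None:
        # The caller decides how to authenticate and simply hands over
        # the resulting client.
        self._client = blob_service_client
        self._container_name = container_name
        self._container_client = blob_service_client.get_container_client(
            container_name
        )

    @property
    def container_client(self) -> Any:
        return self._container_client


storage = AzureStorageSketch(FakeBlobServiceClient(), "uploads")

# With the real SDK the caller would build the client first, e.g.:
#   from azure.storage.blob import BlobServiceClient
#   client = BlobServiceClient.from_connection_string(conn_str)
#   storage = AzureStorageSketch(client, "uploads")
```

The design choice is that connection-string, token-based, or emulator setups all stay outside the storage class.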
FYI: there are no type annotations for AzureFileAdmin and S3FileAdmin currently. When we add them in, we get a mypy error about "multiple values for keyword storage". I think it's confused by the args/kwargs around it. So I have left off type annotations on AzureFileAdmin for now, though I added them to the storage class in a few places. |
Just testing and still seeing errors on download that you said are related to Azure/Azurite#656, and also managed to upload a JPG and get an error from azure when trying to rename it. 🤔 If this is still WIP can we flip it to draft, as I'm not sure whether you want me to re-review it yet 👀 (or can you @ me when it's ready so that I know I need to look 🙏) |
Are you getting those with the local Azure connection or prod? I thought I tested that rename flow for both local/prod but I can go through it again and add a pytest so that CI should catch these issues.
Using (I don't have a real azure account so might struggle to test there - let me know if I really need to get one 👀) |
Hm, I haven't been able to replicate yet, on this branch in GitHub Codespaces with the dev container configuration. One thing I could do early next week is to ask my colleagues to do a bug bash on this branch, with both local and prod, to see if they encounter any issues (since we all have Azure accounts). That'd be a fun excuse to intro them to flask-admin anyway. Here's my successful rename: And my logs - which aren't super useful except that they show which URL is being requested. From what I've read, Azurite "start_copy_from_url" should work as long as the server is the same (http://127.0.0.1:10000 in this case).
I'm bug bash'ing this branch with some colleagues tomorrow, so hopefully we'll replicate that issue. |
if not blob.properties or not blob.properties.has_key("content_settings"):
    raise ValueError("Blob has no properties")
mime_type = blob.properties["content_settings"]["content_type"]
blob_file = io.BytesIO()
I think the blob_file should be closed here. It will delete the buffer. flask.send_file doesn't seem to close it. It will be closed when the variable is GC'ed, but if you're getting lots of files it could consume a lot of memory.
FYI, according to comments from David Lord in the Flask issues, the file does eventually get closed: pallets/flask#2468
I also did try the context manager approach, but immediately got an error from werkzeug being unable to access a closed file. So I've left this as is.
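The closed-file error described above can be reproduced with the standard library alone. This sketch involves no Flask or werkzeug; the function names are hypothetical. It only shows that a buffer created in a `with` block is already closed by the time a later consumer tries to read it.

```python
import io


def build_buffer_closed_early() -> io.BytesIO:
    # The context manager closes the buffer on exit, before the caller
    # (or a response streamer) gets a chance to read from it.
    with io.BytesIO() as buffer:
        buffer.write(b"blob contents")
        buffer.seek(0)
    return buffer


def build_buffer_left_open() -> io.BytesIO:
    # Leaving the buffer open lets the consumer read it later; it is
    # released when closed explicitly or garbage collected.
    buffer = io.BytesIO()
    buffer.write(b"blob contents")
    buffer.seek(0)
    return buffer


try:
    build_buffer_closed_early().read()
    early_read_failed = False
except ValueError:
    # Reading a closed BytesIO raises ValueError.
    early_read_failed = True

late_read = build_buffer_left_open().read()
```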
We just did a little bug bash. Possible issues:
Issues that are probably Codespace-related and shouldn't happen outside of Codespaces:
Issues outside of this PR's scope:
I'll move the PR to WIP while I triage those issues. We did not replicate the rename-file issue with Azurite, all attendees were able to rename. |
I tried a larger file, no luck, but can you attach your large file and also tell me what you tried renaming it to? In theory, I should be able to replicate, given we're in a containerized environment. If I still fail, then I'll DM next week on Discord and see if we can hop on a screen share or something.
Following up on bug bash issues:
I am still working through an issue with "rename" when connecting to a prod account using the connection string (versus using OAuth token-based connection). |
Unfortunately all of the files I've ended up reproducing it with are personal ones that I'd rather not share (eg a passport scan) 😂 I'm willing to accept that this might be something about my setup... |
@samuelhwilliams I worked a bunch on this today, with help from an Azure Blob SDK engineer, and ended up changing the rename method so that it uses an asynchronous copy behind the scenes (versus synchronous). Now the code works for me for:
I tested with files up to 5 GB in size on my prod Azure account, and was able to rename them all. I do not currently have any other planned changes, so I've marked this as ready for review again.
@pamelafox really nice work - sorry it's taken a bit of time to get around to testing this out locally.
When I upload files in a sub-directory, then go back to the root, the directory shows up multiple times (seemingly once for itself and once for each item in the directory). This is with local azurite:
Just spent a bit of time trying to test with a real azure account but I'm getting errors currently when trying to sign up for a storage account, so that's great 🙃
This feels close though - only spotted that one issue, really. A few other comments but nothing major.
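One plausible cause of the duplicated directory, stated purely as an assumption, is that a flat blob listing yields the same directory prefix once per blob underneath it. The sketch below shows one way to collapse such a listing into unique folder names; `list_entries` is a hypothetical helper, not the PR's code.

```python
from typing import Iterable, List, Tuple


def list_entries(
    blob_names: Iterable[str], path: str = ""
) -> Tuple[List[str], List[str]]:
    """Split flat blob names into the unique folders and files under `path`."""
    prefix = path.strip("/")
    prefix = f"{prefix}/" if prefix else ""
    folders = set()  # a set, so each folder is reported once
    files = []
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if not rest:
            continue  # placeholder blob for the directory itself
        head, sep, _ = rest.partition("/")
        if sep:
            folders.add(head)
        else:
            files.append(head)
    return sorted(folders), sorted(files)


names = ["docs/a.txt", "docs/b.txt", "docs/", "readme.md"]
root_folders, root_files = list_entries(names)
```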
**NOTE!** For all the tests to pass successfully, you\'ll need Postgres (with
the postgis and hstore extension) & MongoDB to be running locally. You'll
also need *libgeos* available.
**NOTE!** For all the tests to pass successfully, you\'ll need several services running locally:
nit
**NOTE!** For all the tests to pass successfully, you\'ll need several services running locally:
**NOTE!** For all the tests to pass successfully, you'll need several services running locally:
if __name__ == "__main__":
    app.run(debug=True)
Thanks again for adding the example - useful 👍
src_blob_client = self._container_client.get_blob_client(src)
dst_blob_client = self._container_client.get_blob_client(dst)
copy_result = dst_blob_client.start_copy_from_url(src_blob_client.url)
if copy_result.get("copy_status") == "success":
Is it well known in what conditions blob storage will decide to do this synchronously vs asynchronously? Is it just based on file size, or some other information?
I'm fairly uncomfortable with the sleep(10) below in the async journey. It was previously sleep(1), which is better, but I guess I'd prefer to avoid sleeping in a request at all.
I suppose if we're keeping the rename operation as a copy+delete we probably can't just move on immediately if the copy hasn't finished, so maybe this is an OK workaround if async is only going to happen for really large files. It might still be nice to poll for updates more frequently than every 10 seconds though?
Fixes #2566
This PR upgrades the azure-storage-blob SDK from v2 to v12, which involved a lot of interface changes. I followed the migration guide @ https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/storage/azure-storage-blob/migration_guide.md and was able to get all the previous functionality working, at least according to the tests.
For ease of testing, I added a simple example app and a devcontainer.azure.json which brings in the Azurite local emulator. That means you can open this repo inside a Codespace or Dev Container with that configuration, and Azure Blob Storage will be running for you.