-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add transaction support to write
.
#72
Conversation
Co-authored-by: Max Freudenberg <[email protected]>
Also note that I used GDALDatasetStartTransaction on the dataset level (https://gdal.org/api/raster_c_api.html#gdal_8h_1a57e354633ef531d521b674b4e5321369) instead of the OGR_L_StartTransaction at the layer level, as recommended by GDAL. |
I'm fine with that :) Overall, this seems to be a solid improvement. When I wrote the initial code, I just copied the older stuff from ArchGDAL, with all its flaws. The main problem for me back then was lacking documentation and the cumbersome API of (Arch)GDAL, so the outcome was sub-optimal.
Very good, I think these changes must be upstreamed to ArchGDAL.
Yeah, I read the documentation just yesterday 😅
I'll benchmark it and let you know. The only nitpick: If we want really solid code, then we should also include transaction rollback. |
I tested both implementations with a dataframe containing ~38k polygons with ids and took several timings. I'm surprised, but your code is consistently slower than the old version and also allocates more: # old
julia> @time GeoDataFrames.write("/tmp/deleteme.sqlite", df; use_gdal_copy=false)
0.286841 seconds (655.66 k allocations: 13.266 MiB)
"/tmp/deleteme.sqlite"
# new
julia> @time GeoDataFrames.write("/tmp/deleteme.sqlite", df)
0.439611 seconds (3.39 M allocations: 99.919 MiB, 3.43% gc time)
"/tmp/deleteme.sqlite" |
Always good to profile. So I forgot to shortcut the old: |
I can confirm the improvement; for me it went down from 0.44s to 0.2s, which is about 1.8 times faster than my code and 18 times faster than master. Looking forward to the merge :) |
Whoop whoop 🎉 Thanks! :) |
Supersedes #69, but made @maxfreu a co-author on this PR.
Apart from formatting, the relevant bits are here:
feat/use-transactions?expand=1
#diff-db104fa7acFrom inspection of the GDAL code, and the code in #69, it seems copy is not necessary at all. Direct creation of a layer is possible, but only using
addfeature
(adding it), ascreatefeature
actually tries to overwrite an existing one (and can't find one) when the layer is linked to a dataset. 🤷🏻Also adds use of
ogr_f_setgeomfielddirectly
, which prevents making a copy of the geometry. Overall this code should be faster and use less memory. @maxfreu Can you confirm a 30x speed improvement with this code as well?