Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add transaction support to write. #72

Merged
merged 4 commits into from
Jun 21, 2024
Merged

Add transaction support to write. #72

merged 4 commits into from
Jun 21, 2024

Conversation

evetion
Copy link
Owner

@evetion evetion commented Jun 20, 2024

Supersedes #69, but made @maxfreu a co-author on this PR.

Apart from formatting, the relevant bits are here: feat/use-transactions?expand=1#diff-db104fa7ac

From inspection of the GDAL code, and the code in #69, it seems copy is not necessary at all. Direct creation of a layer is possible, but only using addfeature (adding it), as createfeature actually tries to overwrite an existing one (and can't find one) when the layer is linked to a dataset. 🤷🏻

Also adds use of ogr_f_setgeomfielddirectly, which prevents making a copy of the geometry. Overall this code should be faster and use less memory. @maxfreu Can you confirm a 30x speed improvement with this code as well?

Co-authored-by: Max Freudenberg <[email protected]>
@evetion
Copy link
Owner Author

evetion commented Jun 20, 2024

Also note that I used GDALDatasetStartTransaction on the dataset level (https://gdal.org/api/raster_c_api.html#gdal_8h_1a57e354633ef531d521b674b4e5321369) instead of the OGR_L_StartTransaction at the layer level, as recommended by GDAL.

@maxfreu
Copy link
Contributor

maxfreu commented Jun 21, 2024

Supersedes #69, but made @maxfreu a co-author on this PR.

I'm fine with that :)

Overall, this seems to be a solid improvement. When I wrote the initial code, I just copied the older stuff from ArchGDAL, with all its flaws. The main problem for me back then was lacking documentation and the cumbersome API of (Arch)GDAL, so the outcome was sub-optimal.

Also adds use of ogr_f_setgeomfielddirectly, which prevents making a copy of the geometry. Overall this code should be faster and use less memory.

Very good, I think these changes must be upstreamed to ArchGDAL.

I used GDALDatasetStartTransaction on the dataset level

Yeah, I read the documentation just yesterday 😅

Can you confirm a 30x speed improvement with this code as well?

I'll benchmark it and let you know.

The only nitpick: If we want really solid code, then we should also include transaction rollback.

@maxfreu
Copy link
Contributor

maxfreu commented Jun 21, 2024

I tested both implementations with a dataframe containing ~38k polygons with ids and took several timings. I'm surprised, but your code is consistently slower than the old version and also allocates more:

# old
julia> @time GeoDataFrames.write("/tmp/deleteme.sqlite", df; use_gdal_copy=false)
  0.286841 seconds (655.66 k allocations: 13.266 MiB)
"/tmp/deleteme.sqlite"

# new
julia> @time GeoDataFrames.write("/tmp/deleteme.sqlite", df)
  0.439611 seconds (3.39 M allocations: 99.919 MiB, 3.43% gc time)
"/tmp/deleteme.sqlite"

@evetion
Copy link
Owner Author

evetion commented Jun 21, 2024

I tested both implementations with a dataframe containing ~38k polygons with ids and took several timings. I'm surprised, but your code is consistently slower than the old version and also allocates more:

Always good to profile. So I forgot to shortcut the _convert method, otherwise we always convert ArchGDAL geometries. Now it requires just one unsafe_copy. I now have less allocations, and twice as fast code for writing a 7k polygon geopackage.

old:
0.152208 seconds (271.71 k allocations: 4.585 MiB)
this PR:
0.062026 seconds (174.90 k allocations: 3.108 MiB)

@maxfreu
Copy link
Contributor

maxfreu commented Jun 21, 2024

I can confirm the improvement; for me it went down from 0.44s to 0.2s, which is about 1.8 times faster than my code and 18 times faster than master. Looking forward to the merge :)

@evetion evetion merged commit 72c5e7e into master Jun 21, 2024
1 of 10 checks passed
@evetion evetion deleted the feat/use-transactions branch June 21, 2024 19:17
@maxfreu
Copy link
Contributor

maxfreu commented Jun 23, 2024

Whoop whoop 🎉 Thanks! :)

@evetion evetion mentioned this pull request Jan 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants