
Fix simplifies #44

Merged
merged 11 commits into JuliaGeo:main from fix_simplifies
Jan 12, 2024
Conversation


@skygering skygering commented Jan 4, 2024

The simplification function wasn't working for DouglasPeucker when the tol keyword was used. I found that this was because, although the algorithm was comparing squared distances, it wasn't squaring the tolerance, so no points were ever removed.

I decided to move the adaptation of the tolerance values into the constructors, so that the same SimplifyAlg object can be used more than once without the tolerance being changed on every run. This seems important since, for the algorithms other than DouglasPeucker, you do need to use the object explicitly. I also swapped to using @kwdef (is it exported yet?) to get rid of the constructor that does exactly what @kwdef does.
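The constructor-time adjustment described above can be sketched like this (a hypothetical Python mock for illustration only; the actual PR defines Julia structs with @kwdef, and the field names here are made up):

```python
class DouglasPeucker:
    """Hypothetical sketch: store the squared tolerance once at construction.

    Because the algorithm compares *squared* point-to-line distances,
    squaring tol here (rather than inside the algorithm) means the same
    object can be reused on many geometries without mutating its state.
    """
    def __init__(self, tol=None, number=None, ratio=None):
        self.squared_tol = tol * tol if tol is not None else None
        self.number = number
        self.ratio = ratio

alg = DouglasPeucker(tol=1.5)
# Reusing alg on poly1 and poly2 leaves squared_tol at 2.25 both times.
```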

While I was doing this, I noticed that the DouglasPeucker algorithm was wrong. It started at the "bottom": it found the smallest distances between adjacent edges, removed the smallest, and moved "up" the shape. The actual Douglas-Peucker algorithm finds the point with the largest distance from the current edge, marks it to be kept, and then moves "down" the shape. You can see a gif of this here. This gives different, but similar, results, since in Douglas-Peucker there are some points that can't be removed that could be removed by the previous implementation. I implemented it using recursion, but I am not sure how to make it work for the number and ratio keywords. I will think about it, but I would love any suggestions.
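The corrected top-down recursion might look like this (a minimal Python sketch of the textbook Douglas-Peucker algorithm for illustration, not the PR's Julia implementation; it assumes plain (x, y) tuples and a squared tolerance):

```python
def perpendicular_sq_dist(p, a, b):
    # Squared perpendicular distance from point p to the line through a and b,
    # via the cross-product area formula: cross^2 / |b - a|^2.
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return (px - ax) ** 2 + (py - ay) ** 2
    cross = dx * (py - ay) - dy * (px - ax)
    return cross * cross / (dx * dx + dy * dy)

def douglas_peucker(points, sq_tol):
    """Endpoints are always kept; the farthest interior point from the
    endpoint chord is kept (and recursed on) if it exceeds sq_tol."""
    if len(points) < 3:
        return list(points)
    dists = [perpendicular_sq_dist(p, points[0], points[-1])
             for p in points[1:-1]]
    k = max(range(len(dists)), key=dists.__getitem__)  # farthest interior point
    if dists[k] <= sq_tol:
        return [points[0], points[-1]]       # everything between is dropped
    split = k + 1                            # index into points
    left = douglas_peucker(points[:split + 1], sq_tol)
    right = douglas_peucker(points[split:], sq_tol)
    return left[:-1] + right                 # avoid duplicating the split point
```

For example, a near-collinear middle point is dropped while a far one is kept.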

Here is the before performance in LibGEOS and with GeometryOps (with LG and GI polygons of 35 points reduced to 17 points) for DouglasPeucker:
[screenshot: benchmark results before the update]

And here is the new performance (I think it is faster because we can stop earlier):
[screenshot: benchmark results after the fix]

I also added a file of really wacky polygons that I use in my work for future testing.

@skygering skygering requested a review from rafaqz January 4, 2024 21:58
@skygering (Collaborator, Author)

We could also get this in so that at least it works with tol, and then open up another PR to fix the rest. I am also not convinced by the implementation of the radial distance one.


rafaqz commented Jan 4, 2024

Good catch. I mostly adapted someone else's implementation for that 😅

Would be good to keep number and ratio. It's pretty slick specifying those. You can do it iteratively, like a root finder...

(to clarify we can merge this and add it back later)
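The "iterate like a root finder" suggestion could be sketched as a bisection on the tolerance until the output hits the requested point count (hypothetical Python; the PR ultimately rewrote the algorithm iteratively instead, and the toy simplifier below is only a stand-in to show the monotonicity being exploited):

```python
def tol_for_count(simplify, points, target_n, lo=0.0, hi=1e6, iters=60):
    """Bisection sketch: find the smallest tolerance whose simplification
    keeps at most target_n points. Assumes simplify(points, sq_tol) is
    monotone: a larger tolerance never keeps more points."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if len(simplify(points, mid)) > target_n:
            lo = mid  # still too many points; loosen the tolerance
        else:
            hi = mid  # few enough points; try tightening
    return hi

# Toy monotone "simplifier" for demonstration only (not Douglas-Peucker):
# keep the endpoints plus any interior point whose y^2 exceeds sq_tol.
def toy_simplify(points, sq_tol):
    kept = [points[0]]
    kept += [p for p in points[1:-1] if p[1] * p[1] > sq_tol]
    kept.append(points[-1])
    return kept
```

The same idea would work for ratio by setting target_n from the input length.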


@rafaqz rafaqz left a comment


Thanks for fixing my horrible errors. It would be good to find some known-correct algorithms to test against.

end
#= For each algorithm, apply simplification to all curves, multipoints, and
points, reconstructing everything else around them. =#
_simplify(alg::SimplifyAlg, data; kw...) =
@rafaqz (Member)

Is this function layout from a style guide? I normally write roughly Blue Style. Can totally change that, but it would be good to agree on one; I would habitually change it back to the above!

@skygering (Collaborator, Author)

No, I just use equals when one function is simply equal to another. But I see in the Blue style guide (new to me!) that that isn't preferred unless it fits on one line.

I guess my logic is, we are literally just renaming the function, just like in lines 101 and 103, and those were written the exact same way.


@rafaqz rafaqz Jan 5, 2024


Yeah, style guides are never quite what you want, but they can simplify some things in collaborations like this. There's a more common standard Julia style guide somewhere too. I'm easy either way.

@rafaqz (Member)

This is not so concise:
https://docs.julialang.org/en/v1/manual/style-guide/

But I think it's encoded in JuliaFormatter more precisely as the default:
https://domluna.github.io/JuliaFormatter.jl/stable/style/

end
return result
end

_remove!(s, i) = s[i:end-1] .= s[i+1:end]
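For reference, the _remove! one-liner in the hunk above shifts every element after position i one slot left; a rough 0-based Python equivalent (illustrative only; Julia's version is 1-based and broadcasts in place with .=):

```python
def remove_at(s, i):
    """Python sketch of the Julia one-liner _remove!(s, i):
    shift each element after index i one slot left, leaving the last
    slot as a stale duplicate (the caller shrinks the view separately)."""
    s[i:-1] = s[i + 1:]
    return s
```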

@rafaqz (Member)

Best not to mix changes and moves. We should follow ColPrac; this is a guideline:

https://github.com/SciML/ColPrac

@skygering (Collaborator, Author)

So in the future would you prefer that this be a new PR?


@rafaqz rafaqz Jan 5, 2024


Thanks, that would be nice! But I also posted ColPrac to distance this request from being just a personal preference. See:

PRs that move code should not also change code, so that they are easier to review.

    If only moving code, review for correctness is not required.

    If only changing code, then the diff makes it clear what lines have changed.

PRs with large improvements to style should not also change functionality.

    This is to avoid making large diffs that are not the focus of the PR.
    While it is often helpful to fix a few typos in comments on the way past, it is different from using a regex or formatter on the whole project to fix spacing around operators.

Following ColPrac (roughly) also makes it easier to point out my own bad coding practices

(Resolved review thread on src/transformations/simplify.jl)
previous = point
end
## Never remove the end points
distances[begin] = distances[end] = Inf
## This avoids taking the square root of each distance above

@rafaqz rafaqz Jan 5, 2024


The reason for doing this here is that radial distance can be used to pre-filter for the other algorithms. So squaring beforehand makes it unclear whether it's squared already, or multiplied by 2 in VisvalingamWhyatt.

(basically the prefiltering was half finished code)

@skygering (Collaborator, Author)

The problem with this for me is that if you make an object alg = RadialDistance(; tol = 1.5) and you have two polygons poly1 and poly2, I would expect to be able to do the following:

simplify(alg, poly1)
simplify(alg, poly2)

But in this case, the tolerance would be squared twice, right? I would rather have a converter that takes a DouglasPeucker or a VisvalingamWhyatt algorithm to a RadialDistance algorithm and knows to either square it or halve it or whatever, and then pass that to the pre-filtering.

@skygering (Collaborator, Author)

Actually, it might make sense to just let users provide a pre-filtering algorithm if they so desire, with an optional prefilter_alg argument. This would increase flexibility and would also be more consistent than reusing tol, since the tol for VisvalingamWhyatt is an area while the tol for both of the other two is a distance, which would cause problems if you try to prefilter VisvalingamWhyatt using RadialDistance with the same alg object. This would then allow a different tolerance/number/ratio for the pre-filtering than for the actual simplification algorithm, and would let people hook into the architecture more easily. Thoughts?
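The prefilter_alg idea could be sketched like this (hypothetical Python with toy stand-in algorithms; none of these names are the PR's actual API). Each algorithm carries its own tolerance, so a RadialDistance pre-filter never shares or double-adjusts the main algorithm's tol:

```python
def simplify(points, main_alg, prefilter_alg=None):
    """Run an optional cheap pre-filter before the main algorithm, so the
    geometry only has to be decomposed and rebuilt once."""
    if prefilter_alg is not None:
        points = prefilter_alg(points)  # cheap first pass
    return main_alg(points)

# Toy stand-ins for RadialDistance / DouglasPeucker (illustration only):
def keep_every_other(points):
    # Keep alternating points, always retaining the final point.
    return points[::2] + ([points[-1]] if len(points) % 2 == 0 else [])

def keep_endpoints(points):
    return [points[0], points[-1]]
```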


@rafaqz rafaqz Jan 5, 2024


Yeah, that's a better idea.

I think it's worth having if you have huge detailed polygons, as RadialDistance should be so much faster.

@skygering

I added the pre-filter algorithm, and I have also rewritten the DouglasPeucker algorithm iteratively instead of recursively so that it can also be used with the number and ratio keywords. This led to a pretty nice speedup from above:
[screenshot: benchmark results after the iterative rewrite]

I would like to add a few more tests, but then I think this is good to go.

How do you feel about the new pre-filtering algorithm? My only concern is that users might be confused by it: why not just run _simplify with one algorithm and then call _simplify again with another? The answer is that the built-in pre-filter saves time, since we don't have to re-decompose the polygon. But it is still a valid question whether that benefit is worth an additional keyword; probably yes for really big shapes. So I am happy with it if you are @rafaqz.


rafaqz commented Jan 9, 2024

Maybe we should try it on some huge polygons and see if it's worth it, or if just running it twice is just as good.

@skygering

How many vertices are you thinking? I can test it.


rafaqz commented Jan 10, 2024

Maybe a FeatureCollection/vector of polygons with thousands-millions of points?

@skygering

Can you do me a favor and run this?

import LibGEOS as LG
import GeoInterface as GI
c = [[[1442.811638252033, 37931.54281488228], [1242.0321008084024, 37650.54139544157], [519.3606654818966, 37530.20857244596], [378.859955761543, 37630.59834116777], [258.4147165404589, 37610.542870668505], [-92.8370577604245, 37861.51729247304], [-112.89252825969345, 37981.96253169413], [-393.8939477004001, 38182.74206913776], [-413.949418199669, 38303.18730835884], [-624.7004827801991, 38453.77196144157], [-745.1457220012833, 38433.7164909423], [-815.3960768614602, 38483.91137530321], [-835.4515473607294, 38604.35661452429], [-975.9522570810825, 38704.74638324611], [-1076.229609577428, 39306.972579351524], [-975.8398408556127, 39447.47328907188], [-995.8953113548819, 39567.91852829296], [-895.5055426330664, 39708.41923801332], [-915.5610131323356, 39828.8644772344], [-714.7814756887051, 40109.86589667511], [-734.836946187974, 40230.31113589619], [-433.66764002252853, 40651.81326505725], [-473.77858102106666, 40892.70374349942], [-272.99904357743617, 41173.705162940125], [-293.05451407670535, 41294.15040216121], [-192.6647453548901, 41434.65111188156], [-212.72021585415905, 41555.09635110265], [-162.52533149325131, 41625.346705962824], [2366.8246921495183, 42046.51158644748], [2437.0750470096955, 41996.316702086566], [2336.68527828788, 41855.81599236622], [2416.9071602849563, 41374.03503548188], [2557.4078700053096, 41273.64526676006], [2637.629752002386, 40791.864309875724], [2537.2399832805704, 40651.363600155375], [2577.3509242791088, 40410.47312171321], [2476.9611555572938, 40269.97241199285], [2497.016626056563, 40149.52717277176], [2296.237088612932, 39868.52575333106], [2376.4589706100082, 39386.74479644672], [2175.6794331663777, 39105.743377006016], [2215.790374164916, 38864.85289856385], [2115.400605443101, 38724.35218884349], [2135.45607594237, 38603.90694962241], [2035.0663072205548, 38463.406239902055], [2055.1217777198235, 38342.961000680974], [1904.5371246371008, 38132.209936100444], [1663.6466461949326, 38092.0989951019], [1563.2568774731171, 
37951.59828538155], [1442.811638252033, 37931.54281488228]]]
poly = LG.Polygon(c)
lg_vals = GI.coordinates(LG.simplify(poly, 100.0))[1]
reduced_npoints = length(lg_vals)

Locally I am getting 16, which is what I expect based on my simplify code, but when I run the tests I am getting 15 points. It is deleting the first point. And for the life of me, I can’t understand what is wrong. I put a @show in the tests so that you can see what is going on. I think the versions are the same. I can't figure out what is different. I even copied the above coordinates from a @show call.

@skygering

And I think we should keep the pre-filtering. I made a GeometryCollection of 500 LineStrings that each have 1e6 points and the results are pretty different for included pre-filtering versus an explicit call to another algorithm before calling the "main" algorithm.
[screenshot: timing comparison of built-in pre-filtering vs. an explicit separate pass]

I will say that while both of these return the same results (with RadialDistance as a pre-filter for DouglasPeucker) they do not return the same results as just DouglasPeucker. I think this is due to radial distance removing some points that are "off-limits" to remove in DouglasPeucker. I am not sure if it should, but I think that is out of scope of this PR.


skygering commented Jan 11, 2024

> Can you do me a favor and run this? [code snippet and explanation quoted verbatim from the previous comment]

Okay, I think I know what is going on. I don't know how I was getting 16 yesterday, but I am getting 15 now... All of the examples of the algorithm that I have seen so far say that the first and last points are ALWAYS included. However, I am wondering if that is just for LineStrings, and LibGEOS is also testing whether the first point should be included... I am not sure how to handle that, given that we need to start at a given point... maybe we just test at the end if it should be included?

I will play around with it. If I can't figure it out quickly, maybe we can just merge how it is and open an issue?

Yep - found it: https://github.com/locationtech/jts/blob/master/modules/core/src/main/java/org/locationtech/jts/simplify/DouglasPeuckerLineSimplifier.java#L81
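One way to mimic the JTS behavior linked above (a hedged sketch of the idea, not a port of the JTS code): after simplifying a closed ring, check whether the shared first/last vertex is itself within tolerance of the chord between its neighbors, and if so re-close the ring at the next vertex.

```python
def sq_dist_to_line(p, a, b):
    # Squared perpendicular distance from p to the infinite line through a, b.
    dx, dy = b[0] - a[0], b[1] - a[1]
    cross = dx * (p[1] - a[1]) - dy * (p[0] - a[0])
    return cross * cross / (dx * dx + dy * dy)

def drop_redundant_ring_start(ring, sq_tol):
    """ring is closed (ring[0] == ring[-1]). If the shared start/end
    vertex lies within tolerance of the chord between its neighbors,
    re-close the ring at the next vertex instead (JTS-like behavior)."""
    if len(ring) < 5:
        return ring  # too few vertices to drop anything safely
    prev, start, nxt = ring[-2], ring[0], ring[1]
    if sq_dist_to_line(start, prev, nxt) <= sq_tol:
        return ring[1:-1] + [ring[1]]  # re-close at the next vertex
    return ring
```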


rafaqz commented Jan 11, 2024

Yeah, let's merge; this is a huge improvement as-is.

Well I guess we should fix that test or mark it as broken first!

@skygering

Okay, I fixed the previous error so that the endpoints are removed if needed.

I have now discovered another difference from LibGEOS. I am going to try to debug it but if I can't figure it out by tonight then I will push, merge, and open an issue.


skygering commented Jan 12, 2024

Okay, we don't get the exact same thing as LibGEOS when there are two points in a range that are the same distance from the line. That happens a surprising amount on the test polygons I am using, probably because of how they were generated. But otherwise, we get the same thing. For example,
[screenshot: simplified polygon with the two equidistant points marked in orange]
The two points with the orange lines are the 106th and 107th points of the shape. They are the same distance from the line formed by the 105th and 109th points. In our algorithm, we keep the 106th point. In LibGEOS, they keep the 107th point, which then also leads to keeping the 108th point.

I tried switching the maximum-distance finder to mark a new maximum if the distance is greater than or equal to the current max distance. However, this broke another test case. I feel pretty confident that this doesn't significantly change the outcome of the algorithm, so I am going to merge.

Do you think I still need to open an issue about this @rafaqz?
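The tie-breaking difference comes down to the comparison used in the max-distance scan. Illustrative Python (not the PR's Julia code): a strict > keeps the first of two equidistant points, while >= keeps the last, matching the LibGEOS/JTS behavior described above.

```python
def farthest_index(dists, keep_last_on_tie=False):
    """Index of the maximum distance. On ties, a strict > favors the
    earliest point; >= favors the latest (JTS-like)."""
    best_i, best_d = 0, dists[0]
    for i, d in enumerate(dists):
        better = (d >= best_d) if keep_last_on_tie else (d > best_d)
        if better:
            best_i, best_d = i, d
    return best_i

# With equal maxima at indices 1 and 2, the two policies pick differently.
```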

@skygering skygering marked this pull request as ready for review January 12, 2024 02:25

rafaqz commented Jan 12, 2024

Maybe we should note in the docs that we don't promise anything about the order of removal of equal-distance points.


rafaqz commented Jan 12, 2024

Also, am I right to understand this is now 8x faster than LibGEOS?

@skygering

Not quite 8x. I think things got a bit slower when I added the checks at the end for polygons. Things can probably still be optimized. It also seems to depend on the size of the polygon: for a polygon with 35 points going to ~16 points we are 5x faster, while for a polygon with 146 points going to ~53 points we are 3x faster (but also removing more points, so...).
[screenshots: benchmark results for the 35-point and 146-point polygons]


rafaqz commented Jan 12, 2024

Ok, still pretty good though. And over collections it will be threaded!

It made me think about that JuliaCon talk, if you want to do it! It could be a good motivator to benchmark everything and get the package registered.


rafaqz commented Jan 12, 2024

Merge whenever you like.

After that I might get out the profiler and see what's left to squeeze out of some of these methods.

@skygering skygering merged commit ba1c7ce into JuliaGeo:main Jan 12, 2024
3 checks passed
@skygering skygering deleted the fix_simplifies branch January 12, 2024 18:21