Fix simplifies #44
Conversation
We could also get this in so that at least it works with […]
Good catch. I mostly adapted someone else's implementation for that 😅 It would be good to keep the number and ratio; it's pretty slick specifying those. You can do it iteratively, like a root finder... (To clarify: we can merge this and add it back later.)
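The "iteratively like a root finder" idea could look something like this bisection sketch (a hypothetical helper, not part of the package; `nkept` here is any function mapping a tolerance to the number of points the simplification would keep):

```julia
# Bisection on `tol` to hit a target vertex count. `nkept` must be
# non-increasing in tol for this to converge.
function tol_for_count(nkept, target; lo = 0.0, hi = 1e6, iters = 60)
    for _ in 1:iters
        mid = (lo + hi) / 2
        n = nkept(mid)
        n == target && return mid
        n > target ? (lo = mid) : (hi = mid)  # a bigger tol drops more points
    end
    return (lo + hi) / 2
end

# Toy model: a point survives while tol is below its "importance";
# the two endpoints are always kept.
importances = [0.1, 0.5, 1.0, 2.0, 5.0]
nkept(tol) = count(>(tol), importances) + 2
tol_for_count(nkept, 4)  # some tol in [1.0, 2.0)
```

The same loop works for `ratio` by converting it to a point count first.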
Thanks for fixing my horrible errors. It would be good to find some known-correct algorithms to test against.
src/transformations/simplify.jl (outdated)

```julia
end
#= For each algorithm, apply simplification to all curves, multipoints, and
points, reconstructing everything else around them. =#
_simplify(alg::SimplifyAlg, data; kw...) =
```
Is this function layout from a style guide? I normally write roughly Blue Style. Can totally change that but it would be good to agree on one, I would habitually change it back to the above!
No, I just use equals when it is just one function equal to another. But I see in the Blue style guide (new to me!) that that isn't preferred unless it fits on one line.
I guess my logic is that we are literally just renaming the function, just like on lines 101 and 103, and those were written the exact same way.
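For reference, the two layouts under discussion look like this (`_simplify_inner` is a placeholder name; both forms define the same method):

```julia
# One-line form: fine per Blue style when the whole definition fits
# on one line.
_simplify(alg, data; kw...) = _simplify_inner(alg, data; kw...)

# Block form: preferred once the definition no longer fits on one line.
function _simplify(alg, data; kw...)
    return _simplify_inner(alg, data; kw...)
end
```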
Yeah, style guides are never quite what you want, but they can simplify some things in collaborations like this. There's a more common standard Julia style guide somewhere too. I'm easy either way.
This is not so concise:
https://docs.julialang.org/en/v1/manual/style-guide/
But I think it's encoded in JuliaFormatter more precisely as the default:
https://domluna.github.io/JuliaFormatter.jl/stable/style/
```julia
    end
    return result
end
```

```julia
_remove!(s, i) = s[i:end-1] .= s[i+1:end]
```
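For context, the quoted `_remove!` shifts everything after index `i` left by one, leaving a stale duplicate in the last slot (the caller is expected to shrink or ignore the tail afterwards). A tiny demonstration:

```julia
# Shift elements i+1..end left by one; the last slot keeps a stale copy.
_remove!(s, i) = s[i:end-1] .= s[i+1:end]

v = [10, 20, 30, 40]
_remove!(v, 2)
v  # [10, 30, 40, 40] (note the stale trailing 40)
```

Note that `s[i+1:end]` on the right-hand side is a plain `getindex`, which copies, so the overlapping assignment is safe.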
Best not to mix changes and moves. We should follow ColPrac; this is a guideline:
So in the future would you prefer that this be a new PR?
Thanks, that would be nice! But I also posted ColPrac to distance this request from being just a personal preference. See:

> - PRs that move code should not also change code, so that they are easier to review.
>   - If only moving code, review for correctness is not required.
>   - If only changing code, then the diff makes it clear what lines have changed.
> - PRs with large improvements to style should not also change functionality.
>   - This is to avoid making large diffs that are not the focus of the PR.
>   - While it is often helpful to fix a few typos in comments on the way past, it is different from using a regex or formatter on the whole project to fix spacing around operators.

Following ColPrac (roughly) also makes it easier to point out my own bad coding practices.
```julia
    previous = point
end
## Never remove the end points
distances[begin] = distances[end] = Inf
## This avoids taking the square root of each distance above
```
The reason for doing this here is that radial distance can be used to prefilter for the other algorithms. So squaring beforehand makes it unclear if it's squared already, or multiplied by 2 in VisvalingamWhyatt. (Basically, the prefiltering was half-finished code.)
The problem with this for me is that if you make an object `alg = RadialDistance(; tol = 1.5)` and you have two polygons `poly1` and `poly2`, I would expect to be able to do the following:

```julia
simplify(alg, poly1)
simplify(alg, poly2)
```

But in this case, the tolerance would be squared twice, right? I would rather have a converter to take a `DouglasPeucker` or a `VisvalingamWhyatt` algorithm to a `RadialDistance` algorithm, and have it know to either square it or halve it or whatever, and then pass that to use the pre-filtering.
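One way to make reuse safe, as a sketch (illustrative names, not the actual constructors in this PR): square the tolerance once at construction, so the stored value never mutates between calls.

```julia
# Square the tolerance once, at construction, so repeated calls with the
# same object compare against the same value (no double-squaring).
struct RadialDistanceSketch
    sqtol::Float64
    RadialDistanceSketch(; tol) = new(tol^2)
end

alg = RadialDistanceSketch(; tol = 1.5)
alg.sqtol  # 2.25, and it stays 2.25 no matter how many polygons reuse `alg`
```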
Actually, it might make sense for users to just provide a pre-filtering algorithm if they so desire, with an optional `prefilter_alg` argument. This would increase flexibility, and it would also be more consistent than reusing `tol`, since the `tol` for `VisvalingamWhyatt` is an area while the `tol` for both of the other two is a distance, which would cause problems if you try to prefilter `VisvalingamWhyatt` using `RadialDistance` with the same `alg` object. This would then allow a different tolerance/number/ratio for the pre-filtering than for the actual simplification algorithm, and would let people hook into the architecture more easily. Thoughts?
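A sketch of how the proposed `prefilter_alg` keyword could thread through (hypothetical types and names, not the package API):

```julia
# Hypothetical types standing in for the real algorithms; each carries
# its own independent tolerance.
abstract type AlgSketch end
struct RDAlg <: AlgSketch; tol::Float64; end   # distance tolerance
struct VWAlg <: AlgSketch; tol::Float64; end   # area tolerance

# No-op placeholder for actually running one algorithm over a geometry
apply_alg(alg::AlgSketch, geom) = geom

# The proposed hook: run the cheap pre-filter first (if given), then the
# main algorithm, each with its own tol/number/ratio settings.
function simplify_sketch(main_alg, geom; prefilter_alg = nothing)
    pre = isnothing(prefilter_alg) ? geom : apply_alg(prefilter_alg, geom)
    return apply_alg(main_alg, pre)
end

simplify_sketch(VWAlg(40.0), [(0.0, 0.0)]; prefilter_alg = RDAlg(2.0))
```

Because the pre-filter carries its own tolerance, an area-based main algorithm and a distance-based pre-filter never share a `tol` field.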
Yeah, that's a better idea.
I think it's worth having if you have huge detailed polygons, as RadialDistance should be so much faster.
I added in the pre-filter algorithm, and I have also re-written the […]. I would like to add a few more tests, but then I think this is good to go. How do you feel about the new pre-filtering algorithm? My only concern is that users might be confused by it, like why not just run it twice?
Maybe we should try it on some huge polygons and see if it's worth it, or if just running it twice is just as good.
How many vertices are you thinking? I can test it.
Maybe a FeatureCollection/vector of polygons with thousands to millions of points?
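A quick way to generate such a stress test (the `simplify`/`DouglasPeucker`/`RadialDistance` calls are left commented out since their exact API is still settling in this PR):

```julia
# Build a noisy ring with 10^5 vertices to stress-test simplification.
n = 100_000
θ = range(0, 2π; length = n)
ring = [(cos(t) + 0.01 * rand(), sin(t) + 0.01 * rand()) for t in θ]

# Then compare one expensive pass against prefilter + pass, e.g.:
# @time simplify(DouglasPeucker(; tol = 0.05), ring)
# @time simplify(DouglasPeucker(; tol = 0.05),
#                simplify(RadialDistance(; tol = 0.01), ring))
```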
Can you do me a favor and run this?

```julia
import LibGEOS as LG
import GeoInterface as GI

c = [[[1442.811638252033, 37931.54281488228], [1242.0321008084024, 37650.54139544157],
    [519.3606654818966, 37530.20857244596], [378.859955761543, 37630.59834116777],
    [258.4147165404589, 37610.542870668505], [-92.8370577604245, 37861.51729247304],
    [-112.89252825969345, 37981.96253169413], [-393.8939477004001, 38182.74206913776],
    [-413.949418199669, 38303.18730835884], [-624.7004827801991, 38453.77196144157],
    [-745.1457220012833, 38433.7164909423], [-815.3960768614602, 38483.91137530321],
    [-835.4515473607294, 38604.35661452429], [-975.9522570810825, 38704.74638324611],
    [-1076.229609577428, 39306.972579351524], [-975.8398408556127, 39447.47328907188],
    [-995.8953113548819, 39567.91852829296], [-895.5055426330664, 39708.41923801332],
    [-915.5610131323356, 39828.8644772344], [-714.7814756887051, 40109.86589667511],
    [-734.836946187974, 40230.31113589619], [-433.66764002252853, 40651.81326505725],
    [-473.77858102106666, 40892.70374349942], [-272.99904357743617, 41173.705162940125],
    [-293.05451407670535, 41294.15040216121], [-192.6647453548901, 41434.65111188156],
    [-212.72021585415905, 41555.09635110265], [-162.52533149325131, 41625.346705962824],
    [2366.8246921495183, 42046.51158644748], [2437.0750470096955, 41996.316702086566],
    [2336.68527828788, 41855.81599236622], [2416.9071602849563, 41374.03503548188],
    [2557.4078700053096, 41273.64526676006], [2637.629752002386, 40791.864309875724],
    [2537.2399832805704, 40651.363600155375], [2577.3509242791088, 40410.47312171321],
    [2476.9611555572938, 40269.97241199285], [2497.016626056563, 40149.52717277176],
    [2296.237088612932, 39868.52575333106], [2376.4589706100082, 39386.74479644672],
    [2175.6794331663777, 39105.743377006016], [2215.790374164916, 38864.85289856385],
    [2115.400605443101, 38724.35218884349], [2135.45607594237, 38603.90694962241],
    [2035.0663072205548, 38463.406239902055], [2055.1217777198235, 38342.961000680974],
    [1904.5371246371008, 38132.209936100444], [1663.6466461949326, 38092.0989951019],
    [1563.2568774731171, 37951.59828538155], [1442.811638252033, 37931.54281488228]]]

poly = LG.Polygon(c)
lg_vals = GI.coordinates(LG.simplify(poly, 100.0))[1]
reduced_npoints = length(lg_vals)
```

Locally I am getting 16, which is what I expect based on my simplify code, but when I run the tests I am getting 15 points. It is deleting the first point. And for the life of me, I can't understand what is wrong. I put a `@show` in the tests so that you can see what is going on. I think the versions are the same. I can't figure out what is different. I even copied the above coordinates from a […]
Okay, I think I know what is going on. I don't know how I was getting 16 yesterday, but I am getting 15 now... All of the examples of the algorithm that I have seen so far say that the first and the last points are ALWAYS included. However, I am wondering if this is just for […]. I will play around with it. If I can't figure it out quickly, maybe we can just merge it how it is and open an issue?

Yep - found it: https://github.com/locationtech/jts/blob/master/modules/core/src/main/java/org/locationtech/jts/simplify/DouglasPeuckerLineSimplifier.java#L81
Yeah, let's merge, this is a huge improvement as-is. Well, I guess we should fix that test or mark it as broken first!
Okay, I fixed the previous error so that the endpoints are removed if needed. I have now discovered another difference from LibGEOS. I am going to try to debug it, but if I can't figure it out by tonight then I will push, merge, and open an issue.
Okay, we don't get the exact same thing as LibGEOS when there are two points in a range that are the same distance from the line. That happens a surprising amount on the test polygons I am using, probably because of how they were generated. But otherwise, we get the same thing. For example, I tried switching the maximum-distance finder to mark a new "maximum distance" point if the distance is greater than or equal to the current max distance. However, this broke another test case. I feel pretty confident that this doesn't significantly change the outcome of the algorithm, so I am going to merge. Do you think I still need to open an issue about this @rafaqz?
Maybe we should note in the docs that we don't promise anything about the order of removal of equal-distance points.
Also, am I right to understand this is now 8x faster than LibGEOS?
Ok, still pretty good though. And over collections it will be threaded! It made me think about that JuliaCon talk, if you want to do it! Could be a good motivator to benchmark everything and get it registered.
Merge whenever you like. After that I might get out the profiler and see what's left to squeeze out of some of these methods.
The simplification function wasn't working for `DouglasPeucker` with the `tol` keyword used. I found that this was because, although the algorithm was using squared distance, it wasn't squaring the tolerance within the algorithm, so no points were being removed.

I decided to move the adaptation of the tolerance values into the constructors, so that the same `SimplifyAlg` object could be used more than once without the tolerance being changed each time the function was run. This seems important since, for the algorithms other than `DouglasPeucker`, you do need to explicitly use the object. I also swapped to using `@kwdef` (is it exported yet?) to get rid of the constructor that does exactly what `@kwdef` does.

While I was doing this, I noticed that the `DouglasPeucker` algorithm was wrong. It starts at the "bottom", finds the smallest tolerances between adjacent edges, removes the smallest, and moves "up" the shape. The actual `DouglasPeucker` algorithm finds edges with the largest distances, marks those to be kept, and then moves "down" the shape. You can see a gif of this here. This gives different, but similar, results, since in `DouglasPeucker` there are some points that can't be removed that could be removed with the previous implementation. I implemented it using recursion, but I am not sure how to make it work for the `number` and `ratio` keywords. I will think about it, but I would love any suggestions.

Here is the before performance in `LibGEOS` and with `GeometryOps` (with LG and GI polygons of 35 points reduced to 17 points) for `DouglasPeucker`:

And here is the new performance (I think it is faster because we can stop earlier):

I also added a file of really wacky polygons that I use in my work for future testing.
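The recursive structure described above can be sketched like this (a from-scratch illustration, not the code in this PR; for clarity it compares true perpendicular distance against `tol`, where the PR uses squared distances to avoid the square root):

```julia
# Minimal recursive Douglas-Peucker over a vector of (x, y) tuples.
function dp(points, tol)
    length(points) < 3 && return points
    a, b = points[1], points[end]
    function perp(p)  # perpendicular distance of p from segment a-b
        (x1, y1), (x2, y2), (x0, y0) = a, b, p
        num = abs((y2 - y1) * x0 - (x2 - x1) * y0 + x2 * y1 - y2 * x1)
        den = hypot(x2 - x1, y2 - y1)
        return den == 0 ? hypot(x0 - x1, y0 - y1) : num / den
    end
    dmax, imax = findmax(perp.(points[2:end-1]))
    imax += 1                      # offset for the skipped first point
    dmax <= tol && return [a, b]   # nothing sticks out: keep endpoints only
    # Keep the farthest point and recurse on both halves
    return vcat(dp(points[1:imax], tol)[1:end-1], dp(points[imax:end], tol))
end

dp([(0.0, 0.0), (1.0, 0.1), (2.0, 0.0)], 0.2)  # keeps only the endpoints
```

The open question from the description applies here too: because the recursion decides keep/drop from `tol` alone, supporting `number` and `ratio` needs something extra, such as the bisection-on-tolerance idea discussed earlier in the thread.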