Why does the distance get assigned 0? #1

Open
scott198510 opened this issue May 3, 2022 · 1 comment

Comments

@scott198510

    for i in range(0,15):
        max_index = np.argmax(distances)
        i1, i2 = np.unravel_index(max_index, distances.shape)
        distances[i1,i2] = 0.0

Over multiple iterations, the current maximum distance is set to 0. Why?

@Lukas-Justen
Owner

Lukas-Justen commented Jun 25, 2022

@scott198510 I am not entirely sure why we did that. If I remember correctly, the issue was that some clusters still contained points that did not really fit a line segment because they were "far" away from the actual line. You can think of these points as noise.

As a result, the maximum distances tended to be between these outliers and the far end of the line. I made a small visualization that might help explain the issue. The orange point is the outlier, and the red prototype lines are the longest ones. Setting a distance to 0 removes that pair from consideration, so after discarding the top N lines the remaining longest one should fit the actual line much better. In the end, you should be left with the green line.

[Image: Screen Shot 2022-06-25 at 6 33 16 PM]
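The discard-the-top-N idea can be sketched as a small standalone function. This is only an illustration under stated assumptions, not the repository's actual code: the name `longest_pair_after_discarding`, the way `distances` is built, and the use of the upper triangle (so each pair is counted once) are all hypothetical.

```python
import numpy as np


def longest_pair_after_discarding(points, n_discard=15):
    """Return indices (i1, i2) of the longest remaining point pair
    after zeroing out the n_discard largest pairwise distances.

    Hypothetical helper for illustration only.
    """
    points = np.asarray(points, dtype=float)

    # Pairwise Euclidean distances, shape (n, n).
    diff = points[:, None, :] - points[None, :, :]
    distances = np.sqrt((diff ** 2).sum(axis=-1))

    # Assumption: keep only the upper triangle so each pair appears once.
    distances = np.triu(distances, k=1)

    for _ in range(n_discard):
        # Locate the current longest pair and remove it from consideration.
        i1, i2 = np.unravel_index(np.argmax(distances), distances.shape)
        distances[i1, i2] = 0.0

    # The longest pair that survived the discarding step.
    i1, i2 = np.unravel_index(np.argmax(distances), distances.shape)
    return int(i1), int(i2)


# Ten points on a horizontal line plus one far-away outlier (index 10).
line = [(float(x), 0.0) for x in range(10)]
cluster = line + [(4.5, 30.0)]

# Without discarding, the longest pair involves the outlier.
pair_raw = longest_pair_after_discarding(cluster, n_discard=0)

# Discarding the 10 pairs that involve the outlier leaves the true endpoints.
pair_clean = longest_pair_after_discarding(cluster, n_discard=10)
```

Note that if `distances` is a full symmetric matrix, zeroing `distances[i1, i2]` leaves the mirrored entry `distances[i2, i1]` in place, so each pair would need two iterations to disappear. Whether that applies to the original code depends on how `distances` is computed there.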

There are probably more sophisticated ways to handle this. For instance, you could use a linear or polynomial regression model to approximate the line more robustly and ignore outliers. This is also mentioned in the future work section.
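One way the regression idea might look is a least-squares line fit with iterative residual trimming. This is a rough sketch rather than anything from the project: `fit_line_trimmed` and its parameters `n_rounds`, `k`, and `min_scale` are hypothetical. It also assumes the line is not close to vertical, since `y` is regressed on `x`.

```python
import numpy as np


def fit_line_trimmed(points, n_rounds=3, k=3.0, min_scale=1e-9):
    """Fit y = slope * x + intercept, dropping points with large residuals.

    Hypothetical helper: plain least squares (np.polyfit) refit a few
    times, each time keeping only points whose absolute residual is
    within k times the median absolute residual of the kept points.
    """
    points = np.asarray(points, dtype=float)
    x, y = points[:, 0], points[:, 1]
    keep = np.ones(len(points), dtype=bool)

    for _ in range(n_rounds):
        slope, intercept = np.polyfit(x[keep], y[keep], deg=1)
        residuals = np.abs(y - (slope * x + intercept))

        # Robust scale estimate; min_scale avoids a zero threshold
        # when the kept points lie exactly on the line.
        scale = max(np.median(residuals[keep]), min_scale)
        new_keep = residuals <= k * scale

        # Stop if nothing changes or too few points would remain.
        if new_keep.sum() < 2 or np.array_equal(new_keep, keep):
            break
        keep = new_keep

    slope, intercept = np.polyfit(x[keep], y[keep], deg=1)
    return float(slope), float(intercept), keep


# Ten points on y = 2x + 1 plus one outlier as the last point.
pts = [(float(x), 2.0 * x + 1.0) for x in range(10)] + [(4.5, 30.0)]
slope, intercept, inliers = fit_line_trimmed(pts)
```

The endpoints of the segment could then be taken from the extreme `x` values among the points flagged in `inliers`, instead of relying on a fixed number of discarded pairs.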
