Spider.HTML.find_links should not collect links to headers #170

Open
nappex opened this issue Feb 5, 2023 · 1 comment

@nappex
Collaborator

nappex commented Feb 5, 2023

In href we can specify a link to a part of the page itself, for example a header. You can use href="#top" or href="#" to link to the top of the current page.

But such a link does not lead to another page that we want to crawl; the page it points to has already been crawled.

@Glutexo
Owner

Glutexo commented Feb 5, 2023

Shouldn’t it? Although I can’t come up with a real example, there may be a use case. That calls for a switch that enables collecting links without a path (and domain, protocol…).
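
For illustration only, here is a minimal Python sketch of how such a switch might behave. It is not the project's code: `is_fragment_only`, `filter_links`, and the `keep_fragment_only` flag are hypothetical names, and it assumes the hrefs have already been extracted as plain strings. A link counts as fragment-only when it has no protocol, domain, path, or query, which covers both "#top" and "#".

```python
from urllib.parse import urlsplit


def is_fragment_only(href: str) -> bool:
    """Return True when href points into the current page only.

    Hypothetical helper: a link such as "#top" or "#" has no scheme,
    host, path, or query, so it cannot lead to a new page. An empty
    href is treated the same way, since it also refers to the current page.
    """
    parts = urlsplit(href.strip())
    return not (parts.scheme or parts.netloc or parts.path or parts.query)


def filter_links(hrefs, keep_fragment_only=False):
    """Drop fragment-only links unless the (hypothetical) switch is enabled."""
    if keep_fragment_only:
        return list(hrefs)
    return [href for href in hrefs if not is_fragment_only(href)]


links = ["#top", "#", "/about#team", "https://example.com/#top", "?page=2"]
crawlable = filter_links(links)
everything = filter_links(links, keep_fragment_only=True)
```

With the switch off, only links that carry at least a path, query, domain, or protocol remain; a link such as "/about#team" is kept because it still names another page. Whether the default should be on or off is the open question in this thread.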

@Glutexo Glutexo assigned Glutexo and nappex and unassigned Glutexo Feb 6, 2023