Currently, our `Spider.HTML.find_links` only searches for `href` in `a` tags. It would be handy if we could use `find_links` with more selectors, such as `a, link` or `a, link, area, base`.
Maybe we should also consider the case where no selector is specified, either as the default or explicitly. The default could be just `a`, or nothing at all. If nothing is specified, `href` is searched for on every element.
Possible definitions of `find_links`:

- `find_links(parsed_document, selectors \\ "")`
- `find_links(parsed_document, selectors \\ "a")`
The absence of a selector could be expressed as `None`, `NULL`, `""`, `"*"`, and so on.
The default could be `"a, area"`.
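A minimal sketch of the proposed signature follows. It assumes a Floki-style parsed tree (`{tag, attributes, children}` tuples, text nodes as binaries); whether `Spider.HTML` actually parses with Floki is an assumption. To stay self-contained it walks the tree by hand and only understands comma-separated tag names, not full CSS selectors. `FindLinksSketch` is a hypothetical module name, and `nil`, `""` and `"*"` stand in for the "no selector" values listed above.

```elixir
defmodule FindLinksSketch do
  # Hypothetical sketch: tag-name selectors only, Floki-style tree assumed.

  # Values treated as "no selector": take href from every element.
  @no_selector [nil, "", "*"]

  def find_links(parsed_document, selectors \\ "a, area") do
    tags = parse_selectors(selectors)

    parsed_document
    |> List.wrap()
    |> Enum.flat_map(&collect(&1, tags))
  end

  # :any means no filtering by tag name.
  defp parse_selectors(selectors) when selectors in @no_selector, do: :any

  # "a, link, area" -> ["a", "link", "area"]
  defp parse_selectors(selectors) do
    selectors
    |> String.split(",")
    |> Enum.map(&String.trim/1)
    |> Enum.reject(&(&1 == ""))
  end

  # Element node: keep its own href if the tag is selected, then recurse.
  defp collect({tag, attrs, children}, tags) when is_list(attrs) and is_list(children) do
    own =
      if tags == :any or tag in tags do
        for {"href", href} <- attrs, do: href
      else
        []
      end

    own ++ Enum.flat_map(children, &collect(&1, tags))
  end

  # Text nodes, comments, doctype and similar carry no links.
  defp collect(_other, _tags), do: []
end
```

With this shape, `find_links(doc)` returns only `a` and `area` links, while `find_links(doc, "*")` or `find_links(doc, nil)` returns every `href` in document order.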
In the end, we may not want to filter by tag at all. The tool is intended to be as generic as possible, and `href` attributes on other elements do appear in the real world.
We may, however, omit `link` tags, or perhaps anything in the `head`, because those are not user-followable links to other pages. Downloading them would only rarely provide data worth collecting by a spider. Still, it would make sense to have an option to override this behavior.
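The "no tag filter, but skip `head` and `link` unless overridden" idea can be sketched like this, again assuming a Floki-style tree. The module name and the `:include_non_page_links` option are hypothetical, invented for illustration and not part of `Spider.HTML`.

```elixir
defmodule FindLinksSkipHeadSketch do
  # Hypothetical sketch: collects href from any element, but by default
  # skips the whole <head> subtree and every <link> element.
  @skipped_tags ["head", "link"]

  # opts: include_non_page_links: true restores head/link hrefs (assumed name).
  def find_links(parsed_document, opts \\ []) do
    include? = Keyword.get(opts, :include_non_page_links, false)

    parsed_document
    |> List.wrap()
    |> Enum.flat_map(&collect(&1, include?))
  end

  # Default behaviour: drop skipped elements together with their children.
  defp collect({tag, _attrs, _children}, false) when tag in @skipped_tags, do: []

  # Any other element: its own hrefs first, then those of its descendants.
  defp collect({_tag, attrs, children}, include?) when is_list(attrs) and is_list(children) do
    hrefs = for {"href", href} <- attrs, do: href
    hrefs ++ Enum.flat_map(children, &collect(&1, include?))
  end

  # Text nodes, comments and similar carry no links.
  defp collect(_other, _include?), do: []
end
```

Keeping the override as a keyword option leaves the default call site unchanged, which fits the "small steps" approach.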
We should also check how the `base` tag works and whether we should take it into account when finding links. If I remember correctly, this tag changes the target of relative URLs.
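If `base` does need to be honoured, one possible approach is sketched below, using Elixir's standard `URI.merge/2`. It assumes a Floki-style tree and an absolute page URL. Following the HTML specification's rule that only the first `base` element with an `href` counts, it resolves that `href` against the page URL and then resolves each found link against the result. `BaseHrefSketch` and `resolve_links/3` are hypothetical names.

```elixir
defmodule BaseHrefSketch do
  # Hypothetical sketch: resolve hrefs against <base href> if present,
  # otherwise against the page URL. page_url must be absolute, or
  # URI.merge/2 raises ArgumentError.
  def resolve_links(parsed_document, page_url, hrefs) do
    base =
      case find_base_href(List.wrap(parsed_document)) do
        nil -> page_url
        # A relative base href is itself resolved against the page URL.
        base_href -> page_url |> URI.merge(base_href) |> URI.to_string()
      end

    Enum.map(hrefs, fn href -> base |> URI.merge(href) |> URI.to_string() end)
  end

  # Depth-first search for the first <base> element that has an href.
  defp find_base_href(nodes) do
    Enum.find_value(nodes, fn
      {"base", attrs, _children} when is_list(attrs) ->
        case List.keyfind(attrs, "href", 0) do
          {"href", href} -> href
          nil -> nil
        end

      {_tag, attrs, children} when is_list(attrs) and is_list(children) ->
        find_base_href(children)

      _other ->
        nil
    end)
  end
end
```

For example, with `<base href="https://cdn.example.com/docs/">` on `https://example.com/a/b.html`, a link to `page.html` would resolve to `https://cdn.example.com/docs/page.html` rather than `https://example.com/a/page.html`.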
Let’s move forward in small steps rather than putting all of the logic in place in a single pull request.