This repository has been archived by the owner on Oct 29, 2019. It is now read-only.
Currently the Archiver constructor takes two arguments: UUID and url (sketched below). I'm wondering if they're still relevant.
The UUID is a holdover from the archivers.space workflow - I'm not sure there's an analog in DT? I think we can just remove it.
Also, in #5, we discussed that a custom crawl could span multiple urls. This raises a couple of questions:
Does it still make sense for a scraper to be tied to a 'root' url?
How do we distinguish between "urls that this scraper takes data from" and "child urls of pages linked to from this page"?
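For concreteness, here is a rough sketch of both shapes - the current constructor and a UUID-free one that takes several seed urls. Python is assumed, and the class and parameter names here are hypothetical, not anything already decided:

```python
class Archiver:
    # current shape, as described above
    def __init__(self, uuid, url):
        self.uuid = uuid  # holdover from the archivers.space workflow
        self.url = url    # the 'root' url the scraper is tied to


class ArchiverNoUUID:
    # one possible shape if the UUID goes away and a custom crawl spans several urls
    def __init__(self, urls):
        self.urls = list(urls)  # seed urls the crawl starts from
```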
I'm not sure the UUID is as relevant here as it was in archivers.space, since we're running the archiver tool on either child urls or byte files, rather than on individual url pages. Maybe the UUID should be the scraper's root url, or be removed altogether.
Regarding custom crawls spanning multiple urls: it seems a scraper has to begin at some url. Maybe that's our root url, kept as a reference for future scraper runs and set when the archiver tool is initialized?
For distinguishing the types of url collected, maybe data-collection urls get passed in the add data function and child urls are maintained through the add url function?
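To make that split concrete, a minimal sketch of the idea - assuming Python, and with `add_data` / `add_url` signatures that are my guess rather than an existing API:

```python
class Archiver:
    def __init__(self, url):
        self.url = url           # root url, set at initialization; reference for future runs
        self.data_urls = set()   # urls this scraper takes data from
        self.child_urls = set()  # child urls of pages linked to from the root page
        self.data = {}           # collected data, keyed by the url it came from

    def add_data(self, url, data):
        # record a url we actually pulled data from, along with what we pulled
        self.data_urls.add(url)
        self.data[url] = data

    def add_url(self, url):
        # record a discovered child url for later crawling
        self.child_urls.add(url)
```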
So, you know, it's only been four months, but yes: UUIDs should be ignored whenever possible. I'd favor hashes for blob content and urls for anything that has a clear association to... a URL.
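As an illustration only (Python assumed; the choice of sha256 is mine, not something settled here), keying blob content by hash and everything url-shaped by the url itself might look like:

```python
import hashlib


def blob_key(content: bytes) -> str:
    # content-addressed key for raw blob bytes, instead of a UUID
    return hashlib.sha256(content).hexdigest()


# anything with a clear association to a URL is just keyed by the url itself
index = {
    blob_key(b"raw file bytes ..."): "where/this/blob/is/stored",
    "https://example.com/some/page": "metadata about this url",
}
```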