Mirrors of the references #13

Open
mauriciopasquier opened this issue May 25, 2013 · 7 comments

Comments

@mauriciopasquier
Contributor

So that we always have some version of the links available, in case they change, go down, etc., we could mirror them on our site or upload them to the Wayback Machine if they aren't there already. We can put the mirrored links in the HTML version.

@fauno
Member

fauno commented May 27, 2013

+1 for the Wayback Machine. The links can be extracted with this: grep -ro "https\?://[^'\"]\+"

@fauno
Member

fauno commented May 27, 2013

er, that one is for extracting them from the HTML; the character class [^'\"] would have to be changed to also exclude the Markdown/BibTeX closing delimiters
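A minimal sketch of that adjustment, in Python rather than grep so it can be tested. The character class is widened so a match also stops at the closing `)`, `]`, `}` and `>` used by Markdown links, autolinks and BibTeX fields. The names `URL_PATTERN` and `extract_urls` are hypothetical, and the trade-off is an assumption worth checking against the real sources: URLs that legitimately contain parentheses or braces would be cut short.

```python
import re

# Stop at whitespace, quotes (HTML attributes), and the closing delimiters of
# Markdown links/autolinks and BibTeX fields. Assumption: the URLs themselves
# never contain these characters.
URL_PATTERN = re.compile(r"""https?://[^\s'"<>()\[\]{}]+""")


def extract_urls(text):
    """Return the unique URLs found in text, in order of first appearance."""
    seen = []
    for match in URL_PATTERN.finditer(text):
        # Drop sentence punctuation that often trails a URL in prose.
        url = match.group(0).rstrip(".,;")
        if url not in seen:
            seen.append(url)
    return seen
```

Called on the contents of a Markdown or BibTeX file, `extract_urls` returns a list of URLs that can then be archived one by one.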

@mauriciopasquier
Contributor Author

It looks like there is no way to archive the links we have.

How can I get my site included in the Wayback Machine?

Much of our archived web data comes from our own crawls or from Alexa
Internet's crawls. Neither organization has a "crawl my site now!" submission
process. Internet Archive's crawls tend to find sites that are well linked
from other sites. The best way to ensure that we find your web site is to
make sure it is included in online directories and that similar/related sites
link to you.

Alexa Internet uses its own methods to discover sites to crawl. It may be
helpful to install the free Alexa toolbar and visit the site you want crawled
to make sure they know about it.

Regardless of who is crawling the site, you should ensure that your site's
'robots.txt' rules and in-page META robots directives do not tell crawlers to
avoid your site.

When a site is crawled, there is usually at least a 6-month lag, and
sometimes as much as a 24-month lag, between the date that web pages are
crawled and when they appear in the Wayback Machine.

In some cases, crawled content from certain projects may appear in a much
shorter timeframe — as little as a few weeks from when it was crawled.
Older material for the same pages and sites may still appear separately,
months later.

@fauno
Member

fauno commented Nov 12, 2013

Mauricio Pasquier Juan writes:

It looks like there is no way to archive the links we have.

then we'll archive them ourselves

:{

@mauriciopasquier
Contributor Author

a GET here:

http://web.archive.org/save/<url>

seems to archive them :D
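A rough sketch of how that GET could be scripted, using only the Python standard library. The endpoint is the one quoted above; the function names, the timeout, and the assumption that a plain unauthenticated GET is enough are hypothetical, and the service's response format and rate limits are not covered here.

```python
import urllib.request

# Endpoint as quoted in the comment above; the target URL is appended as-is.
SAVE_ENDPOINT = "http://web.archive.org/save/"


def save_request_url(url):
    """Build the address to GET in order to request a snapshot of url."""
    return SAVE_ENDPOINT + url


def archive(url, timeout=60):
    """Issue the GET and return the HTTP status code.

    Raises urllib.error.URLError (or HTTPError) if the request fails.
    """
    with urllib.request.urlopen(save_request_url(url), timeout=timeout) as response:
        return response.status
```

Looping `archive` over the extracted links, with a pause between requests, would cover the whole reference list.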

@fauno
Member

fauno commented Feb 10, 2015

another one: http://amberlink.org/

@mauriciopasquier mauriciopasquier removed their assignment Feb 10, 2015
@mauriciopasquier
Contributor Author

Amber sounds really good
