active-search-engines |
list of active general-purpose search engine names from https://wikipedia.org/wiki/Template:Web_search_engines |
alexa-top1mil-sites |
Alexa list of top 1 million web sites |
amazon-aws-namespaces |
AWS namespaces (paths found in aws.amazon.com URLs) |
amazon-macie-types |
Amazon Macie data object content types via https://docs.aws.amazon.com/macie/latest/userguide/macie-classify-objects-content-type.html |
censorship-test-urls |
URL testing list intended for discovering web site censorship, from https://github.com/citizenlab/test-lists |
content-access-guidelines |
Web Content Accessibility Guidelines by W3C |
free-web-hosts |
list of free web hosting services from https://mirror1.malwaredomains.com/files/freewebhosts.txt |
github-dmca-users |
links to GitHub accounts that have received DMCA notices, via https://github.com/github/dmca |
marketing-tech-landscape |
top 5,000 marketing technology web sites |
modern-web-history |
A History of The Modern Web |
phishtank-developers-database |
PhishTank downloadable database in CSV format via https://phishtank.com/developer_info.php |
piidox-search-sites |
list of personally identifiable information search engines |
simpl-redir-shortcuts |
shortcuts for redirection on simpl.info |
sites-using-cloudflare |
sites using the CloudFlare WAF, according to GitHub user @pirate |
subreddit-list-full |
list of subreddits from http://www.reddit.com/r/ListOfSubreddits/wiki/listofsubreddits |
subreddit-list-nsfw |
WARNING! NSFW. Same as above, but limited to "not-safe-for-work" subreddits |
tls-scanner-urls |
URLs for testing TLS scanning, via Botan |
top-sites-global |
Top 1,000 Internet web sites across the globe by OWASP headers |
url-shortener-sites |
URL shortener sites taken from http://dns-bh.sagadc.org/url_shorteners.txt |