forked from matomo-org/device-detector
Adds detection for various bots (matomo-org#7739)
* Add another user agent for Qwantify
* Add test for PagePeeker
* Add another test for SemrushBot
* Improves DuckDuckBot
* Adds detection for DuckAssistBot
* Adds detection for RedekenBot
* Adds detection for semaltbot
* Adds detection for MakeMerryBot
* Adds detection for Timpibot
* Add generic bot test
* Adds detection for ValidBot
* Adds detection for NameProtect
* Adds detection for CLASSLA-web
* Add generic bot test
* Improves detection for generic bots
* Move heritrix at the bottom
* Fix Arquivo.pt test
* Adds detection for Domain Codex
* Adds detection for Swisscows Favicons
* Adds detection for leak.info
* Adds detection for Workona
* Adds detection for Bloglines
* Improves detection for generic bots
* Adds detection for Marginalia
* Adds detection for VU Server Health Scanner
* Improves detection for generic bots
* Improves detection for generic bots
* Improves detection for generic bots
* Adds detection for Functionize
* Adds detection for Prerender

---------

Co-authored-by: Tutik Alexsandr <[email protected]>
1 parent 6da4f09, commit 67b225e
Showing 4 changed files with 388 additions and 31 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -831,18 +831,27 @@ | |
- | ||
user_agent: DuckDuckBot/1.0; (+http://duckduckgo.com/duckduckbot.html) | ||
bot: | ||
name: DuckDuckGo Bot | ||
name: DuckDuckBot | ||
category: Search bot | ||
url: https://duckduckgo.com/duckduckbot | ||
url: https://duckduckgo.com/duckduckgo-help-pages/results/duckduckbot/ | ||
producer: | ||
name: DuckDuckGo | ||
url: https://duckduckgo.com/ | ||
- | ||
user_agent: Mozilla/5.0 (compatible; DuckDuckGo-Favicons-Bot/1.0; +http://duckduckgo.com) | ||
bot: | ||
name: DuckDuckGo Bot | ||
name: DuckDuckBot | ||
category: Search bot | ||
url: https://duckduckgo.com/duckduckbot | ||
url: https://duckduckgo.com/duckduckgo-help-pages/results/duckduckbot/ | ||
producer: | ||
name: DuckDuckGo | ||
url: https://duckduckgo.com/ | ||
- | ||
user_agent: DuckAssistBot/1.1; (+http://duckduckgo.com/duckassistbot.html) | ||
bot: | ||
name: DuckAssistBot | ||
category: Search bot | ||
url: https://duckduckgo.com/duckduckgo-help-pages/results/duckassistbot/ | ||
producer: | ||
name: DuckDuckGo | ||
url: https://duckduckgo.com/ | ||
|
@@ -2475,7 +2484,16 @@ | |
name: Quora | ||
url: http://www.quora.com | ||
- | ||
user_agent: 'Mozilla/5.0 (compatible; Qwantify/2.2w; +https://www.qwant.com/)/*' | ||
user_agent: Mozilla/5.0 (compatible; Qwantify/2.2w; +https://www.qwant.com/) | ||
bot: | ||
name: Qwantify | ||
category: Crawler | ||
url: https://www.qwant.com/ | ||
producer: | ||
name: Qwant Corporation | ||
url: https://www.qwant.com/ | ||
- | ||
user_agent: Mozilla/5.0 (compatible; Qwantify-prod34997/1.0; +https://help.qwant.com/bot/) | ||
bot: | ||
name: Qwantify | ||
category: Crawler | ||
|
@@ -5063,6 +5081,15 @@ | |
producer: | ||
name: Jožef Stefan Institute | ||
url: https://www.ijs.si/ijsw/JSI | ||
- | ||
user_agent: Mozilla/5.0 (compatible; CLASSLA-web; +https://www.clarin.si/info/classla-web-crawler/) | ||
bot: | ||
name: CLASSLA-web | ||
category: Crawler | ||
url: https://www.clarin.si/info/classla-web-crawler/ | ||
producer: | ||
name: Jožef Stefan Institute | ||
url: https://www.ijs.si/ijsw/JSI | ||
- | ||
user_agent: "Electronic Frontier Foundation's Do Not Track Verifier (for questions or concerns email [email protected])" | ||
bot: | ||
|
@@ -6705,12 +6732,12 @@ | |
- | ||
user_agent: Arquivo-web-crawler (compatible; heritrix/3.4.0-20200304 +https://arquivo.pt/faq-crawling) | ||
bot: | ||
name: Heritrix | ||
name: Arquivo.pt | ||
category: Crawler | ||
url: https://webarchive.jira.com/wiki/display/Heritrix/Heritrix | ||
url: https://sobre.arquivo.pt/en/help/crawling-and-archiving-web-content/ | ||
producer: | ||
name: The Internet Archive | ||
url: https://archive.org | ||
name: FCT|FCCN | ||
url: https://www.fct.pt/ | ||
- | ||
user_agent: Arquivo-web-crawler (compatible; brozzler/1.5 +https://arquivo.pt/faq-crawling) | ||
bot: | ||
|
@@ -7803,3 +7830,209 @@ | |
producer: | ||
name: Meins und Vogel GmbH | ||
url: https://muv.com/ | ||
- | ||
user_agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36 (compatible; PagePeeker/3.0; +https://pagepeeker.com/robots/) | ||
bot: | ||
name: PagePeeker | ||
category: Crawler | ||
url: https://pagepeeker.com/robots/ | ||
producer: | ||
name: PAGEPEEKER SRL | ||
url: https://pagepeeker.com/ | ||
- | ||
user_agent: Mozilla/5.0 (compatible; SemrushBot-SWA/0.1; +http://www.semrush.com/bot.html) | ||
bot: | ||
name: SemrushBot | ||
category: Crawler | ||
url: https://www.semrush.com/bot/ | ||
producer: | ||
name: Semrush Inc. | ||
url: https://www.semrush.com/ | ||
- | ||
user_agent: Mozilla/5.0 (compatible; RedekenBot/0.1; +https://www.redeken.com/bot/) | ||
bot: | ||
name: RedekenBot | ||
category: Crawler | ||
url: https://www.redeken.com/en/help/bot.html | ||
producer: | ||
name: Redeken | ||
url: https://www.redeken.com/ | ||
- | ||
user_agent: semaltbot/0.1 (+http://semalt.net) | ||
bot: | ||
name: semaltbot | ||
category: Crawler | ||
url: https://semalt.net/ | ||
producer: | ||
name: Semalt LP | ||
url: https://semalt.net/ | ||
- | ||
user_agent: Mozilla/5.0 (compatible; MakeMerryBot/1.0; +https://makemerry.app/bots) | ||
bot: | ||
name: MakeMerryBot | ||
category: Crawler | ||
url: https://makemerry.app/bots | ||
- | ||
user_agent: Timpibot/0.9 (+http://www.timpi.io) | ||
bot: | ||
name: Timpibot | ||
category: Crawler | ||
url: https://timpi.io/ | ||
producer: | ||
name: Timpi Inc. | ||
url: https://timpi.io/ | ||
- | ||
user_agent: Mozilla/5.0 (compatible; Timpibot/0.8; +http://www.timpi.io) | ||
bot: | ||
name: Timpibot | ||
category: Crawler | ||
url: https://timpi.io/ | ||
producer: | ||
name: Timpi Inc. | ||
url: https://timpi.io/ | ||
- | ||
user_agent: 'Tublm.com/Bot/fubpdfdotcom/Bot/Bot -❤️- +https://tublm.com/game/2048_merge' | ||
bot: | ||
name: Generic Bot | ||
- | ||
user_agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.1 Safari/605.1.15 (compatible; Validbot; +https://www.validbot.com) | ||
bot: | ||
name: ValidBot | ||
category: Crawler | ||
url: https://www.validbot.com/ | ||
producer: | ||
name: Jake Olefsky LLC | ||
url: https://www.validbot.com/ | ||
- | ||
user_agent: NPBot | ||
bot: | ||
name: NameProtectBot | ||
category: Crawler | ||
url: https://www.cscglobal.com/cscglobal/home/ | ||
producer: | ||
name: NameProtect, Inc. | ||
url: https://www.cscglobal.com/ | ||
- | ||
user_agent: Mozilla/5.0 (compatible; CuriousCatgirl Research; +https://curiouscatgirl.cynthia.dev) | ||
bot: | ||
name: Generic Bot | ||
- | ||
user_agent: xx032_bo9vs83_2a | ||
bot: | ||
name: Generic Bot | ||
- | ||
user_agent: Mozilla/5.0 (compatible; heritrix/3.3.0-SNAPSHOT-20160721-2308 +https://www.domaincodex.com) | ||
bot: | ||
name: Domain Codex | ||
category: Crawler | ||
url: https://www.domaincodex.com/ | ||
producer: | ||
name: Erie Data Systems, LLC | ||
url: https://www.eriedatasys.com/ | ||
- | ||
user_agent: Swisscows Favicons | ||
bot: | ||
name: Swisscows Favicons | ||
category: Crawler | ||
url: https://swisscows.com/ | ||
producer: | ||
name: Swisscows AG | ||
url: https://swisscows.com/ | ||
- | ||
user_agent: Mozilla/4.0 (compatible; fluid/0.0; +http://www.leak.info/bot.html) | ||
bot: | ||
name: leak.info | ||
category: Crawler | ||
url: http://www.leak.info/ | ||
- | ||
user_agent: workona-favicon-service/1.0.0 | ||
bot: | ||
name: Workona | ||
category: Crawler | ||
url: https://workona.com/ | ||
producer: | ||
name: Workona, Inc. | ||
url: https://workona.com/ | ||
- | ||
user_agent: Bloglines/3.1 (http://www.bloglines.com) | ||
bot: | ||
name: Bloglines | ||
category: Crawler | ||
url: https://web.archive.org/web/20140309033202/http://www.bloglines.com/ | ||
producer: | ||
name: Reply!, Inc. | ||
url: https://www.reply.com/ | ||
- | ||
user_agent: 'shadowforce.io - sslshed/0.1' | ||
bot: | ||
name: Generic Bot | ||
- | ||
user_agent: search.marginalia.nu | ||
bot: | ||
name: Marginalia | ||
category: Crawler | ||
url: https://www.marginalia.nu/marginalia-search/for-webmasters/ | ||
producer: | ||
name: Marginalia | ||
url: https://www.marginalia.nu/ | ||
- | ||
user_agent: Mozilla/5.0 (compatible;vu-server-health-scanner/1.0;https://130.37.198.75/index.html) | ||
bot: | ||
name: VU Server Health Scanner | ||
category: Security Checker | ||
url: https://130.37.198.75/index.html | ||
producer: | ||
name: VU Amsterdam | ||
url: https://vu.nl/en | ||
- | ||
user_agent: Searcherxweb | ||
bot: | ||
name: Generic Bot | ||
- | ||
user_agent: Mozilla/5.0 (platform; rv:geckoversion) Gecko/geckotrail Firefox/firefoxversion | ||
bot: | ||
name: Generic Bot | ||
- | ||
user_agent: Report Runner | ||
bot: | ||
name: Generic Bot | ||
- | ||
user_agent: Node.js | ||
bot: | ||
name: Generic Bot | ||
- | ||
user_agent: Mozilla/5.0 (X11; Windows x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36 Functionize | ||
bot: | ||
name: Functionize | ||
category: Crawler | ||
url: https://www.functionize.com/ | ||
producer: | ||
name: Functionize, Inc. | ||
url: https://www.functionize.com/ | ||
- | ||
user_agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/W.X.Y.Z Safari/537.36 Prerender (+https://github.com/prerender/prerender) | ||
bot: | ||
name: Prerender | ||
category: Crawler | ||
url: https://docs.prerender.io/docs/33-overview-of-prerender-crawlers | ||
producer: | ||
name: saas.group Inc. | ||
url: https://saas.group/ | ||
- | ||
user_agent: Mozilla/5.0 (Linux; Android 11; Pixel 5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 Prerender (+https://github.com/prerender/prerender) | ||
bot: | ||
name: Prerender | ||
category: Crawler | ||
url: https://docs.prerender.io/docs/33-overview-of-prerender-crawlers | ||
producer: | ||
name: saas.group Inc. | ||
url: https://saas.group/ | ||
- | ||
user_agent: Prerender (+https://github.com/prerender/prerender) | ||
bot: | ||
name: Prerender | ||
category: Crawler | ||
url: https://docs.prerender.io/docs/33-overview-of-prerender-crawlers | ||
producer: | ||
name: saas.group Inc. | ||
url: https://saas.group/ |
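
Each fixture entry above pairs a raw `user_agent` string with the `bot` fields the detector is expected to return. A minimal sketch of how such fixtures could be checked follows. The `RULES` patterns, `detect_bot`, and `failing_fixtures` are hypothetical names for illustration only; they are not the regexes or the API shipped by the library, and the fixtures are reduced to `name` and `category`.

```python
import re

# Hypothetical detection rules, for illustration only. These are NOT the
# patterns from the library's bot definitions. Order matters: first match wins.
RULES = [
    (re.compile(r"DuckAssistBot", re.I),
     {"name": "DuckAssistBot", "category": "Search bot"}),
    (re.compile(r"DuckDuckBot|DuckDuckGo-Favicons-Bot", re.I),
     {"name": "DuckDuckBot", "category": "Search bot"}),
    (re.compile(r"Timpibot", re.I),
     {"name": "Timpibot", "category": "Crawler"}),
]

# User agents and expected fields copied from the fixtures in the diff above.
FIXTURES = [
    {"user_agent": "DuckDuckBot/1.0; (+http://duckduckgo.com/duckduckbot.html)",
     "bot": {"name": "DuckDuckBot", "category": "Search bot"}},
    {"user_agent": "DuckAssistBot/1.1; (+http://duckduckgo.com/duckassistbot.html)",
     "bot": {"name": "DuckAssistBot", "category": "Search bot"}},
    {"user_agent": "Timpibot/0.9 (+http://www.timpi.io)",
     "bot": {"name": "Timpibot", "category": "Crawler"}},
]


def detect_bot(user_agent):
    """Return the bot fields of the first matching rule, or None."""
    for pattern, bot in RULES:
        if pattern.search(user_agent):
            return bot
    return None


def failing_fixtures():
    """List the user agents whose detected bot differs from the expectation."""
    return [f["user_agent"] for f in FIXTURES
            if detect_bot(f["user_agent"]) != f["bot"]]


if __name__ == "__main__":
    print(failing_fixtures())
```

An empty list from `failing_fixtures()` means every sampled user agent resolved to the expected bot under these assumed rules.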