Block cohere-training-data-crawler

Helen Chong 2025-01-05 15:53:45 +08:00
parent 4007612a47
commit 26f946d430
2 changed files with 2 additions and 1 deletions


@@ -20,5 +20,5 @@ RewriteRule ^.+$ index.php [L]
 </IfModule>
 # Block bad bots
-RewriteCond %{HTTP_USER_AGENT} (AI2Bot|Ai2Bot-Dolma|Amazonbot|anthropic-ai|Applebot-Extended|Bytespider|ChatGPT-User|Claude-Web|ClaudeBot|cohere-ai|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo.*Bot|Meta-ExternalAgent|Meta-ExternalFetcher|OAI-SearchBot|omgili|omgilibot|PanguBot|PerplexityBot|PetalBot|Scrapy|Sidetrade.*indexer.*bot|Timpibot|VelenPublicWebCrawler|Webzio-Extended|YouBot) [NC]
+RewriteCond %{HTTP_USER_AGENT} (AI2Bot|Ai2Bot-Dolma|Amazonbot|anthropic-ai|Applebot-Extended|Bytespider|ChatGPT-User|Claude-Web|ClaudeBot|cohere-ai|cohere-training-data-crawler|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo.*Bot|Meta-ExternalAgent|Meta-ExternalFetcher|OAI-SearchBot|omgili|omgilibot|PanguBot|PerplexityBot|PetalBot|Scrapy|Sidetrade.*indexer.*bot|Timpibot|VelenPublicWebCrawler|Webzio-Extended|YouBot) [NC]
 RewriteRule .* https://nocommercialuse.org/ [L]


@@ -11,6 +11,7 @@ User-agent: ChatGPT-User
 User-agent: Claude-Web
 User-agent: ClaudeBot
 User-agent: cohere-ai
+User-agent: cohere-training-data-crawler
 User-agent: Diffbot
 User-agent: DuckAssistBot
 User-agent: FacebookBot
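
The new entry is needed because the existing `cohere-ai` token is not a substring of `cohere-training-data-crawler`, so the old alternation would not have matched it. A minimal sketch for checking the updated pattern locally, assuming Python's `re` module is a close enough stand-in for Apache's regex engine on this pattern. The helper `is_blocked` and the sample User-Agent strings are hypothetical and not part of this commit:

```python
import re

# Tokens copied from the updated RewriteCond alternation in this commit.
# Some entries (e.g. "Kangaroo.*Bot") are regex fragments, so nothing is escaped.
BLOCKED_TOKENS = [
    r"AI2Bot", r"Ai2Bot-Dolma", r"Amazonbot", r"anthropic-ai",
    r"Applebot-Extended", r"Bytespider", r"ChatGPT-User", r"Claude-Web",
    r"ClaudeBot", r"cohere-ai", r"cohere-training-data-crawler", r"Diffbot",
    r"DuckAssistBot", r"FacebookBot", r"FriendlyCrawler", r"Google-Extended",
    r"GoogleOther", r"GoogleOther-Image", r"GoogleOther-Video", r"GPTBot",
    r"iaskspider/2.0", r"ICC-Crawler", r"ImagesiftBot", r"img2dataset",
    r"ISSCyberRiskCrawler", r"Kangaroo.*Bot", r"Meta-ExternalAgent",
    r"Meta-ExternalFetcher", r"OAI-SearchBot", r"omgili", r"omgilibot",
    r"PanguBot", r"PerplexityBot", r"PetalBot", r"Scrapy",
    r"Sidetrade.*indexer.*bot", r"Timpibot", r"VelenPublicWebCrawler",
    r"Webzio-Extended", r"YouBot",
]

# RewriteCond patterns are unanchored, hence search() below;
# re.IGNORECASE mirrors the [NC] flag.
BLOCKED_RE = re.compile("(" + "|".join(BLOCKED_TOKENS) + ")", re.IGNORECASE)


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent contains any blocked token (hypothetical helper)."""
    return BLOCKED_RE.search(user_agent) is not None


if __name__ == "__main__":
    # Hypothetical sample strings, for illustration only.
    samples = [
        "Mozilla/5.0 (compatible; cohere-training-data-crawler/1.0)",
        "Mozilla/5.0 (X11; Linux x86_64; rv:133.0) Gecko/20100101 Firefox/133.0",
    ]
    for ua in samples:
        print(f"{is_blocked(ua)!s:5}  {ua}")
```

This only exercises the regex. It does not confirm that the server actually applies the rule, which would still need a request with a spoofed User-Agent header against the live site.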