mirror of
https://github.com/TecharoHQ/anubis.git
synced 2025-08-03 09:48:08 -04:00
Bump AI-robots.txt rules to version 1.31 (#538)
* Bump AI-robots.txt rules to version 1.31 * chore: spelling Signed-off-by: Xe Iaso <me@xeiaso.net> --------- Signed-off-by: Xe Iaso <me@xeiaso.net> Co-authored-by: Xe Iaso <me@xeiaso.net>
This commit is contained in:
parent
c78d830ecb
commit
11081aac08
@ -20,9 +20,6 @@
|
||||
# https://twitter.com/nyttypos/status/1898844061873639490
|
||||
#\([A-Z][a-z]{2,}(?: [a-z]+){3,}\)\.\s
|
||||
|
||||
# Complete sentences shouldn't be in the middle of another sentence as a parenthetical.
|
||||
(?<!\.)\.\),
|
||||
|
||||
# Complete sentences in parentheticals should not have a space before the period.
|
||||
\s\.\)(?!.*\}\})
|
||||
|
||||
|
4
.github/actions/spelling/patterns.txt
vendored
4
.github/actions/spelling/patterns.txt
vendored
@ -128,3 +128,7 @@ go install(?:\s+[a-z]+\.[-@\w/.]+)+
|
||||
|
||||
# ignore long runs of a single character:
|
||||
\b([A-Za-z])\g{-1}{3,}\b
|
||||
|
||||
# hit-count: 1 file-count: 1
|
||||
# microsoft
|
||||
\b(?:https?://|)(?:(?:(?:blogs|download\.visualstudio|docs|msdn2?|research)\.|)microsoft|blogs\.msdn)\.co(?:m|\.\w\w)/[-_a-zA-Z0-9()=./%]*
|
@ -1,4 +1,4 @@
|
||||
- name: "ai-robots-txt"
|
||||
user_agent_regex: >-
|
||||
AI2Bot|Ai2Bot-Dolma|aiHitBot|Amazonbot|anthropic-ai|Applebot|Applebot-Extended|Brightbot 1.0|Bytespider|CCBot|ChatGPT-User|Claude-Web|ClaudeBot|cohere-ai|cohere-training-data-crawler|Cotoyogi|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|Factset_spyderbot|FirecrawlAgent|FriendlyCrawler|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|imgproxy|ISSCyberRiskCrawler|Kangaroo Bot|meta-externalagent|Meta-ExternalAgent|meta-externalfetcher|Meta-ExternalFetcher|NovaAct|OAI-SearchBot|omgili|omgilibot|Operator|PanguBot|Perplexity-User|PerplexityBot|PetalBot|QualifiedBot|Scrapy|SemrushBot-OCOB|SemrushBot-SWA|Sidetrade indexer bot|TikTokSpider|Timpibot|VelenPublicWebCrawler|Webzio-Extended|YouBot
|
||||
AI2Bot|Ai2Bot-Dolma|aiHitBot|Amazonbot|anthropic-ai|Applebot|Applebot-Extended|Brightbot 1.0|Bytespider|CCBot|ChatGPT-User|Claude-SearchBot|Claude-User|Claude-Web|ClaudeBot|cohere-ai|cohere-training-data-crawler|Cotoyogi|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|Factset_spyderbot|FirecrawlAgent|FriendlyCrawler|Google-CloudVertexBot|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|imgproxy|ISSCyberRiskCrawler|Kangaroo Bot|meta-externalagent|Meta-ExternalAgent|meta-externalfetcher|Meta-ExternalFetcher|MistralAI-User/1.0|NovaAct|OAI-SearchBot|omgili|omgilibot|Operator|PanguBot|Perplexity-User|PerplexityBot|PetalBot|QualifiedBot|Scrapy|SemrushBot-OCOB|SemrushBot-SWA|Sidetrade indexer bot|TikTokSpider|Timpibot|VelenPublicWebCrawler|Webzio-Extended|wpbot|YouBot
|
||||
action: DENY
|
||||
|
@ -22,7 +22,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
|
||||
- Ensure cookie renaming is consistent across configuration options
|
||||
- Add Bookstack app in data
|
||||
- Add `--target-host` flag/envvar to allow changing the value of the Host header in requests forwarded to the target service.
|
||||
- Bump AI-robots.txt to version 1.30 (add QualifiedBot)
|
||||
- Bump AI-robots.txt to version 1.31
|
||||
- Add `RuntimeDirectory` to systemd unit settings so native packages can listen over unix sockets
|
||||
- Added SearXNG instance tracker whitelist policy
|
||||
- Added Qualys SSL Labs whitelist policy
|
||||
|
@ -9,6 +9,8 @@ User-agent: Brightbot 1.0
|
||||
User-agent: Bytespider
|
||||
User-agent: CCBot
|
||||
User-agent: ChatGPT-User
|
||||
User-agent: Claude-SearchBot
|
||||
User-agent: Claude-User
|
||||
User-agent: Claude-Web
|
||||
User-agent: ClaudeBot
|
||||
User-agent: cohere-ai
|
||||
@ -21,6 +23,7 @@ User-agent: FacebookBot
|
||||
User-agent: Factset_spyderbot
|
||||
User-agent: FirecrawlAgent
|
||||
User-agent: FriendlyCrawler
|
||||
User-agent: Google-CloudVertexBot
|
||||
User-agent: Google-Extended
|
||||
User-agent: GoogleOther
|
||||
User-agent: GoogleOther-Image
|
||||
@ -37,6 +40,7 @@ User-agent: meta-externalagent
|
||||
User-agent: Meta-ExternalAgent
|
||||
User-agent: meta-externalfetcher
|
||||
User-agent: Meta-ExternalFetcher
|
||||
User-agent: MistralAI-User/1.0
|
||||
User-agent: NovaAct
|
||||
User-agent: OAI-SearchBot
|
||||
User-agent: omgili
|
||||
@ -55,6 +59,7 @@ User-agent: TikTokSpider
|
||||
User-agent: Timpibot
|
||||
User-agent: VelenPublicWebCrawler
|
||||
User-agent: Webzio-Extended
|
||||
User-agent: wpbot
|
||||
User-agent: YouBot
|
||||
Disallow: /
|
||||
|
||||
|
Loading…
x
Reference in New Issue
Block a user