mirror of https://github.com/TecharoHQ/anubis.git
synced 2025-09-09 12:50:42 -04:00
Bump AI-robots.txt rules to version 1.31 (#538)
* Bump AI-robots.txt rules to version 1.31
* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
parent c78d830ecb
commit 11081aac08
@@ -20,9 +20,6 @@
 # https://twitter.com/nyttypos/status/1898844061873639490
 #\([A-Z][a-z]{2,}(?: [a-z]+){3,}\)\.\s
 
-# Complete sentences shouldn't be in the middle of another sentence as a parenthetical.
-(?<!\.)\.\),
-
 # Complete sentences in parentheticals should not have a space before the period.
 \s\.\)(?!.*\}\})
 
.github/actions/spelling/patterns.txt (vendored, 4 lines changed)
@@ -128,3 +128,7 @@ go install(?:\s+[a-z]+\.[-@\w/.]+)+
 
 # ignore long runs of a single character:
 \b([A-Za-z])\g{-1}{3,}\b
+
+# hit-count: 1 file-count: 1
+# microsoft
+\b(?:https?://|)(?:(?:(?:blogs|download\.visualstudio|docs|msdn2?|research)\.|)microsoft|blogs\.msdn)\.co(?:m|\.\w\w)/[-_a-zA-Z0-9()=./%]*
@@ -1,4 +1,4 @@
 - name: "ai-robots-txt"
   user_agent_regex: >-
-    AI2Bot|Ai2Bot-Dolma|aiHitBot|Amazonbot|anthropic-ai|Applebot|Applebot-Extended|Brightbot 1.0|Bytespider|CCBot|ChatGPT-User|Claude-Web|ClaudeBot|cohere-ai|cohere-training-data-crawler|Cotoyogi|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|Factset_spyderbot|FirecrawlAgent|FriendlyCrawler|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|imgproxy|ISSCyberRiskCrawler|Kangaroo Bot|meta-externalagent|Meta-ExternalAgent|meta-externalfetcher|Meta-ExternalFetcher|NovaAct|OAI-SearchBot|omgili|omgilibot|Operator|PanguBot|Perplexity-User|PerplexityBot|PetalBot|QualifiedBot|Scrapy|SemrushBot-OCOB|SemrushBot-SWA|Sidetrade indexer bot|TikTokSpider|Timpibot|VelenPublicWebCrawler|Webzio-Extended|YouBot
+    AI2Bot|Ai2Bot-Dolma|aiHitBot|Amazonbot|anthropic-ai|Applebot|Applebot-Extended|Brightbot 1.0|Bytespider|CCBot|ChatGPT-User|Claude-SearchBot|Claude-User|Claude-Web|ClaudeBot|cohere-ai|cohere-training-data-crawler|Cotoyogi|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|Factset_spyderbot|FirecrawlAgent|FriendlyCrawler|Google-CloudVertexBot|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|imgproxy|ISSCyberRiskCrawler|Kangaroo Bot|meta-externalagent|Meta-ExternalAgent|meta-externalfetcher|Meta-ExternalFetcher|MistralAI-User/1.0|NovaAct|OAI-SearchBot|omgili|omgilibot|Operator|PanguBot|Perplexity-User|PerplexityBot|PetalBot|QualifiedBot|Scrapy|SemrushBot-OCOB|SemrushBot-SWA|Sidetrade indexer bot|TikTokSpider|Timpibot|VelenPublicWebCrawler|Webzio-Extended|wpbot|YouBot
   action: DENY
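Reviewer note: the changed `user_agent_regex` line is long enough that the additions are hard to spot by eye. Because the value is a flat `|` alternation with no groups, one way to list what a bump adds is to split both values on `|` and take the set difference. The sketch below is only an illustration: `agents`, `added_agents`, `OLD`, and `NEW` are hypothetical names, and the two strings are shortened excerpts of the values in the hunk above, not the full lists.

```python
def agents(alternation: str) -> set[str]:
    """Split an `a|b|c` user-agent alternation into its individual names."""
    return {name for name in alternation.split("|") if name}


def added_agents(old: str, new: str) -> list[str]:
    """Names present in `new` but not in `old`, sorted case-insensitively."""
    return sorted(agents(new) - agents(old), key=str.lower)


# Shortened excerpts (assumption: trimmed for readability) of the old and
# new user_agent_regex values shown in the hunk above.
OLD = "ChatGPT-User|Claude-Web|ClaudeBot|Webzio-Extended|YouBot"
NEW = (
    "ChatGPT-User|Claude-SearchBot|Claude-User|Claude-Web|ClaudeBot"
    "|Webzio-Extended|wpbot|YouBot"
)

if __name__ == "__main__":
    print(added_agents(OLD, NEW))  # ['Claude-SearchBot', 'Claude-User', 'wpbot']
```

This only works while the pattern stays a plain alternation; a pattern with groups or escaped pipes would need a real regex parser.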
@@ -22,7 +22,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Ensure cookie renaming is consistent across configuration options
 - Add Bookstack app in data
 - Add `--target-host` flag/envvar to allow changing the value of the Host header in requests forwarded to the target service.
-- Bump AI-robots.txt to version 1.30 (add QualifiedBot)
+- Bump AI-robots.txt to version 1.31
 - Add `RuntimeDirectory` to systemd unit settings so native packages can listen over unix sockets
 - Added SearXNG instance tracker whitelist policy
 - Added Qualys SSL Labs whitelist policy
@@ -9,6 +9,8 @@ User-agent: Brightbot 1.0
 User-agent: Bytespider
 User-agent: CCBot
 User-agent: ChatGPT-User
+User-agent: Claude-SearchBot
+User-agent: Claude-User
 User-agent: Claude-Web
 User-agent: ClaudeBot
 User-agent: cohere-ai
@@ -21,6 +23,7 @@ User-agent: FacebookBot
 User-agent: Factset_spyderbot
 User-agent: FirecrawlAgent
 User-agent: FriendlyCrawler
+User-agent: Google-CloudVertexBot
 User-agent: Google-Extended
 User-agent: GoogleOther
 User-agent: GoogleOther-Image
@@ -37,6 +40,7 @@ User-agent: meta-externalagent
 User-agent: Meta-ExternalAgent
 User-agent: meta-externalfetcher
 User-agent: Meta-ExternalFetcher
+User-agent: MistralAI-User/1.0
 User-agent: NovaAct
 User-agent: OAI-SearchBot
 User-agent: omgili
@@ -55,6 +59,7 @@ User-agent: TikTokSpider
 User-agent: Timpibot
 User-agent: VelenPublicWebCrawler
 User-agent: Webzio-Extended
+User-agent: wpbot
 User-agent: YouBot
 Disallow: /
 
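Reviewer note: the `User-agent:` lines in this robots.txt and the names in the `ai-robots-txt` policy's `user_agent_regex` carry the same list. The diff does not show how either file is generated, so the following is only a sketch of one way to derive the alternation from the robots.txt text. `user_agents`, `to_alternation`, and `SAMPLE` are hypothetical names. Names are joined verbatim, without regex escaping, which mirrors what the policy line shows (for example, the `.` in `Brightbot 1.0` is left as a regex wildcard).

```python
def user_agents(robots_txt: str) -> list[str]:
    """Return the values of all `User-agent:` lines, in file order."""
    names = []
    for line in robots_txt.splitlines():
        field, sep, value = line.partition(":")
        # Field names in robots.txt are case-insensitive.
        if sep and field.strip().lower() == "user-agent":
            names.append(value.strip())
    return names


def to_alternation(names: list[str]) -> str:
    """Join names with `|`, verbatim (no regex escaping)."""
    return "|".join(names)


# Assumption: a three-entry excerpt of the tail of the file shown above.
SAMPLE = """\
User-agent: Webzio-Extended
User-agent: wpbot
User-agent: YouBot
Disallow: /
"""

if __name__ == "__main__":
    print(to_alternation(user_agents(SAMPLE)))  # Webzio-Extended|wpbot|YouBot
```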