diff --git a/.github/actions/spelling/line_forbidden.patterns b/.github/actions/spelling/line_forbidden.patterns
index 388c60e..c3d77de 100644
--- a/.github/actions/spelling/line_forbidden.patterns
+++ b/.github/actions/spelling/line_forbidden.patterns
@@ -20,9 +20,6 @@
 # https://twitter.com/nyttypos/status/1898844061873639490
 #\([A-Z][a-z]{2,}(?: [a-z]+){3,}\)\.\s
-# Complete sentences shouldn't be in the middle of another sentence as a parenthetical.
-(?-
- AI2Bot|Ai2Bot-Dolma|aiHitBot|Amazonbot|anthropic-ai|Applebot|Applebot-Extended|Brightbot 1.0|Bytespider|CCBot|ChatGPT-User|Claude-Web|ClaudeBot|cohere-ai|cohere-training-data-crawler|Cotoyogi|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|Factset_spyderbot|FirecrawlAgent|FriendlyCrawler|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|imgproxy|ISSCyberRiskCrawler|Kangaroo Bot|meta-externalagent|Meta-ExternalAgent|meta-externalfetcher|Meta-ExternalFetcher|NovaAct|OAI-SearchBot|omgili|omgilibot|Operator|PanguBot|Perplexity-User|PerplexityBot|PetalBot|QualifiedBot|Scrapy|SemrushBot-OCOB|SemrushBot-SWA|Sidetrade indexer bot|TikTokSpider|Timpibot|VelenPublicWebCrawler|Webzio-Extended|YouBot
+ AI2Bot|Ai2Bot-Dolma|aiHitBot|Amazonbot|anthropic-ai|Applebot|Applebot-Extended|Brightbot 1.0|Bytespider|CCBot|ChatGPT-User|Claude-SearchBot|Claude-User|Claude-Web|ClaudeBot|cohere-ai|cohere-training-data-crawler|Cotoyogi|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|Factset_spyderbot|FirecrawlAgent|FriendlyCrawler|Google-CloudVertexBot|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|imgproxy|ISSCyberRiskCrawler|Kangaroo Bot|meta-externalagent|Meta-ExternalAgent|meta-externalfetcher|Meta-ExternalFetcher|MistralAI-User/1.0|NovaAct|OAI-SearchBot|omgili|omgilibot|Operator|PanguBot|Perplexity-User|PerplexityBot|PetalBot|QualifiedBot|Scrapy|SemrushBot-OCOB|SemrushBot-SWA|Sidetrade indexer bot|TikTokSpider|Timpibot|VelenPublicWebCrawler|Webzio-Extended|wpbot|YouBot
 action: DENY
diff --git a/docs/docs/CHANGELOG.md b/docs/docs/CHANGELOG.md
index 6215575..fc1dc53 100644
--- a/docs/docs/CHANGELOG.md
+++ b/docs/docs/CHANGELOG.md
@@ -22,7 +22,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Ensure cookie renaming is consistent across configuration options
 - Add Bookstack app in data
 - Add `--target-host` flag/envvar to allow changing the value of the Host header in requests forwarded to the target service.
-- Bump AI-robots.txt to version 1.30 (add QualifiedBot)
+- Bump AI-robots.txt to version 1.31
 - Add `RuntimeDirectory` to systemd unit settings so native packages can listen over unix sockets
 - Added SearXNG instance tracker whitelist policy
 - Added Qualys SSL Labs whitelist policy
diff --git a/web/static/robots.txt b/web/static/robots.txt
index 5c4c748..71fb1f1 100644
--- a/web/static/robots.txt
+++ b/web/static/robots.txt
@@ -9,6 +9,8 @@ User-agent: Brightbot 1.0
 User-agent: Bytespider
 User-agent: CCBot
 User-agent: ChatGPT-User
+User-agent: Claude-SearchBot
+User-agent: Claude-User
 User-agent: Claude-Web
 User-agent: ClaudeBot
 User-agent: cohere-ai
@@ -21,6 +23,7 @@ User-agent: FacebookBot
 User-agent: Factset_spyderbot
 User-agent: FirecrawlAgent
 User-agent: FriendlyCrawler
+User-agent: Google-CloudVertexBot
 User-agent: Google-Extended
 User-agent: GoogleOther
 User-agent: GoogleOther-Image
@@ -37,6 +40,7 @@ User-agent: meta-externalagent
 User-agent: Meta-ExternalAgent
 User-agent: meta-externalfetcher
 User-agent: Meta-ExternalFetcher
+User-agent: MistralAI-User/1.0
 User-agent: NovaAct
 User-agent: OAI-SearchBot
 User-agent: omgili
@@ -55,6 +59,7 @@ User-agent: TikTokSpider
 User-agent: Timpibot
 User-agent: VelenPublicWebCrawler
 User-agent: Webzio-Extended
+User-agent: wpbot
 User-agent: YouBot
 Disallow: /