diff --git a/data/botPolicies.json b/data/botPolicies.json
deleted file mode 100644
index 6227639..0000000
--- a/data/botPolicies.json
+++ /dev/null
@@ -1,29 +0,0 @@
-{
- "bots": [
- {
- "import": "(data)/bots/_deny-pathological.yaml"
- },
- {
- "import": "(data)/meta/ai-block-aggressive.yaml"
- },
- {
- "import": "(data)/crawlers/_allow-good.yaml"
- },
- {
- "import": "(data)/bots/aggressive-brazilian-scrapers.yaml"
- },
- {
- "import": "(data)/common/keep-internet-working.yaml"
- },
- {
- "name": "generic-browser",
- "user_agent_regex": "Mozilla|Opera",
- "action": "CHALLENGE"
- }
- ],
- "dnsbl": false,
- "status_codes": {
- "CHALLENGE": 200,
- "DENY": 200
- }
-}
\ No newline at end of file
diff --git a/data/embed.go b/data/embed.go
index c3ed06f..bef617a 100644
--- a/data/embed.go
+++ b/data/embed.go
@@ -3,6 +3,6 @@ package data
import "embed"
var (
- //go:embed botPolicies.yaml botPolicies.json all:apps all:bots all:clients all:common all:crawlers all:meta
+ //go:embed botPolicies.yaml all:apps all:bots all:clients all:common all:crawlers all:meta
BotPolicies embed.FS
)
diff --git a/docs/docs/CHANGELOG.md b/docs/docs/CHANGELOG.md
index 2f028b1..451ed80 100644
--- a/docs/docs/CHANGELOG.md
+++ b/docs/docs/CHANGELOG.md
@@ -27,6 +27,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Legacy JavaScript code has been eliminated.
- The contact email in the LibreJS header has been changed.
- The hard dependency on WebCrypto has been removed, allowing a proof of work challenge to work over plain (unencrypted) HTTP.
+- The legacy JSON-based policy file example has been removed, along with all of the documentation for writing policy files in JSON. JSON-based policy files will still work, but YAML is the superior option for Anubis configuration.
### Breaking changes
diff --git a/docs/docs/admin/algorithm-selection.mdx b/docs/docs/admin/algorithm-selection.mdx
deleted file mode 100644
index e5bf962..0000000
--- a/docs/docs/admin/algorithm-selection.mdx
+++ /dev/null
@@ -1,12 +0,0 @@
----
-title: Proof-of-Work Algorithm Selection
----
-
-Anubis offers two proof-of-work algorithms:
-
-- `"fast"`: highly optimized JavaScript that will run as fast as your computer lets it
-- `"slow"`: intentionally slow JavaScript that will waste time and memory
-
-The fast algorithm is used by default to limit impacts on users' computers. Administrators may configure individual bot policy rules to use the slow algorithm in order to make known malicious clients waitloop and do nothing useful.
-
-Generally, you should use the fast algorithm unless you have a good reason not to.
diff --git a/docs/docs/admin/configuration/import.mdx b/docs/docs/admin/configuration/import.mdx
index 13b7992..b8fdd2e 100644
--- a/docs/docs/admin/configuration/import.mdx
+++ b/docs/docs/admin/configuration/import.mdx
@@ -7,25 +7,6 @@ Anubis has the ability to let you import snippets of configuration into the main
EG:
-
-
-
-```json
-{
- "bots": [
- {
- "import": "(data)/bots/ai-catchall.yaml"
- },
- {
- "import": "(data)/bots/cloudflare-workers.yaml"
- }
- ]
-}
-```
-
-
-
-
```yaml
bots:
# Pathological bots to deny
@@ -34,30 +15,8 @@ bots:
- import: (data)/bots/cloudflare-workers.yaml
```
-
-
-
Of note, a bot rule can either have inline bot configuration or import a bot config snippet. You cannot do both in a single bot rule.
-
-
-
-```json
-{
- "bots": [
- {
- "import": "(data)/bots/ai-catchall.yaml",
- "name": "generic-browser",
- "user_agent_regex": "Mozilla|Opera\n",
- "action": "CHALLENGE"
- }
- ]
-}
-```
-
-
-
-
```yaml
bots:
- import: (data)/bots/ai-catchall.yaml
@@ -67,9 +26,6 @@ bots:
action: CHALLENGE
```
-
-
-
This will return an error like this:
```text
@@ -83,30 +39,11 @@ Paths can either be prefixed with `(data)` to import from the [the data folder i
Imports can also be nested: an imported file can itself import other files, which lets you pull in an entire folder of rules at once.
-
-
-
-```json
-{
- "bots": [
- {
- "import": "(data)/bots/_deny-pathological.yaml"
- }
- ]
-}
-```
-
-
-
-
```yaml
bots:
- import: (data)/bots/_deny-pathological.yaml
```
-
-
-
This lets you import an entire ruleset at once:
```yaml
@@ -124,22 +61,6 @@ Snippets can be written in either JSON or YAML, with a preference for YAML. When
Here is an example snippet that allows [IPv6 Unique Local Addresses](https://en.wikipedia.org/wiki/Unique_local_address) through Anubis:
-
-
-
-```json
-[
- {
- "name": "ipv6-ula",
- "action": "ALLOW",
- "remote_addresses": ["fc00::/7"]
- }
-]
-```
-
-
-
-
```yaml
- name: ipv6-ula
action: ALLOW
@@ -147,9 +68,6 @@ Here is an example snippet that allows [IPv6 Unique Local Addresses](https://en.
- fc00::/7
```
-
-
-
## Extracting Anubis' embedded filesystem
You can always extract the list of rules embedded into the Anubis binary with this command:
diff --git a/docs/docs/admin/frameworks/htmx.mdx b/docs/docs/admin/frameworks/htmx.mdx
index 2b2ea49..93ae2f8 100644
--- a/docs/docs/admin/frameworks/htmx.mdx
+++ b/docs/docs/admin/frameworks/htmx.mdx
@@ -7,27 +7,6 @@ import TabItem from "@theme/TabItem";
To work around this, you can make a custom [expression](../configuration/expressions.mdx) rule that allows HTMX requests if the user has passed a challenge in the past:
-
-
-
-```json
-{
- "name": "allow-htmx-iff-already-passed-challenge",
- "action": "ALLOW",
- "expression": {
- "all": [
- "\"Cookie\" in headers",
- "headers[\"Cookie\"].contains(\"anubis-auth\")",
- "\"Hx-Request\" in headers",
- "headers[\"Hx-Request\"] == \"true\""
- ]
- }
-}
-```
-
-
-
-
```yaml
- name: allow-htmx-iff-already-passed-challenge
action: ALLOW
@@ -39,7 +18,4 @@ To work around this, you can make a custom [expression](../configuration/express
- 'headers["Hx-Request"] == "true"'
```
-
-
-
This reduces security somewhat because it does not assert the validity of the Anubis auth cookie; however, in exchange it improves the experience for existing users.
diff --git a/docs/docs/admin/policies.mdx b/docs/docs/admin/policies.mdx
index bda0372..3ad7aeb 100644
--- a/docs/docs/admin/policies.mdx
+++ b/docs/docs/admin/policies.mdx
@@ -7,6 +7,10 @@ import TabItem from "@theme/TabItem";
Out of the box, Anubis is pretty heavy-handed. It will aggressively challenge everything that might be a browser (usually indicated by having `Mozilla` in its user agent). However, some bots are smart enough to get past the challenge. Some things that look like bots may actually be fine (EG: RSS readers). Some resources need to be visible no matter what. Some resources and remotes are fine to begin with.
+Anubis lets you customize its configuration with a Policy File. This is a YAML document that spells out what actions Anubis should take when evaluating requests. The [default configuration](https://github.com/TecharoHQ/anubis/blob/main/data/botPolicies.yaml) explains everything in detail, while this page gives an overview of what you can do with it.
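+
+For orientation, here is a minimal sketch of a policy file's overall shape. The `bots` list holds your rules, while `dnsbl` and `status_codes` are optional top-level settings; the structure and values below mirror the removed `botPolicies.json` example:
+
+```yaml
+bots:
+  # Import a snippet of rules shipped with Anubis
+  - import: (data)/crawlers/_allow-good.yaml
+  # Or define a rule inline
+  - name: generic-browser
+    user_agent_regex: Mozilla|Opera
+    action: CHALLENGE
+
+# Optional: DNS blocklist checking
+dnsbl: false
+
+# Optional: the HTTP status codes Anubis responds with
+status_codes:
+  CHALLENGE: 200
+  DENY: 200
+```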
+
+## Bot Policies
+
Bot policies let you customize the rules that Anubis uses to allow, deny, or challenge incoming requests. Currently you can set policies by the following matches:
- Request path
@@ -18,75 +22,18 @@ As of version v1.17.0 or later, configuration can be written in either JSON or Y
Here's an example rule that denies [Amazonbot](https://developer.amazon.com/en/amazonbot):
-
-
-
-```json
-{
- "name": "amazonbot",
- "user_agent_regex": "Amazonbot",
- "action": "DENY"
-}
-```
-
-
-
-
```yaml
- name: amazonbot
user_agent_regex: Amazonbot
action: DENY
```
-
-
-
When this rule is evaluated, Anubis will check the `User-Agent` string of the request. If it contains `Amazonbot`, Anubis will send an error page to the user saying that access is denied, but in such a way that makes scrapers think they have correctly loaded the webpage.
Right now the only kinds of policies you can write are bot policies. Other forms of policies will be added in the future.
Here is a minimal policy file that will protect against most scraper bots:
-
-
-
-```json
-{
- "bots": [
- {
- "name": "cloudflare-workers",
- "headers_regex": {
- "CF-Worker": ".*"
- },
- "action": "DENY"
- },
- {
- "name": "well-known",
- "path_regex": "^/.well-known/.*$",
- "action": "ALLOW"
- },
- {
- "name": "favicon",
- "path_regex": "^/favicon.ico$",
- "action": "ALLOW"
- },
- {
- "name": "robots-txt",
- "path_regex": "^/robots.txt$",
- "action": "ALLOW"
- },
- {
- "name": "generic-browser",
- "user_agent_regex": "Mozilla",
- "action": "CHALLENGE"
- }
- ]
-}
-```
-
-
-
-
```yaml
bots:
- name: cloudflare-workers
@@ -107,22 +54,20 @@ bots:
action: CHALLENGE
```
-
-
-
This allows requests to [`/.well-known`](https://en.wikipedia.org/wiki/Well-known_URI), `/favicon.ico`, `/robots.txt`, and challenges any request that has the word `Mozilla` in its User-Agent string. The [default policy file](https://github.com/TecharoHQ/anubis/blob/main/data/botPolicies.yaml) is a bit more cohesive, but this should be more than enough for most users.
If no rules match the request, it is allowed through. For more details on this default behavior and its implications, see [Default allow behavior](./default-allow-behavior.mdx).
-## Writing your own rules
+### Writing your own rules
-There are three actions that can be returned from a rule:
+There are four actions that can be returned from a rule:
-| Action | Effects |
-| :---------- | :-------------------------------------------------------------------------------- |
-| `ALLOW` | Bypass all further checks and send the request to the backend. |
-| `DENY` | Deny the request and send back an error message that scrapers think is a success. |
-| `CHALLENGE` | Show a challenge page and/or validate that clients have passed a challenge. |
+| Action | Effects |
+| :---------- | :---------------------------------------------------------------------------------------------------------------------------------- |
+| `ALLOW` | Bypass all further checks and send the request to the backend. |
+| `DENY` | Deny the request and send back an error message that scrapers think is a success. |
+| `CHALLENGE` | Show a challenge page and/or validate that clients have passed a challenge. |
+| `WEIGH` | Change the [request weight](#request-weight) for this request. See the [request weight](#request-weight) docs for more information. |
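+
+For example, here is a sketch of a rule that adds weight to requests from clients with "headless" in their user agent (the rule name and regex are illustrative; it assumes the `weight.adjust` field used by the default configuration):
+
+```yaml
+- name: suspicious-headless-browser
+  user_agent_regex: (?i:headless)
+  action: WEIGH
+  weight:
+    # assumed syntax: adds 5 to this request's weight
+    adjust: 5
+```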
Name your rules in lower case using kebab-case. Rule names will be exposed in Prometheus metrics.
@@ -130,27 +75,6 @@ Name your rules in lower case using kebab-case. Rule names will be exposed in Pr
Rules can also have their own challenge settings. These are customized using the `"challenge"` key. For example, here is a rule that makes challenges artificially hard for connections with the substring "bot" in their user agent:
-
-
-
-This rule has been known to have a high false positive rate in testing. Please use this with care.
-
-```json
-{
- "name": "generic-bot-catchall",
- "user_agent_regex": "(?i:bot|crawler)",
- "action": "CHALLENGE",
- "challenge": {
- "difficulty": 16,
- "report_as": 4,
- "algorithm": "slow"
- }
-}
-```
-
-
-
-
This rule has been known to have a high false positive rate in testing. Please use this with care.
```yaml
@@ -164,9 +88,6 @@ This rule has been known to have a high false positive rate in testing. Please u
algorithm: slow # intentionally waste CPU cycles and time
```
-
-
-
Challenges can be configured with these settings:
| Key | Example | Description |
@@ -181,21 +102,6 @@ The `remote_addresses` field of a Bot rule allows you to set the IP range that t
For example, you can allow a search engine to connect if and only if its IP address matches the ones they published:
-
-
-
-```json
-{
- "name": "qwantbot",
- "user_agent_regex": "\\+https\\:\\/\\/help\\.qwant\\.com/bot/",
- "action": "ALLOW",
- "remote_addresses": ["91.242.162.0/24"]
-}
-```
-
-
-
-
```yaml
- name: qwantbot
user_agent_regex: \+https\://help\.qwant\.com/bot/
@@ -204,25 +110,8 @@ For example, you can allow a search engine to connect if and only if its IP addr
remote_addresses: ["91.242.162.0/24"]
```
-
-
-
This also works at an IP range level without any other checks:
-
-
-
-```json
-{
- "name": "internal-network",
- "action": "ALLOW",
- "remote_addresses": ["100.64.0.0/10"]
-}
-```
-
-
-
-
```yaml
name: internal-network
action: ALLOW
@@ -230,9 +119,6 @@ remote_addresses:
- 100.64.0.0/10
```
-
-
-
## Imprint / Impressum support
Anubis has support for showing imprint / impressum information. This is defined in the `impressum` block of your configuration. See [Imprint / Impressum configuration](./configuration/impressum.mdx) for more information.
diff --git a/docs/docs/design/how-anubis-works.mdx b/docs/docs/design/how-anubis-works.mdx
index 2cc590d..e274320 100644
--- a/docs/docs/design/how-anubis-works.mdx
+++ b/docs/docs/design/how-anubis-works.mdx
@@ -102,18 +102,6 @@ When a client passes a challenge, Anubis sets an HTTP cookie named `"techaro.lol
This ensures that the token carries enough metadata to prove that it is valid (via its signature) and that the server can independently verify that validity. This cookie is allowed to be set without triggering an EU cookie banner notification, but depending on facts and circumstances, you may wish to disclose this to your users.
-## Challenge format
-
-Challenges are formed by taking some user request metadata and using that to generate a SHA-256 checksum. The following request headers are used:
-
-- `Accept-Encoding`: The content encodings that the requestor supports, such as gzip.
-- `X-Real-Ip`: The IP address of the requestor, as set by a reverse proxy server.
-- `User-Agent`: The user agent string of the requestor.
-- The current time in UTC rounded to the nearest week.
-- The fingerprint (checksum) of Anubis' private ED25519 key.
-
-This forms a fingerprint of the requestor using metadata that any requestor already is sending. It also uses time as an input, which is known to both the server and requestor due to the nature of linear timelines. Depending on facts and circumstances, you may wish to disclose this to your users.
-
## JWT signing
Anubis uses an ed25519 keypair to sign the JWTs issued when challenges are passed. Anubis will generate a new ed25519 keypair every time it starts. At this time, there is no way to share this keypair between instances of Anubis, but that will be addressed in future versions.
diff --git a/yeetfile.js b/yeetfile.js
index 6992b1b..47749af 100644
--- a/yeetfile.js
+++ b/yeetfile.js
@@ -16,7 +16,6 @@ $`npm run assets`;
documentation: {
"./README.md": "README.md",
"./LICENSE": "LICENSE",
- "./data/botPolicies.json": "botPolicies.json",
"./data/botPolicies.yaml": "botPolicies.yaml",
},