docs: remove JSON examples from policy file docs

Signed-off-by: Xe Iaso <me@xeiaso.net>

parent: d5f01dbdb9
commit: d6e8bc9b33

@@ -1,29 +0,0 @@
{
  "bots": [
    {
      "import": "(data)/bots/_deny-pathological.yaml"
    },
    {
      "import": "(data)/meta/ai-block-aggressive.yaml"
    },
    {
      "import": "(data)/crawlers/_allow-good.yaml"
    },
    {
      "import": "(data)/bots/aggressive-brazilian-scrapers.yaml"
    },
    {
      "import": "(data)/common/keep-internet-working.yaml"
    },
    {
      "name": "generic-browser",
      "user_agent_regex": "Mozilla|Opera",
      "action": "CHALLENGE"
    }
  ],
  "dnsbl": false,
  "status_codes": {
    "CHALLENGE": 200,
    "DENY": 200
  }
}
@@ -3,6 +3,6 @@ package data
import "embed"

var (
	//go:embed botPolicies.yaml botPolicies.json all:apps all:bots all:clients all:common all:crawlers all:meta
	//go:embed botPolicies.yaml all:apps all:bots all:clients all:common all:crawlers all:meta
	BotPolicies embed.FS
)
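
For context, here is a minimal sketch of how the embedded default policy can be read back out of `BotPolicies`. This is not Anubis' actual loading code, and the import path is assumed from the repository's module layout:

```go
package main

import (
	"fmt"
	"log"

	"github.com/TecharoHQ/anubis/data" // assumed import path for the package above
)

func main() {
	// Read the embedded default policy file; the path matches the go:embed directive above.
	buf, err := data.BotPolicies.ReadFile("botPolicies.yaml")
	if err != nil {
		log.Fatalf("reading embedded policy: %v", err)
	}
	fmt.Printf("embedded botPolicies.yaml is %d bytes\n", len(buf))
}
```
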
@@ -27,6 +27,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Legacy JavaScript code has been eliminated.
- The contact email in the LibreJS header has been changed.
- The hard dependency on WebCrypto has been removed, allowing a proof of work challenge to work over plain (unencrypted) HTTP.
- The legacy JSON-based policy file example has been removed and all documentation for how to write a policy file in JSON has been deleted. JSON-based policy files will still work, but YAML is the superior option for Anubis configuration.

### Breaking changes

@@ -1,12 +0,0 @@
---
title: Proof-of-Work Algorithm Selection
---

Anubis offers two proof-of-work algorithms:

- `"fast"`: highly optimized JavaScript that will run as fast as your computer lets it
- `"slow"`: intentionally slow JavaScript that will waste time and memory

The fast algorithm is used by default to limit impacts on users' computers. Administrators may configure individual bot policy rules to use the slow algorithm in order to make known malicious clients waitloop and do nothing useful.

Generally, you should use the fast algorithm unless you have a good reason not to.
@@ -7,25 +7,6 @@ Anubis has the ability to let you import snippets of configuration into the main

EG:

<Tabs>
<TabItem value="json" label="JSON">

```json
{
  "bots": [
    {
      "import": "(data)/bots/ai-catchall.yaml"
    },
    {
      "import": "(data)/bots/cloudflare-workers.yaml"
    }
  ]
}
```

</TabItem>
<TabItem value="yaml" label="YAML" default>

```yaml
bots:
  # Pathological bots to deny
@@ -34,30 +15,8 @@ bots:
  - import: (data)/bots/cloudflare-workers.yaml
```

</TabItem>
</Tabs>

Of note, a bot rule can either have inline bot configuration or import a bot config snippet. You cannot do both in a single bot rule.

<Tabs>
<TabItem value="json" label="JSON">

```json
{
  "bots": [
    {
      "import": "(data)/bots/ai-catchall.yaml",
      "name": "generic-browser",
      "user_agent_regex": "Mozilla|Opera\n",
      "action": "CHALLENGE"
    }
  ]
}
```

</TabItem>
<TabItem value="yaml" label="YAML" default>

```yaml
bots:
  - import: (data)/bots/ai-catchall.yaml
@@ -67,9 +26,6 @@ bots:
    action: CHALLENGE
```

</TabItem>
</Tabs>

This will return an error like this:

```text
@@ -83,30 +39,11 @@ Paths can either be prefixed with `(data)` to import from the [the data folder i

You can also import from an imported file in case you want to import an entire folder of rules at once.

<Tabs>
<TabItem value="json" label="JSON">

```json
{
  "bots": [
    {
      "import": "(data)/bots/_deny-pathological.yaml"
    }
  ]
}
```

</TabItem>
<TabItem value="yaml" label="YAML" default>

```yaml
bots:
  - import: (data)/bots/_deny-pathological.yaml
```

</TabItem>
</Tabs>

This lets you import an entire ruleset at once:

```yaml
@@ -124,22 +61,6 @@ Snippets can be written in either JSON or YAML, with a preference for YAML. When

Here is an example snippet that allows [IPv6 Unique Local Addresses](https://en.wikipedia.org/wiki/Unique_local_address) through Anubis:

<Tabs>
<TabItem value="json" label="JSON">

```json
[
  {
    "name": "ipv6-ula",
    "action": "ALLOW",
    "remote_addresses": ["fc00::/7"]
  }
]
```

</TabItem>
<TabItem value="yaml" label="YAML" default>

```yaml
- name: ipv6-ula
  action: ALLOW
@@ -147,9 +68,6 @@ Here is an example snippet that allows [IPv6 Unique Local Addresses](https://en.
    - fc00::/7
```

</TabItem>
</Tabs>

## Extracting Anubis' embedded filesystem

You can always extract the list of rules embedded into the Anubis binary with this command:
@@ -7,27 +7,6 @@ import TabItem from "@theme/TabItem";

To work around this, you can make a custom [expression](../configuration/expressions.mdx) rule that allows HTMX requests if the user has passed a challenge in the past:

<Tabs>
<TabItem value="json" label="JSON">

```json
{
  "name": "allow-htmx-iff-already-passed-challenge",
  "action": "ALLOW",
  "expression": {
    "all": [
      "\"Cookie\" in headers",
      "headers[\"Cookie\"].contains(\"anubis-auth\")",
      "\"Hx-Request\" in headers",
      "headers[\"Hx-Request\"] == \"true\""
    ]
  }
}
```

</TabItem>
<TabItem value="yaml" label="YAML" default>

```yaml
- name: allow-htmx-iff-already-passed-challenge
  action: ALLOW
@@ -39,7 +18,4 @@ To work around this, you can make a custom [expression](../configuration/express
      - 'headers["Hx-Request"] == "true"'
```

</TabItem>
</Tabs>

This reduces security somewhat because it does not assert the validity of the Anubis auth cookie; in exchange, it improves the experience for existing users.
@@ -7,6 +7,10 @@ import TabItem from "@theme/TabItem";

Out of the box, Anubis is pretty heavy-handed. It will aggressively challenge everything that might be a browser (usually indicated by having `Mozilla` in its user agent). However, some bots are smart enough to get past the challenge. Some things that look like bots may actually be fine (IE: RSS readers). Some resources need to be visible no matter what. Some resources and remotes are fine to begin with.

Anubis lets you customize its configuration with a Policy File. This is a YAML document that spells out what actions Anubis should take when evaluating requests. The [default configuration](https://github.com/TecharoHQ/anubis/blob/main/data/botPolicies.yaml) explains everything, but this page contains an overview of everything you can do with it.

## Bot Policies

Bot policies let you customize the rules that Anubis uses to allow, deny, or challenge incoming requests. Currently you can set policies by the following matches:

- Request path
@@ -18,75 +22,18 @@ As of version v1.17.0 or later, configuration can be written in either JSON or Y

Here's an example rule that denies [Amazonbot](https://developer.amazon.com/en/amazonbot):

<Tabs>
<TabItem value="json" label="JSON" default>

```json
{
  "name": "amazonbot",
  "user_agent_regex": "Amazonbot",
  "action": "DENY"
}
```

</TabItem>
<TabItem value="yaml" label="YAML">

```yaml
- name: amazonbot
  user_agent_regex: Amazonbot
  action: DENY
```

</TabItem>
</Tabs>

When this rule is evaluated, Anubis will check the `User-Agent` string of the request. If it contains `Amazonbot`, Anubis will send an error page to the user saying that access is denied, but in such a way that makes scrapers think they have correctly loaded the webpage.
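
For illustration only (this is not Anubis' internal matching code), the containment semantics of `user_agent_regex` can be reproduced with an unanchored regular expression:

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// An unanchored pattern, like the user_agent_regex value above.
	amazonbot := regexp.MustCompile(`Amazonbot`)

	ua := "Mozilla/5.0 (compatible; Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)"

	// MatchString reports whether the pattern occurs anywhere in the string,
	// so a substring hit is enough to trigger the rule.
	fmt.Println(amazonbot.MatchString(ua)) // true
}
```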

Right now the only kinds of policies you can write are bot policies. Other forms of policies will be added in the future.

Here is a minimal policy file that will protect against most scraper bots:

<Tabs>
<TabItem value="json" label="JSON" default>

```json
{
  "bots": [
    {
      "name": "cloudflare-workers",
      "headers_regex": {
        "CF-Worker": ".*"
      },
      "action": "DENY"
    },
    {
      "name": "well-known",
      "path_regex": "^/.well-known/.*$",
      "action": "ALLOW"
    },
    {
      "name": "favicon",
      "path_regex": "^/favicon.ico$",
      "action": "ALLOW"
    },
    {
      "name": "robots-txt",
      "path_regex": "^/robots.txt$",
      "action": "ALLOW"
    },
    {
      "name": "generic-browser",
      "user_agent_regex": "Mozilla",
      "action": "CHALLENGE"
    }
  ]
}
```

</TabItem>
<TabItem value="yaml" label="YAML">

```yaml
bots:
  - name: cloudflare-workers
@@ -107,22 +54,20 @@ bots:
    action: CHALLENGE
```

</TabItem>
</Tabs>

This allows requests to [`/.well-known`](https://en.wikipedia.org/wiki/Well-known_URI), `/favicon.ico`, and `/robots.txt`, and challenges any request that has the word `Mozilla` in its User-Agent string. The [default policy file](https://github.com/TecharoHQ/anubis/blob/main/data/botPolicies.json) is a bit more cohesive, but this should be more than enough for most users.

If no rules match the request, it is allowed through. For more details on this default behavior and its implications, see [Default allow behavior](./default-allow-behavior.mdx).

## Writing your own rules
### Writing your own rules

There are three actions that can be returned from a rule:
There are four actions that can be returned from a rule:

| Action      | Effects                                                                            |
| :---------- | :--------------------------------------------------------------------------------- |
| `ALLOW`     | Bypass all further checks and send the request to the backend.                     |
| `DENY`      | Deny the request and send back an error message that scrapers think is a success.  |
| `CHALLENGE` | Show a challenge page and/or validate that clients have passed a challenge.        |
| Action      | Effects                                                                                                                              |
| :---------- | :----------------------------------------------------------------------------------------------------------------------------------- |
| `ALLOW`     | Bypass all further checks and send the request to the backend.                                                                       |
| `DENY`      | Deny the request and send back an error message that scrapers think is a success.                                                    |
| `CHALLENGE` | Show a challenge page and/or validate that clients have passed a challenge.                                                          |
| `WEIGH`     | Change the [request weight](#request-weight) for this request. See the [request weight](#request-weight) docs for more information.  |

Name your rules in lower case using kebab-case. Rule names will be exposed in Prometheus metrics.
@@ -130,27 +75,6 @@ Name your rules in lower case using kebab-case. Rule names will be exposed in Pr

Rules can also have their own challenge settings. These are customized using the `"challenge"` key. For example, here is a rule that makes challenges artificially hard for connections with the substring "bot" in their user agent:

<Tabs>
<TabItem value="json" label="JSON" default>

This rule has been known to have a high false positive rate in testing. Please use this with care.

```json
{
  "name": "generic-bot-catchall",
  "user_agent_regex": "(?i:bot|crawler)",
  "action": "CHALLENGE",
  "challenge": {
    "difficulty": 16,
    "report_as": 4,
    "algorithm": "slow"
  }
}
```

</TabItem>
<TabItem value="yaml" label="YAML">

This rule has been known to have a high false positive rate in testing. Please use this with care.

```yaml
@@ -164,9 +88,6 @@ This rule has been known to have a high false positive rate in testing. Please u
    algorithm: slow # intentionally waste CPU cycles and time
```

</TabItem>
</Tabs>

Challenges can be configured with these settings:

| Key | Example | Description |
@@ -181,21 +102,6 @@ The `remote_addresses` field of a Bot rule allows you to set the IP range that t

For example, you can allow a search engine to connect if and only if its IP address matches the ones they published:

<Tabs>
<TabItem value="json" label="JSON" default>

```json
{
  "name": "qwantbot",
  "user_agent_regex": "\\+https\\:\\/\\/help\\.qwant\\.com/bot/",
  "action": "ALLOW",
  "remote_addresses": ["91.242.162.0/24"]
}
```

</TabItem>
<TabItem value="yaml" label="YAML">

```yaml
- name: qwantbot
  user_agent_regex: \+https\://help\.qwant\.com/bot/
@@ -204,25 +110,8 @@ For example, you can allow a search engine to connect if and only if its IP addr
  remote_addresses: ["91.242.162.0/24"]
```

</TabItem>
</Tabs>

This also works at an IP range level without any other checks:

<Tabs>
<TabItem value="json" label="JSON" default>

```json
{
  "name": "internal-network",
  "action": "ALLOW",
  "remote_addresses": ["100.64.0.0/10"]
}
```

</TabItem>
<TabItem value="yaml" label="YAML">

```yaml
name: internal-network
action: ALLOW
@@ -230,9 +119,6 @@ remote_addresses:
  - 100.64.0.0/10
```

</TabItem>
</Tabs>

## Imprint / Impressum support

Anubis has support for showing imprint / impressum information. This is defined in the `impressum` block of your configuration. See [Imprint / Impressum configuration](./configuration/impressum.mdx) for more information.
@@ -102,18 +102,6 @@ When a client passes a challenge, Anubis sets an HTTP cookie named `"techaro.lol

This ensures that the token carries enough metadata to prove that it is valid (via its signature) and that the server can independently verify it. This cookie is allowed to be set without triggering an EU cookie banner notification, but depending on facts and circumstances, you may wish to disclose this to your users.

## Challenge format

Challenges are formed by taking some user request metadata and using that to generate a SHA-256 checksum. The following request headers are used:

- `Accept-Encoding`: The content encodings that the requestor supports, such as gzip.
- `X-Real-Ip`: The IP address of the requestor, as set by a reverse proxy server.
- `User-Agent`: The user agent string of the requestor.
- The current time in UTC rounded to the nearest week.
- The fingerprint (checksum) of Anubis' private Ed25519 key.

This forms a fingerprint of the requestor using metadata that any requestor is already sending. It also uses time as an input, which is known to both the server and requestor due to the nature of linear timelines. Depending on facts and circumstances, you may wish to disclose this to your users.
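
As a rough sketch only — the exact serialization, separator, and rounding are defined by Anubis itself — combining these inputs into a SHA-256 checksum could look like this hypothetical helper:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"strings"
	"time"
)

// challengeFingerprint is a hypothetical helper, not Anubis' real function.
// It hashes the request metadata listed above into a single SHA-256 checksum.
func challengeFingerprint(acceptEncoding, realIP, userAgent, keyFingerprint string, now time.Time) string {
	week := now.UTC().Truncate(7 * 24 * time.Hour) // time rounded down to the week
	input := strings.Join([]string{
		acceptEncoding,
		realIP,
		userAgent,
		week.Format(time.RFC3339),
		keyFingerprint,
	}, "|")
	sum := sha256.Sum256([]byte(input))
	return fmt.Sprintf("%x", sum)
}

func main() {
	fmt.Println(challengeFingerprint("gzip, br", "203.0.113.7", "Mozilla/5.0", "deadbeef", time.Now()))
}
```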

## JWT signing

Anubis uses an Ed25519 keypair to sign the JWTs issued when challenges are passed. Anubis will generate a new Ed25519 keypair every time it starts. At this time, there is no way to share this keypair between instances of Anubis, but that will be addressed in future versions.
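
For reference, generating an Ed25519 keypair and signing a payload takes only a few lines with Go's standard library; this is a minimal sketch, and Anubis' real implementation wraps the key in a JWT signing library:

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
	"log"
)

func main() {
	// Generate a fresh keypair, as Anubis does on startup.
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		log.Fatal(err)
	}

	// Sign an arbitrary payload (a real JWT signs its encoded header and claims).
	payload := []byte(`{"sub":"challenge-passed"}`)
	sig := ed25519.Sign(priv, payload)

	// Anyone holding the public key can verify the signature.
	fmt.Println("signature valid:", ed25519.Verify(pub, payload, sig))
}
```
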
@@ -16,7 +16,6 @@ $`npm run assets`;
documentation: {
  "./README.md": "README.md",
  "./LICENSE": "LICENSE",
  "./data/botPolicies.json": "botPolicies.json",
  "./data/botPolicies.yaml": "botPolicies.yaml",
},