anubis/docs/docs/admin/policies.mdx

---
title: Policy Definitions
---

import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";

Out of the box, Anubis is pretty heavy-handed. It will aggressively challenge everything that might be a browser (usually indicated by having `Mozilla` in its user agent). However, some bots are smart enough to get past the challenge. Some things that look like bots may actually be fine (IE: RSS readers). Some resources need to be visible no matter what. Some resources and remotes are fine to begin with.

Bot policies let you customize the rules that Anubis uses to allow, deny, or challenge incoming requests. Currently you can set policies by the following matches:

- Request path
- User agent string
- HTTP request header values
- [Importing other configuration snippets](./configuration/import.mdx)

As of version v1.17.0 or later, configuration can be written in either JSON or YAML.

Here's an example rule that denies [Amazonbot](https://developer.amazon.com/en/amazonbot):

<Tabs>
<TabItem value="json" label="JSON" default>

```json
{
  "name": "amazonbot",
  "user_agent_regex": "Amazonbot",
  "action": "DENY"
}
```

</TabItem>
<TabItem value="yaml" label="YAML">

```yaml
- name: amazonbot
  user_agent_regex: Amazonbot
  action: DENY
```

</TabItem>
</Tabs>

When this rule is evaluated, Anubis will check the `User-Agent` string of the request. If it contains `Amazonbot`, Anubis will send an error page to the user saying that access is denied, but in such a way that makes scrapers think they have correctly loaded the webpage.

Right now the only kinds of policies you can write are bot policies. Other forms of policies will be added in the future.

Here is a minimal policy file that will protect against most scraper bots:

<Tabs>
<TabItem value="json" label="JSON" default>

```json
{
  "bots": [
    {
      "name": "cloudflare-workers",
      "headers_regex": {
        "CF-Worker": ".*"
      },
      "action": "DENY"
    },
    {
      "name": "well-known",
      "path_regex": "^/.well-known/.*$",
      "action": "ALLOW"
    },
    {
      "name": "favicon",
      "path_regex": "^/favicon.ico$",
      "action": "ALLOW"
    },
    {
      "name": "robots-txt",
      "path_regex": "^/robots.txt$",
      "action": "ALLOW"
    },
    {
      "name": "generic-browser",
      "user_agent_regex": "Mozilla",
      "action": "CHALLENGE"
    }
  ]
}
```

</TabItem>
<TabItem value="yaml" label="YAML">

```yaml
bots:
  - name: cloudflare-workers
    headers_regex:
      CF-Worker: .*
    action: DENY
  - name: well-known
    path_regex: ^/.well-known/.*$
    action: ALLOW
  - name: favicon
    path_regex: ^/favicon.ico$
    action: ALLOW
  - name: robots-txt
    path_regex: ^/robots.txt$
    action: ALLOW
  - name: generic-browser
    user_agent_regex: Mozilla
    action: CHALLENGE
```

</TabItem>
</Tabs>

This allows requests to [`/.well-known`](https://en.wikipedia.org/wiki/Well-known_URI), `/favicon.ico`, `/robots.txt`, and challenges any request that has the word `Mozilla` in its User-Agent string. The [default policy file](https://github.com/TecharoHQ/anubis/blob/main/data/botPolicies.json) is a bit more cohesive, but this should be more than enough for most users.

If no rules match the request, it is allowed through. For more details on this default behavior and its implications, see [Default allow behavior](./default-allow-behavior.mdx).

## Writing your own rules

There are three actions that can be returned from a rule:

| Action      | Effects                                                                           |
| :---------- | :-------------------------------------------------------------------------------- |
| `ALLOW`     | Bypass all further checks and send the request to the backend.                    |
| `DENY`      | Deny the request and send back an error message that scrapers think is a success. |
| `CHALLENGE` | Show a challenge page and/or validate that clients have passed a challenge.       |

Name your rules in lower case using kebab-case. Rule names will be exposed in Prometheus metrics.

### Challenge configuration

Rules can also have their own challenge settings. These are customized using the `"challenge"` key. For example, here is a rule that makes challenges artificially hard for connections with the substring "bot" in their user agent:

<Tabs>
<TabItem value="json" label="JSON" default>

This rule has been known to have a high false positive rate in testing. Please use this with care.

```json
{
  "name": "generic-bot-catchall",
  "user_agent_regex": "(?i:bot|crawler)",
  "action": "CHALLENGE",
  "challenge": {
    "difficulty": 16,
    "report_as": 4,
    "algorithm": "slow"
  }
}
```

</TabItem>
<TabItem value="yaml" label="YAML">

This rule has been known to have a high false positive rate in testing. Please use this with care.

```yaml
# Punish any bot with "bot" in the user-agent string
- name: generic-bot-catchall
  user_agent_regex: (?i:bot|crawler)
  action: CHALLENGE
  challenge:
    difficulty: 16 # impossible
    report_as: 4 # lie to the operator
    algorithm: slow # intentionally waste CPU cycles and time
```

</TabItem>
</Tabs>

Challenges can be configured with these settings:

| Key          | Example  | Description                                                                                                                                                                                    |
| :----------- | :------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `difficulty` | `4`      | The challenge difficulty (number of leading zeros) for proof-of-work. See [Why does Anubis use Proof-of-Work?](/docs/design/why-proof-of-work) for more details.                               |
| `report_as`  | `4`      | What difficulty the UI should report to the user. Useful for messing with industrial-scale scraping efforts.                                                                                   |
| `algorithm`  | `"fast"` | The algorithm used on the client to run proof-of-work calculations. This must be set to `"fast"` or `"slow"`. See [Proof-of-Work Algorithm Selection](./algorithm-selection) for more details. |

### Remote IP based filtering

The `remote_addresses` field of a Bot rule allows you to set the IP range that this ruleset applies to.

For example, you can allow a search engine to connect if and only if its IP address matches the ones they published:

<Tabs>
<TabItem value="json" label="JSON" default>

```json
{
  "name": "qwantbot",
  "user_agent_regex": "\\+https\\:\\/\\/help\\.qwant\\.com/bot/",
  "action": "ALLOW",
  "remote_addresses": ["91.242.162.0/24"]
}
```

</TabItem>
<TabItem value="yaml" label="YAML">

```yaml
- name: qwantbot
  user_agent_regex: \+https\://help\.qwant\.com/bot/
  action: ALLOW
  # https://help.qwant.com/wp-content/uploads/sites/2/2025/01/qwantbot.json
  remote_addresses: ["91.242.162.0/24"]
```

</TabItem>
</Tabs>

This also works at an IP range level without any other checks:

<Tabs>
<TabItem value="json" label="JSON" default>

```json
{
  "name": "internal-network",
  "action": "ALLOW",
  "remote_addresses": ["100.64.0.0/10"]
}
```

</TabItem>
<TabItem value="yaml" label="YAML">

```yaml
name: internal-network
action: ALLOW
remote_addresses:
  - 100.64.0.0/10
```

</TabItem>
</Tabs>

## Imprint / Impressum support

Anubis has support for showing imprint / impressum information. This is defined in the `impressum` block of your configuration. See [Imprint / Impressum configuration](./configuration/impressum.mdx) for more information.

## Storage backends

Anubis needs to store temporary data in order to determine if a user is legitimate or not. Administrators should choose a storage backend based on their infrastructure needs. Each backend has its own advantages and disadvantages.

Anubis offers the following storage backends:

- [`memory`](#memory) -- A simple in-memory hashmap
- [`bbolt`](#bbolt) -- An on-disk key/value store backed by [bbolt](https://github.com/etcd-io/bbolt), an embedded key/value database for Go programs
- [`valkey`](#valkey) -- A remote in-memory key/value database backed by [Valkey](https://valkey.io/) (or another database compatible with the [RESP](https://redis.io/docs/latest/develop/reference/protocol-spec/) protocol)

If no storage backend is set in the policy file, Anubis will use the [`memory`](#memory) backend by default. This is equivalent to the following in the policy file:

```yaml
store:
  backend: memory
  parameters: {}
```

### `memory`

The memory backend is an in-memory cache. This backend works best if you don't use multiple instances of Anubis or don't have mutable storage in the environment you're running Anubis in.

| Should I use this backend?                                    | Yes/no |
| :------------------------------------------------------------ | :----- |
| Are you running only one instance of Anubis for this service? | ✅ Yes |
| Does your service get a lot of traffic?                       | 🚫 No  |
| Do you want to store data persistently when Anubis restarts?  | 🚫 No  |
| Do you run Anubis without mutable filesystem storage?         | ✅ Yes |

The biggest downside is that there is not currently a limit to how much data can be stored in memory. This will be addressed at a later time.

:::warning

The in-memory backend exists mostly for validation, testing, and to ensure that the default configuration of Anubis works as expected. Do not use this persistently in production.

:::

#### Configuration

The memory backend does not require any configuration to use.

### `bbolt`

An on-disk storage layer powered by [bbolt](https://github.com/etcd-io/bbolt), a high performance embedded key/value database used by containerd, etcd, Kubernetes, and NATS. This backend works best if you're running Anubis on a single host and get a lot of traffic.

| Should I use this backend?                                    | Yes/no |
| :------------------------------------------------------------ | :----- |
| Are you running only one instance of Anubis for this service? | ✅ Yes |
| Does your service get a lot of traffic?                       | ✅ Yes |
| Do you want to store data persistently when Anubis restarts?  | ✅ Yes |
| Do you run Anubis without mutable filesystem storage?         | 🚫 No  |

When Anubis opens a bbolt database, it takes an exclusive lock on that database. Other instances of Anubis or other tools cannot view the bbolt database while it is locked by another instance of Anubis. If you run multiple instances of Anubis for different services, give each its own `bbolt` configuration.

#### Configuration

The `bbolt` backend takes the following configuration options:

| Name   | Type | Example            | Description                                                                                                                  |
| :----- | :--- | :----------------- | :--------------------------------------------------------------------------------------------------------------------------- |
| `path` | path | `/data/anubis.bdb` | The filesystem path for the Anubis bbolt database. Anubis requires write access to the folder containing the bbolt database. |

Example:

If you have persistent storage mounted to `/data`, then your store configuration could look like this:

```yaml
store:
  backend: bbolt
  parameters:
    path: /data/anubis.bdb
```

### `valkey`

[Valkey](https://valkey.io/) is an in-memory key/value store that clients access over the network. This allows multiple instances of Anubis to share information and does not require each instance of Anubis to have persistent filesystem storage.

:::note

You can also use [Redis](http://redis.io/) with Anubis.

:::

This backend is ideal if you are running multiple instances of Anubis in a worker pool (eg: Kubernetes Deployments with a copy of Anubis in each Pod).

| Should I use this backend?                                    | Yes/no |
| :------------------------------------------------------------ | :----- |
| Are you running only one instance of Anubis for this service? | 🚫 No  |
| Does your service get a lot of traffic?                       | ✅ Yes |
| Do you want to store data persistently when Anubis restarts?  | ✅ Yes |
| Do you run Anubis without mutable filesystem storage?         | ✅ Yes |
| Do you have Redis or Valkey installed?                        | ✅ Yes |

#### Configuration

The `valkey` backend takes the following configuration options:

| Name  | Type   | Example                 | Description                                                                                                                                      |
| :---- | :----- | :---------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------- |
| `url` | string | `redis://valkey:6379/0` | The URL for the instance of Redis or Valkey that Anubis should store data in. This is in the same format as `REDIS_URL` in many cloud providers. |

Example:

If you have an instance of Valkey running with the hostname `valkey.int.techaro.lol`, then your store configuration could look like this:

```yaml
store:
  backend: valkey
  parameters:
    url: "redis://valkey.int.techaro.lol:6379/0"
```

This would have the Valkey client connect to host `valkey.int.techaro.lol` on port `6379` with database `0` (the default database).

## Risk calculation for downstream services

In case your service needs it for risk calculation reasons, Anubis exposes information about the rules that any requests match using a few headers:

| Header            | Explanation                                          | Example          |
| :---------------- | :--------------------------------------------------- | :--------------- |
| `X-Anubis-Rule`   | The name of the rule that was matched                | `bot/lightpanda` |
| `X-Anubis-Action` | The action that Anubis took in response to that rule | `CHALLENGE`      |
| `X-Anubis-Status` | The status and how strict Anubis was in its checks   | `PASS`           |

Policy rules are matched using [Go's standard library regular expressions package](https://pkg.go.dev/regexp). You can mess around with the syntax at [regex101.com](https://regex101.com), make sure to select the Golang option.

## Request Weight

Anubis rules can also add or remove "weight" from requests, allowing administrators to configure custom levels of suspicion. For example, if your application uses session tokens named `i_love_gitea`:

```yaml
- name: gitea-session-token
  action: WEIGH
  expression:
    all:
      - '"Cookie" in headers'
      - headers["Cookie"].contains("i_love_gitea=")
  # Remove 5 weight points
  weight:
    adjust: -5
```

This would remove five weight points from the request, which would make Anubis present the [Meta Refresh challenge](./configuration/challenges/metarefresh.mdx) in the default configuration.

### Weight Thresholds

For more information on configuring weight thresholds, see [Weight Threshold Configuration](./configuration/thresholds.mdx)

### Advice

Weight is still very new and needs work. This is an experimental feature and should be treated as such. Here's some advice to help you better tune requests:

- The default weight for browser-like clients is 10. This triggers an aggressive challenge.
- Remove and add weight in multiples of five.
- Be careful with how you configure weight.