Possible fix for #877
In some cases, the parallel solution finder in Anubis could cause
all of the worker promises to leak due to the fact the promises
were being improperly terminated. A recursion bomb happens in the
following scenario:
1. A worker sends a message indicating it found a solution to the proof
of work challenge.
2. The `onmessage` handler for that worker calls `terminate()`
3. Inside `terminate()`, the parent process loops through all other
workers and calls `w.terminate()` on them.
4. It's possible that terminating a worker could lead to the `onerror`
event handler.
5. This would create a recursive loop of `onmessage` -> `terminate` ->
`onerror` -> `terminate` -> `onerror` and so on.
This infinite recursion quickly consumes all available stack space, but
this has never been noticed in development because all of my computers
have at least 64Gi of ram provisioned to them under the axiom paying for
more ram is cheaper than paying in my time spent having to work around
not having enough ram. Additionally, ia32 has a smaller base stack size,
which means that they will run into this issue much sooner than users on
other CPU architectures will.
The fix adds a boolean `settled` flag to prevent termination from
running more than once.
Signed-off-by: Xe Iaso <me@xeiaso.net>
Fixes#877
Continued from #879, event loop thrashing can cause stack space
exhaustion on ia32 systems. Previously this would thrash the event loop
in Firefox and Firefox derived browsers such as Pale Moon. I suspect
that this is the ultimate root cause of the bizarre irreproducible bugs
that Pale Moon (and maybe Cromite) users have been reporting since at
least #87 was merged.
The root cause is an invalid boolean statement:
```js
// send a progress update every 1024 iterations. since each thread checks
// separate values, one simple way to do this is by bit masking the
// nonce for multiples of 1024. unfortunately, if the number of threads
// is not prime, only some of the threads will be sending the status
// update and they will get behind the others. this is slightly more
// complicated but ensures an even distribution between threads.
if (
(nonce > oldNonce) | 1023 && // we've wrapped past 1024
(nonce >> 10) % threads === threadId // and it's our turn
) {
postMessage(nonce);
}
```
The logic here looks fine but is subtly wrong as was reported in #877
by a user in the Pale Moon community. Consider the following scenario:
`nonce` is a counter that increments by the worker count every loop.
This is intended to spread the load between CPU cores as such:
| Iteration | Worker ID | Nonce |
| :-------- | :-------- | :---- |
| 1 | 0 | 0 |
| 1 | 1 | 1 |
| 2 | 0 | 2 |
| 3 | 1 | 3 |
And so on.
The incorrect part of this is the boolean logic, specifically the part
with the bitwise or `|`. I think the intent was to use a logical or
(`||`), but this had the effect of making the `postMessage` handler fire
on every iteration. The intent of this snippet (as the comment clearly
indicates) is to make sure that the main event loop is only updated with
the worker status every 1024 iterations per worker. This had the
opposite effect, causing a lot of messages to be sent from workers to
the parent JavaScript context.
This is bad for the event loop.
Instead, I have ripped out that statement and replaced it with a much
simpler increment only counter that fires every 1024 iterations.
Additionally, only the first thread communicates back to the parent
process. This does mean that in theory the other workers could be ahead
of the first thread (posting a message out of a worker has a nonzero
cost), but in practice I don't think this will be as much of an issue as
the current behaviour is.
The root cause of the stack exhaustion is likely the pressure caused by
all of the postMessage futures piling up. Maybe the larger stack size in
64 bit environments is causing this to be fine there, maybe it's some
combination of newer hardware in 64 bit systems making this not be as
much of a problem due to it being able to handle events fast enough to
keep up with the pressure.
Either way, thanks much to @wolfbeast and the Pale Moon community for
finding this. This will make Anubis faster for everyone!
This fixes a bug that was introduced in 68b653b0, in which the call
to metricsServer was passed a plain context.Background without
signal handling.
This commit adds back in the signal handling for the metrics server,
as well as for the Thoth client and storage backend.
Closes: #853
Signed-off-by: Emily Rowlands <emily@erowl.net>
* feat(anubis): add /healthz route to metrics server
Also add health check test for Docker Compose and update documentation
for health checking Anubis with Docker Compose.
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore: spelling
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
This is not used yet, but it will be part of a larger strategy around
adding/removing weight based on JA4H (and other) fingerprint matches
with Thoth.
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(lib): fix race condition when rendering multiple challenge pages at once
Closes#832
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(web): make try again button work
Looks like the intent of this was "try the solution again". This fix
makes the client try the challenge again.
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(web): don't block a user if they have an invalid challenge cookie
Signed-off-by: Xe Iaso <me@xeiaso.net>
* docs: update CHANGELOG
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
* docs(known-instances): add rpmfusion.org to known instances
Signed-off-by: Lothar Serra Mari <mail@serra.me>
* docs(known-instances): add wiki.freepascal.org to known instances
Signed-off-by: Lothar Serra Mari <mail@serra.me>
---------
Signed-off-by: Lothar Serra Mari <mail@serra.me>
* correct gitea.botPolicies extension to be yaml, not json
while Anubis probably doesn't care about the extension, and would parse a JSON file just fine too, the rest of the page talks about `gitea.botPolicies.yaml`, so let's be consistent
Signed-off-by: Evgeni Golov <evgeni@golov.de>
* chore: spelling
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Evgeni Golov <evgeni@golov.de>
Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
* docs(known-instances): add clew.se to known instances
Signed-off-by: Lothar Serra Mari <mail@serra.me>
* docs(known-instances): add tumfatig.net to known instances
Signed-off-by: Lothar Serra Mari <mail@serra.me>
---------
Signed-off-by: Lothar Serra Mari <mail@serra.me>
* docs(known-instances): update list of known instances
Signed-off-by: Lothar Serra Mari <mail@serra.me>
* Update metadata
check-spelling run (pull_request) for probably-daily-new-spotted-instances-update
Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
on-behalf-of: @check-spelling <check-spelling-bot@check-spelling.dev>
---------
Signed-off-by: Lothar Serra Mari <mail@serra.me>
Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
Co-authored-by: Lothar Serra Mari <lotharsm@users.noreply.github.com>
* feat(lib/policy/expressions): add system load average to bot expression inputs
This lets Anubis dynamically react to system load in order to
increase and decrease the required level of scrutiny. High load? More
scrutiny required. Low load? Less scrutiny required.
* docs: spell system correctly
Signed-off-by: Xe Iaso <me@xeiaso.net>
* Update metadata
check-spelling run (pull_request) for Xe/load-average
Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
on-behalf-of: @check-spelling <check-spelling-bot@check-spelling.dev>
* fix(default-config): don't enable low load average feature by default
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
Signed-off-by: Xe Iaso <xe.iaso@techaro.lol>
* Add translation for Traditional Chinese
* Add translation for Traditional Chinese: test
* Add translation for Traditional Chinese: Add PR number to CHANGELOG
* Add translation for Traditional Chinese: test: remove empty lines
* Add translation for Traditional Chinese: test: remove empty lines
* docs(known-instances): add Duke University to known instances
Signed-off-by: Lothar Serra Mari <mail@serra.me>
* docs(known-instances): add fabulous.systems to known instances
Signed-off-by: Lothar Serra Mari <mail@serra.me>
* docs(known-instances): add coinhoards.org to known instances
Signed-off-by: Lothar Serra Mari <mail@serra.me>
* chore(spelling): exempt the known instances page
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Lothar Serra Mari <mail@serra.me>
Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
* feat(decaymap): add Delete method
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore(lib/challenge): refactor Validate to take ValidateInput
Signed-off-by: Xe Iaso <me@xeiaso.net>
* feat(lib): implement store interface
Signed-off-by: Xe Iaso <me@xeiaso.net>
* feat(lib/store): all metapackage to import all store implementations
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore(policy): import all store backends
Signed-off-by: Xe Iaso <me@xeiaso.net>
* feat(lib): use new challenge creation flow
Previously Anubis constructed challenge strings from request metadata.
This was a good idea in spirit, but has turned out to be a very bad idea
in practice. This new flow reuses the Store facility to dynamically
create challenge values with completely random data.
This is a fairly big rewrite of how Anubis processes challenges. Right
now it defaults to using the in-memory storage backend, but on-disk
(boltdb) and valkey-based adaptors will come soon.
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore(decaymap): fix documentation typo
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore(lib): fix SA4004
Signed-off-by: Xe Iaso <me@xeiaso.net>
* test(lib/store): make generic storage interface test adaptor
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore: spelling
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(decaymap): invert locking process for Delete
Signed-off-by: Xe Iaso <me@xeiaso.net>
* feat(lib/store): add bbolt store implementation
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore: spelling
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore: go mod tidy
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore(devcontainer): adapt to docker compose, add valkey service
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(lib): make challenges live for 30 minutes by default
Signed-off-by: Xe Iaso <me@xeiaso.net>
* feat(lib/store): implement valkey backend
Signed-off-by: Xe Iaso <me@xeiaso.net>
* test(lib/store/valkey): disable tests if not using docker
Signed-off-by: Xe Iaso <me@xeiaso.net>
* test(lib/policy/config): ensure valkey stores can be loaded
Signed-off-by: Xe Iaso <me@xeiaso.net>
* Update metadata
check-spelling run (pull_request) for Xe/store-interface
Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
on-behalf-of: @check-spelling <check-spelling-bot@check-spelling.dev>
* chore(devcontainer): remove port forwards because vs code handles that for you
Signed-off-by: Xe Iaso <me@xeiaso.net>
* docs(default-config): add a nudge to the storage backends section of the docs
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore(docs): listen on 0.0.0.0 for dev container support
Signed-off-by: Xe Iaso <me@xeiaso.net>
* docs(policy): document storage backends
Signed-off-by: Xe Iaso <me@xeiaso.net>
* docs: update CHANGELOG and internal links
Signed-off-by: Xe Iaso <me@xeiaso.net>
* docs(admin/policies): don't start a sentence with as
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore: fixes found in review
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>