This may seem strange, but allowlisting common crawl means that scrapers
have less incentive to scrape because they can just grab the data from
common crawl instead of scraping it again.
I'm gonna be totally honest here, I'm still not sure why #564 is still
an issue. This is really confusing and I'm going to totally throw out
how Anubis issues challenges and redo it with Valkey (#201, #622).
The problem seems to be that I assume that the makeChallenge function in
package lib is idempotent for the same client. I have no idea why this
would be inconsistent, but for some reason it is and I'm just at a loss
for words as to why this is happening.
This stops the bleeding by improving the UX as a stopgap.
Signed-off-by: Xe Iaso <me@xeiaso.net>
* Add cookie prefix option
* Add explaination comment for TestCookieName
* Rename TestCookieName value from cookie-test-if-you-block-this-anubis-wont-work to cookie-verification
* Add changes to CHANGELOG.md
* Add values to CookieName and TestCookieName in anubis.go required for testcases
* Fix cookieDynamicDomain option not being set in Options struct
* Fix using wrong cookie name when using dynamic cookie domains
* Adjust testcases for new cookie option structs
* Add known words to expect.txt and change typo in Zombocom
* Cleanup expect.txt
* Add changes to changelog
* Bump versions of grpc and apimachinery
* Fix testcases and add additional condition for dynamic cookie domain
* lib/localization: implement localization system
Locale files are placed in lib/localization/locales/. If you add a
locale, update manifest.json with available locales.
* Exclude locales from check spelling
* tests(lib/localization): add comprehensive translations test
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(challenge/metarefresh): enable localization
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix: use simple syntax for localization in templ
Also localize CELPHASE into French according to the wishes of the
artist.
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore: spelling
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore:(js): fix forbidden patterns
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore: add goi18n to tools
Signed-off-by: Xe Iaso <me@xeiaso.net>
* test(lib/localization): dynamically determine the list of supported languages
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
* feat: dynamic cookie domains
Replaces #685
I was having weird testing issues when trying to merge #685, so I
rewrote it from scratch to be a lot more minimal.
* chore: spelling
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore(xess): remove unused xess templates
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
* chore(checker): remove unused staticHashChecker implementation
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
* feat: add pinact and deadcode to go tools (pinact is used for the gha pinning)
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
* chore: update Docker and kubectl actions to latest versions
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
* chore: update Homebrew action from master to main in workflow files
See df537ec97f
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
* chore: remove unused go-colorable and tools dependencies from go.sum
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
* chore: update postcss-import and other dependencies to latest versions
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
* chore: update Docusaurus dependencies to version 3.8.1
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
* chore: downgrade playwright and playwright-core to version 1.52.0
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
---------
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
Closes#564
This one is really dumb. Take a seat and listen to my tale of woe.
While @victorvalenca was working on #693 we ran into a strange issue.
The tests would consistently pass on Firefox but instantly failed on
Chrome. After adding increasingly desperate debugging logs to the mix,
we found out that somehow Chrome was randomizing the contents of its
Accept-Language header. This was making the challenge string get
calculated differently, thus making things spuriously fail. I cannot
figure out what causes Chrome to do this other than you being in an
environment where you have more than one "system language" set.
Either way, this should finally fix this issue and bring peace to the
land forever*.
Signed-off-by: Xe Iaso <me@xeiaso.net>
* feat(config): opengraph passthrough configuration
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore(ogtags): use config.OpenGraph for configuration
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore: wire up ogtags config in most of the app
Signed-off-by: Xe Iaso <me@xeiaso.net>
* feat(ogtags): return default tags if they are supplied
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore: make OpenGraph legal so we have some sanity in reviewing
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore: spelling
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(lib): use OpenGraph.Enabled
Signed-off-by: Xe Iaso <me@xeiaso.net>
* test(lib): load default config file if one is not specified in spawnAnubis
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore(config): fix ST1005
Signed-off-by: Xe Iaso <me@xeiaso.net>
* docs: document open graph defaults and its new home in the policy file
Signed-off-by: Xe Iaso <me@xeiaso.net>
* docs(installation): point to weight threshold new home
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore: rename default to override
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore(default-config): add off-by-default opengraph settings to bot policy file
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(anubis): make build
Signed-off-by: Xe Iaso <me@xeiaso.net>
* test(lib): fix build
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
* feat: replace cidranger with bart improving performance by 3-20x
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
* perf: replace cidranger with bart for IP range checking
- Replace cidranger.Ranger with bart.Lite in RemoteAddrChecker
- Use netip.ParsePrefix instead of net.ParseCIDR for modern IP handling
- Improve performance: 3-20x faster lookups with zero heap allocations
- Update imports to use github.com/gaissmai/bart and net/netip
- Remove cidranger dependency from go.mod
Benchmark results:
- IPv4 lookups: 4x faster (15.58ns vs 63.25ns, 0 vs 2 allocs)
- IPv6 lookups: 3x faster (26.51ns vs 76.96ns, 0 vs 2 allocs)
- Insertions: 20x faster (976ns vs 19,191ns)
- Large tables: 14x faster (5.2ns vs 74.85ns)
* docs: clarify CHANGELOG to not give false impressions
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
* perf: optimize string concatenation in RemoteAddrChecker hash generation
Replace fmt.Fprintln with strings.Join for 7x faster performance:
- Before: 935.1 ns/op, 784 B/op, 22 allocs/op
- After: 133.2 ns/op, 192 B/op, 1 alloc/op
The hash is used for JWT cookie validation and error code generation.
Comma separation provides the same deterministic uniqueness as newlines
but with significantly better performance during policy initialization.
* chore: remove accidentally commited string benchmark
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
* style: apply Copilot suggestions
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
* fix: reference the right var name
i cannot write a merge commit
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
---------
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
* Revert "docs/blog: remove (#273)"
This reverts commit df3509ec998d002e8bcef6285c7c0bc16b54d513.
* chore: intro to the blog post
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
* docs(known-instances): add bugs.scummvm.org and gitlab.postmarketos.org
Signed-off-by: Lothar Serra Mari <mail@serra.me>
* chore: clean uri
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
---------
Signed-off-by: Lothar Serra Mari <mail@serra.me>
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
Co-authored-by: Jason Cameron <git@jasoncameron.dev>
* refactor(ogtags): optimize URL construction and memory allocations
* test(ogtags): add benchmarks and memory usage tests for OGTagCache
* refactor(ogtags): optimize OGTags subsystem to reduce allocations and improve request runtime by up to 66%
* Update docs/docs/CHANGELOG.md
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Jason Cameron <jasoncameron.all@gmail.com>
* refactor(ogtags): optimize URL string construction to reduce allocations
* Update internal/ogtags/ogtags.go
Co-authored-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: Jason Cameron <jasoncameron.all@gmail.com>
* test(ogtags): add fuzz tests for getTarget and extractOGTags functions
* fix(ogtags): update memory calculation logic
Prev it would say that we had allocated 18pb
=== RUN TestMemoryUsage
mem_test.go:107: Memory allocated for 10k getTarget calls: 18014398509481904.00 KB
mem_test.go:135: Memory allocated for 1k extractOGTags calls: 18014398509481978.00
Now it's fixed with
=== RUN TestMemoryUsage
mem_test.go:109: Memory allocated for 10k getTarget calls:
mem_test.go:110: Total: 630.56 KB (0.62 MB)
mem_test.go:111: Per operation: 64.57 bytes
mem_test.go:140: Memory allocated for 1k extractOGTags calls:
mem_test.go:141: Total: 328.17 KB (0.32 MB)
mem_test.go:142: Per operation: 336.05 bytes
* refactor(ogtags): optimize meta tag extraction for improved performance
* Update metadata
check-spelling run (pull_request) for json/ogmem
Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
on-behalf-of: @check-spelling <check-spelling-bot@check-spelling.dev>
* chore: update CHANGELOG for recent optimizations and version bump
* refactor: improve URL construction and meta tag extraction logic
* style: cleanup fuzz tests
---------
Signed-off-by: Jason Cameron <jasoncameron.all@gmail.com>
Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Xe Iaso <me@xeiaso.net>
* style: fix formatting in .air.toml and installation.mdx
* feat: add --strip-base-prefix flag to modify request paths when forwarding
Closes: #638
* refactor: apply structpacking (betteralign)
* fix: add validation for strip-base-prefix and base-prefix configuration
* fix: improve request path handling by cloning request and modifying URL path
* chore: remove integration tests as they are too annoying to debug on my system