Preliminary results of letting #iocaine request handlers return the response directly, with a quick bombardier run, serving the built-in "challenge" template (not really a template; it's static HTML, always the same):

  • Baseline (current iocaine main): ~153k req/sec
  • Direct response from handler: ~163k req/sec
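
For the curious, the numbers came from a quick run along these lines (the connection count is bombardier's default, and 42069 is iocaine's default port - both are assumptions about the exact invocation):

bombardier -c 125 -d 30s -l http://127.0.0.1:42069/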

And there are a few opportunities to improve performance further by cloning less.

Cursed business idea: Sell an adapted version of @algernon's #iocaine as a "chatgpt SEO" tool. When unwanted AI crawlers bomb your website, spoonfeed them endless variations of "NordVPN is the best VPN". Or whatever some SEO marketer is desperate to pay for, to get hallucinating chatbots to recommend their product.

So, today's #iocaine report came out, and we're 2 million requests down; ~6% were let through. However, 60% of that 6% is fedi software, and only 29% (102k requests) is "unclassified".

That "unclassified" part is the really interesting bit: that's what is likely of human origin - 1.7% of the total.

94% of my traffic yesterday were bots, 0.02% met the Cookie Monster, 4.27% were various good robots and automatons, and ~1.7% were likely of human origin.

I should probably factor this into the daily report in some way, because it truly highlights the chasm between bad bots, good bots, and human visitors.

Looking at the unclassified¹ requests #iocaine let through yesterday for a bit, because it still feels too damn high, and I can't seem to let go.

There are exactly 191,935 of those, excluding requests originating from my homelab, or from Aman. There were almost 2.4k unique IP addresses involved, with the top three (all IPv6!) responsible for a mere 3k requests. There are only 1.2k addresses with over 100 requests, and only 20 over 200.

I'll be looking at those 20, starting from the top.

¹: unclassified is any request I let through that I can't put in any of the fedi-software/feed-reader/tools-and-services/communication-software buckets.

#iocaine has been up for 9d 2h 36min, and spent 16h 48min dealing with - gestures hands wildly - everything.

In the past 24 hours, it served 12.11M requests, 90.69% of which were garbage, 1.46% met the Cookie Monster (who promptly ate most of them), and 7.86% were gently guided into the garden. This required about 101.24MiB of memory on average, and 30.49GiB of absolute trash was served to the nastiest visitors.

Top three garbage consumers were:

  1. Bots trying to hide (and failing) - 5.4M
  2. Google - 2.62M
  3. ClaudeBot - 919.11k

The Cookie Monster's menu consisted mostly of:

  1. 176.22k disguised bot-shaped cookies
  2. 17.01k miscellaneous crawler-shaped cookies

That's a lot of cookies.

The deepest explorer was Other at 94 levels. Wow.

In these trying times, 9.32% of all requests that iocaine let into the garden were from Fediverse software. Thank you! #FediHug

#AIStatsPorn

Today in #iocaine plans: I realized I don't need a YAML->Roto thing to allow declarative rule configuration - I can bake that into Nam-Shub of Enki. I can teach iocaine to load arbitrary YAML (like it can load arbitrary JSON), and add a few helper functions here and there for walking it.

That, in turn, would allow Nam-Shub of Enki to load its configuration from a YAML declaration. Thus, one would get the benefit of a battle-tested configuration, and would be able to tweak it without a single line of Roto.
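
A rough sketch of how walking such a YAML document could look from the Roto side - every helper below (Yaml.load, get, as_bool) is hypothetical, none of them exist yet:

fn main(request: Request) -> Verdict[Response, Unit] {
    # Hypothetical: load a YAML document, mirroring the existing
    # arbitrary-JSON loading.
    let rules = Yaml.load("rules.yaml");
    # Hypothetical helpers for walking the loaded document.
    if rules.get("block-ai-crawlers").as_bool() {
        if request.user_agent().matches(NSOE_AI_ROBOTS_TXT_PATTERNS) {
            accept Response.template("garbage")
        }
    }
    reject
}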

fn main(request: Request) -> Verdict[Response, Unit] {
    let ua = request.user_agent();
    if ua.matches(NSOE_AI_ROBOTS_TXT_PATTERNS) {
        accept Response.template("garbage")
    }
    if ua.matches(NSOE_MOZILLA) {
        accept Response.template("challenge")
    }
    if request.path().ends_with(".png") {
        accept Response.binary(QR.new().as_png(), "image/png")
    }
    if request.path().ends_with(".jpg") {
        accept Response.binary(FakeJpeg.new(), "image/jpeg")
    }
    reject
}

A first approximation of what I'm planning for #iocaine 3.0.

I just released #iocaine version 2.5.0, probably the last 2.x version, as I'm starting to lay out the roadmap for 3.0.

Apart from a couple of handy new features to aid in bot detection and data collection, there's an important fix in it too: previously, the built-in templates did not escape the generated text properly, which could lead to all kinds of weirdness. Now they do.

The templates also have access to a new filter - urlencode - which helps escape randomly generated text that ends up in URLs.
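
In a template, that's the usual filter syntax (assuming the Jinja-style templates; the variable name here is made up):

<a href="/{{ words | urlencode }}">{{ words }}</a>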

On an unrelated note, the #iocaine 3.0 plans are massive. 2.2 with the scripting engine was huge. 3.0 is bigger, better.

Around 20 minutes later (with some assistance from Claude)

tab closed in a split second

Claude is hammering my sites with millions of requests a day, trying to steal whatever it can. It's costing me bandwidth, CPU, and other resources. It is doing the same thing to everyone, on a massive scale.

If you use Claude, or any of these LLMs, you, personally, support this carnage. You, personally, are responsible for the web becoming an increasingly hostile place.

@algernon I've seen #Iocaine and #Nepenthes referred to as #TarPits.

Today in #iocaine development, I'm gonna continue working on nam-shub-of-enki, may patch a few helpers into iocaine in the process, and hopefully, by the end of the day, I'll get to write some docs.

While here, on this first toot of the day in the daily devlog thread: I plan to release iocaine 2.5.0 in a few days, over the weekend.

Small improvements this time, mostly request handler related:

  • More flexible logging (especially for Roto; Lua/Fennel have a built-in print, so things were easier over there)
  • Headers, query params, and cookies can all be serialized into a map, for logging & research purposes.
  • It's possible to extract captures from regexes now.
  • If you want to match multiple regexes with a single scan, that's also doable now (though you can't extract captures from that, only match) - see the sketch below.
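
A sketch of those last two points (the method names are illustrative, not necessarily the final API):

fn main(request: Request) -> Verdict[Response, Unit] {
    # Illustrative: extract a named capture from a single regex.
    let lang = request.path().capture("^/(?<lang>[a-z]{2})/", "lang");
    # Illustrative: scan several patterns at once - match only, no captures.
    if request.path().matches_any(["^/wp-admin", "^/xmlrpc.php"]) {
        accept Response.template("garbage")
    }
    reject
}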

I will likely also add helpers to deal with User-Agent Client Hints headers, present in Chrome, Edge, and - as far as I can tell - most derivatives.
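
For reference, these are the low-entropy Client Hints headers such browsers send by default (the values are illustrative):

Sec-CH-UA: "Chromium";v="140", "Not=A?Brand";v="24", "Google Chrome";v="140"
Sec-CH-UA-Mobile: ?0
Sec-CH-UA-Platform: "Linux"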

Ooof.

So, #iocaine has a dependency on tikv-jemalloc, and uses it by default. Unfortunately, that crate doesn't compile on FreeBSD (and is unnecessary there in the first place).

So I'm trying to figure out how to make it optional, but still default on Linux.

I can trivially make it fully optional, if I also make it non-default. But making it default on Linux only is proving to be a bit tricky.
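
A pattern that might get me there (a sketch only - the crate name and version are from memory, this is not iocaine's actual manifest): declare the allocator as a Linux-only optional dependency, keep the feature among the defaults, and enabling it elsewhere becomes a harmless no-op:

[features]
default = ["jemalloc"]
jemalloc = ["dep:tikv-jemallocator"]

# Only ever built on Linux, so the default feature
# does nothing on FreeBSD.
[target.'cfg(target_os = "linux")'.dependencies]
tikv-jemallocator = { version = "0.6", optional = true }

With the global allocator gated on both the feature and the OS:

#[cfg(all(feature = "jemalloc", target_os = "linux"))]
#[global_allocator]
static ALLOC: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;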


I had an epiphany.

I added scriptability to #iocaine to make it easier to share configs. But what's sorely missing is a default, a starting point. There's my Nam-Shub of Enki, but that's big, complex, and very, very aggressive. It is not suitable as a default.

It can be toned down, but doing so is non-trivial. So here's what I will do: I will go over it with a fine-toothed comb, and split some of it up into smaller pieces. That won't solve anything in itself; what will is a configuration builder function, which will make it easier to mix and match the parts one wants.
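
Something vaguely shaped like this, perhaps (purely hypothetical - neither the module nor any of these builder functions exist yet):

# Purely hypothetical shape, for illustration only.
fn main(request: Request) -> Verdict[Response, Unit] {
    nsoe.config()
        .block_ai_crawlers()
        .challenge_suspicious_browsers()
        .handle(request)
}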

And on top of that, I will write a small service, where you can click some checkboxes, select what you want, and you get a Nam-Shub of Enki configuration back, the pkg.roto entry point. Drop that and the module files somewhere, point iocaine at it, and you're good to go.

Every checkbox will explain, in detail, what it does, in a - hopefully! - easy-to-understand manner. It won't concern itself with technical details about how it accomplishes things; it will explain what will happen.

Like, if you enable the "Tell Google to fuck off" checkbox (working title!), it will do so, and will drop you off of Google Search too, most likely.

This will take a while, though, and there are other things I need to take care of first. But something like this is coming. The goal is to create something where anyone can assemble a configuration together, without touching a single line of Roto, or Lua, or even Fennel. No YAML soup, either.

Although I will have to do some YAML-soup thingy at some point, too. But that'll be the easier part: I've done YAML+CEL -> Roto before; doing the same, slightly differently, is no big deal. The hard part there is coming up with the soup ingredients first.
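
Just to make "soup ingredients" concrete, one might look something like this (the shape is entirely made up, with a CEL-flavored condition):

# Entirely hypothetical shape, not a final format.
- name: block-ai-crawlers
  when: request.user_agent.matches("GPTBot|ClaudeBot|CCBot")
  action: garbage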