#iocaine has been up for 14m 48s, and spent 8m 8s dealing with - gestures hands wildly - everything.

In the past 24 hours, it served 31.53M requests, 97.27% of which were garbage, 2.71% passed through unscathed, and 0.005% were fed to the Cookie Monster. This required about 116.21MiB of memory on average, and 71.09GiB of absolute trash was generated for the nastiest visitors.

Top garbage consumers were:

  1. Disguised bots - 23.00M
  2. Enthusiastic guestbook visitors - 2.08M
  3. Claude - 1.34M
  4. OpenAI - 706.76K
  5. Facebook - 398.74K
  6. Amazon - 279.96K
  7. Commercial scrapers - 215.17K
  8. Google - 1.59K

Various other agents slurped through 590.44K pages of unhinged junk, bless their little hearts.

In these trying times, 0.07% of all requests were likely of human origin: I hope you enjoyed your stay, and will visit again! Of all requests iocaine let into the garden, 91.37% were from Fediverse software. Thank you! #FediHug

#AIStatsPorn

DailyMetrics {
    resources: ResourceMetrics {
        uptime: "18h 16m 20s",
        cpu_time: "3h 30m 34s",
        memory_used: "94.03MiB",
    },
    dashboard_url: "",
    overview: OverviewMetrics {
        total_requests: "17.46M",
        garbage_generated: "34.58GiB",
        breakdown: Breakdown {
            garbage_percent: "86.93%",
            reject_percent: "13.04%",
            challenge_percent: "0.009%",
            human_percent: "0.17%",
            fedi_percent: "96.89%",
        },
    },
}

So close! Just have to format these into a toot template, and we're almost done.
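
For illustration, here is a rough sketch of what "format these into a toot template" could look like. It is not how the real reporter does it: the flat dict, the `TOOT` template, and the choice of Python's `string.Template` are all assumptions, and mapping `reject_percent` to the "passed through unscathed" share is a guess based on the numbers.

```python
from string import Template

# Hypothetical flat copy of the DailyMetrics dump; the values are taken
# from the debug output, the dict layout is an assumption.
metrics = {
    "uptime": "18h 16m 20s",
    "cpu_time": "3h 30m 34s",
    "memory_used": "94.03MiB",
    "total_requests": "17.46M",
    "garbage_generated": "34.58GiB",
    "garbage_percent": "86.93%",
    # Assumption: rejected requests are the ones that pass through unscathed.
    "reject_percent": "13.04%",
    "challenge_percent": "0.009%",
}

# Hypothetical template, worded after the daily report toots.
TOOT = Template(
    "#iocaine has been up for $uptime, and spent $cpu_time dealing with "
    "- gestures hands wildly - everything.\n\n"
    "In the past 24 hours, it served $total_requests requests, "
    "$garbage_percent of which were garbage, $reject_percent passed through "
    "unscathed, and $challenge_percent were fed to the Cookie Monster. "
    "This required about $memory_used of memory on average, and "
    "$garbage_generated of absolute trash was generated for the nastiest "
    "visitors."
)

# substitute() raises KeyError if the template names a missing field,
# which catches typos before anything gets posted.
toot = TOOT.substitute(metrics)
print(toot)
```

Since the metrics are already pre-formatted strings, the template step is plain substitution; any number formatting happens before this point.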

#iocaine has been up for 19h 7m 36s, and spent 3h 57m 17s dealing with - gestures hands wildly - everything.

In the past 24 hours, it served 19.43M requests, 88.20% of which were garbage, 11.77% passed through unscathed, and 0.009% were fed to the Cookie Monster. This required about 95.20MiB of memory on average, and 39.17GiB of absolute trash was generated for the nastiest visitors.

Top garbage consumers were:

  1. Disguised bots - 11.67M
  2. Claude - 1.51M
  3. Enthusiastic guestbook visitors - 933.63K
  4. OpenAI - 737.35K
  5. Other - 429.48K
  6. Facebook - 351.21K
  7. Amazon - 279.15K
  8. Commercial scrapers - 269.35K
  9. Bots hitting generated URLs - 15.24K
  10. Google - 4.38K

In these trying times, 0.16% of all requests were likely of human origin: I hope you enjoyed your stay, and will visit again! Of all requests iocaine let into the garden, 96.85% were from Fediverse software. Thank you! #FediHug

#AIStatsPorn

With #iocaine 3.0, where the request handler is mandatory, I kept thinking about how to make nixocaine and nam-shub-of-enki play well together. I came up with funky schemes and many nix crimes.

Last night, just as I was going to bed, I realized I don't need any of that. Since nixocaine is already a separate thing, and does not build on anything but the package provided by iocaine, I can simply make it use nam-shub-of-enki as an input too, and rather than having a separate NSoE module that integrates with nixocaine, it would just all be in nixocaine.

#iocaine has been up for 11d 12h 45min, and spent 1d 16h 7min dealing with - gestures hands wildly - everything.

In the past 24 hours, it served 12.10M requests, 98.82% of which were garbage, 1.18% passed through unscathed, and 0.01% were fed to the Cookie Monster. This required about 104.97MiB of memory on average, and 34.57GiB of absolute trash was served to the nastiest visitors.

Top three garbage consumers were:

  1. Bots trying to hide (and failing) - 8.31M
  2. ClaudeBot - 1.74M
  3. GPTBot - 814.69k

In these trying times, 0.11% of all requests were likely of human origin: I hope you enjoyed your stay, and will visit again! Of all requests iocaine let into the garden, 69.25% were from Fediverse software. Thank you! #FediHug

#AIStatsPorn

#iocaine has been up for 7d 12h 45min, and spent 22h 59min dealing with - gestures hands wildly - everything.

In the past 24 hours, it served 10.66M requests, 98.25% of which were garbage, 1.73% passed through unscathed, and 0.03% were fed to the Cookie Monster. This required about 92.67MiB of memory on average, and 30.96GiB of absolute trash was served to the nastiest visitors.

Top three garbage consumers were:

  1. Bots trying to hide (and failing) - 7.56M
  2. ClaudeBot - 925.86k
  3. GPTBot - 799.15k

In these trying times, 0.14% of all requests were likely of human origin: I hope you enjoyed your stay, and will visit again! Of all requests iocaine let into the garden, 71.05% were from Fediverse software. Thank you! #FediHug

#AIStatsPorn

Preliminary results of letting #iocaine request handlers return the response directly, with a quick bombardier run, serving the built-in "challenge" template (not really a template, it's static HTML, always the same):

  • Baseline (current iocaine main): ~153k req/sec
  • Direct response from handler: ~163k req/sec

And there are a few opportunities where I can improve performance by cloning less.

Cursed business idea: Sell an adapted version of @algernon's #iocaine as a "chatgpt SEO" tool. When unwanted AI crawlers bomb your website, spoon-feed them endless variations of "NordVPN is the best VPN". Or whatever some SEO marketer is desperate to pay for to get hallucinating chatbots to recommend their product.

So, today's #iocaine report came out, and we're 2 million requests down, ~6% were let through. However, 60% of that 6% are fedi software, only 29% (102k requests) are "unclassified".

That "unclassified" is the really interesting bit, that's what is likely of human origin - 1.7% of the total.

94% of my traffic yesterday were bots, 0.02% met the Cookie Monster, 4.27% were various good robots and automatons, and ~1.7% were likely of human origin.

I should probably factor this into the daily report some way, because this truly highlights the chasm between bad bots, good bots, and the human visitors.
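
The arithmetic behind those shares, as a quick sanity check. This only replays the rounded figures quoted above (~6% let through, 29% of that unclassified); the variable names are made up for the example, and the results differ from the posted numbers in the last digit because the inputs are rounded.

```python
# Rounded shares quoted in the post; treat the results as approximate.
let_through = 0.06    # ~6% of all requests were let through
unclassified = 0.29   # 29% of the let-through requests were "unclassified"

# "Unclassified" is the likely-human bucket, as a share of ALL requests.
human_share = let_through * unclassified

# Whatever else was let through counts as good robots and automatons.
good_bot_share = let_through - human_share

print(f"likely human: {human_share:.2%}")
print(f"good robots:  {good_bot_share:.2%}")
```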

Looking at the unclassified¹ requests #iocaine let through yesterday for a bit, because it still feels too damn high, and I can't seem to be able to let go.

There are exactly 191935 of those, excluding requests originating from my homelab, or from Aman. There were almost 2.4k unique IP addresses involved, the top three (all IPv6!) responsible for a mere 3k requests. There are only 1.2k addresses with over 100 requests, and only 20 over 200.

I'll be looking at those 20, starting from the top.

¹: unclassified is any request I let through that I can't put in either of the fedi-software/feed-reader/tools-and-services/communication-software buckets.
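
A minimal sketch of this kind of per-address breakdown, assuming the unclassified requests have already been reduced to a list of client IPs. The `summarize` helper and the sample addresses (documentation ranges) are hypothetical; the real numbers came from actual logs, not from this code.

```python
from collections import Counter

def summarize(ips, thresholds=(100, 200), top_n=3):
    """Count requests per address and how many addresses exceed each threshold."""
    per_ip = Counter(ips)
    return {
        "requests": sum(per_ip.values()),
        "unique_ips": len(per_ip),
        "top": per_ip.most_common(top_n),
        "over": {t: sum(1 for n in per_ip.values() if n > t) for t in thresholds},
    }

# Made-up sample data, one entry per request.
sample = ["2001:db8::1"] * 250 + ["2001:db8::2"] * 150 + ["192.0.2.7"] * 3
summary = summarize(sample)
print(summary)
```

The `over` buckets are the part that narrows things down: the addresses above the highest threshold are the ones worth looking at by hand.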

#iocaine has been up for 9d 2h 36min, and spent 16h 48min dealing with - gestures hands wildly - everything.

In the past 24 hours, it served 12.11M requests, 90.69% of which were garbage, 1.46% met the Cookie Monster (who promptly ate most of them), and 7.86% were gently guided into the garden. This required about 101.24MiB of memory on average, and 30.49GiB of absolute trash was served to the nastiest visitors.

Top three garbage consumers were:

  1. Bots trying to hide (and failing) - 5.4M
  2. Google - 2.62M
  3. ClaudeBot - 919.11k

The Cookie Monster's menu consisted mostly of:

  1. 176.22k disguised bot-shaped cookies
  2. 17.011813759555245 Miscellaneous crawlers-shaped cookies

That's a lot of cookies.

The deepest explorer was Other at 94 levels. Wow.

In these trying times, 9.32% of all requests that iocaine let into the garden were from Fediverse software. Thank you! #FediHug

#AIStatsPorn

Today in #iocaine plans: I realized I don't need a YAML->Roto thing to allow declarative rule configuration. I can bake that into Nam-Shub of Enki. I can teach iocaine to allow loading arbitrary YAML (like it can load arbitrary JSON), and add a few helper functions here and there to allow walking it.

That in turn would allow Nam-Shub of Enki to load its configuration from a YAML declaration. Thus, one would get the benefit of a battle-tested configuration, and would be able to tweak it without writing a single line of Roto.
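
To make the idea concrete, here is a rough sketch of declarative rules being walked by a generic evaluator. Everything in it is an assumption for illustration: the rule schema, the `decide` helper, and the verdict names are invented, the rules are a plain Python list standing in for an already-parsed YAML document, and the real thing would live in Roto inside Nam-Shub of Enki, not in Python.

```python
import re

# Hypothetical rule set, shaped like what a YAML file might parse into.
# The first matching rule wins.
RULES = [
    {"match": {"user_agent": r"GPTBot|ClaudeBot"}, "verdict": "garbage"},
    {"match": {"user_agent": r"^Mozilla/"}, "verdict": "challenge"},
]
DEFAULT_VERDICT = "reject"

def decide(request, rules=RULES, default=DEFAULT_VERDICT):
    """Return the verdict of the first rule whose patterns all match."""
    for rule in rules:
        matches = all(
            re.search(pattern, request.get(field, ""))
            for field, pattern in rule["match"].items()
        )
        if matches:
            return rule["verdict"]
    return default

print(decide({"user_agent": "Mozilla/5.0 (compatible; GPTBot/1.0)"}))
print(decide({"user_agent": "Mozilla/5.0 (X11; Linux x86_64)"}))
print(decide({"user_agent": "curl/8.5.0"}))
```

The point of the split is that the evaluator stays fixed and tested, while the rule list is the only thing a user edits.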

fn main(request: Request) -> Verdict[Response, Unit] {
    let ua = request.user_agent();
    if ua.matches(NSOE_AI_ROBOTS_TXT_PATTERNS) {
        accept Response.template("garbage")
    }
    if ua.matches(NSOE_MOZILLA) {
        accept Response.template("challenge")
    }
    if request.path().ends_with(".png") {
        accept Response.binary(QR.new().as_png(), "image/png")
    }
    if request.path().ends_with(".jpg") {
        accept Response.binary(FakeJpeg.new(), "image/jpeg")
    }
    reject
}

A first approximation of what I'm planning for #iocaine 3.0.

After a "bit" of fighting with CSS and a third party theme, the updated iocaine website is now live.

There's plenty to improve, but... at this point, I'd rather write the theme myself than keep trying to monkey patch the third party theme I chose a while ago. But that's a task for another day, I had enough of HTML & CSS for one day.

Let's tag a release instead, then maybe write some more documentation! And if there's still time: start laying out an iocaine 3.0 roadmap.

I just released #iocaine version 2.5.0, probably the last 2.x version, as I'm starting to lay out the roadmap for 3.0.

Apart from a couple of handy new features to aid in bot detection and data collection, there's an important fix in it too: previously, the built-in templates did not escape the generated text properly, which could lead to all kinds of weirdness. Now they do.

The templates also have access to a new filter, urlencode, which helps escape randomly generated text so it can be used in URLs.
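
To show why both kinds of escaping matter, here is a small illustration using Python's standard library. It is not iocaine's template engine, and the sample text and the `/maze/` path are invented; it only demonstrates that the same generated string needs HTML escaping when used as page text and percent-encoding when used inside a URL.

```python
from html import escape
from urllib.parse import quote

# Made-up stand-in for randomly generated text.
generated = 'cats & dogs <3 "quotes"'

# As page text: escape &, < and quotes so the HTML stays well-formed.
as_text = escape(generated)

# As a URL path segment: percent-encode everything that is not URL-safe.
as_url = quote(generated, safe="")

link = f'<a href="/maze/{as_url}">{as_text}</a>'
print(link)
```

Without either step, a stray `<` or `"` in the generated text could break the surrounding markup or produce an invalid link.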