Solved! 馃コ
This was a pretty "interesting" bug. Remember when I invented a way to implement #async / #await in #C, for jobs running on a threadpool. Back then I said it only works when completion of the task resumes execution on the same pool thread.
Trying to improve overall performance, I found the complex logic to identify the thread job to put on a pool thread a real deal-breaker. Just having one single MPMC queue with a single semaphore for all pool threads to wait on is a lot more efficient. But then, a job continued after an awaited task will resume on a "random" thread.
It theoretically works by making sure to restore the CORRECT context (the original one of the pool thread) every time after executing a job, whether partially (up to the next await) or completely.
Only it didn't, at least here on #FreeBSD, and I finally understood the reason for this was that I was using #TLS (thread-local storage) to find the context to restore.
Well, most architectures store a pointer to the current thread metadata in a register. #POSIX user #context #switching saves and restores registers. I found a source claiming that the #Linux ( #glibc) implementation explicitly does NOT include the register holding a thread pointer. Obviously, #FreeBSD's implementation DOES include it. POSIX doesn't have to say anything about that.
In short, avoiding TLS accesses when running with a custom context solved the crash. 馃く
I now added a #lockfree version of that MPMC job queue which is picked when the system headers claim that pointers are lockfree. Doesn't give any measurable performance gain 馃槥. Of course the #semaphore needs to stay, the pool threads need something to wait on. But I think the reason I can't get more than 3000 requests per second with my #jmeter stress test for #swad is that the machine's CPU is now completely busy 馃檲.
Need to look into actually saving CPU cycles for further optimizations I guess...