Finally getting somewhere working on the next evolution step for #swad. I have a first version that (normally 🙈) doesn't crash quickly (so, no release yet, but it's available on the master branch).

The good news: It's indeed an improvement to have multiple parallel #reactor (event-loop) threads. It now handles 3000 requests per second on the same hardware, with overall good response times and without any errors. I uploaded the results of the stress test here:

https://zirias.github.io/swad/stress/

The bad news ... well, there are multiple.

1. It got even more memory hungry. The new stress test still simulates 1000 distinct clients (trying to do more fails on my machine as #jmeter can't create new threads any more...), but with delays reduced to 1/3 and doing 100 iterations each. This now leaves it with a resident set of almost 270 MiB ... tuning #jemalloc on #FreeBSD to return memory more promptly reduces this to 187 MiB (which is still a lot) and reduces performance a bit (some requests run into 429, overall response times are worse). I have no idea yet where to start trying to improve this.

2. It requires tuning to manage that load without errors, mainly using more threads for the thread pool, although these threads stay almost idle ... which probably means I have to find ways to make putting work on and off these threads more efficient. At least I have some ideas.

3. I've seen a crash which only happened once so far, no idea as of now how to reproduce. sigh. Massively parallel code in C really is a PITA.

Seems the more I improve here, the more I find that should also be improved. 🤪

#C #coding #performance

I just fixed a "horrible" bug in #swad:

https://github.com/Zirias/poser/commit/fcd8f4eb44d9676dde2546042b5fe3165aecc52c

In case you don't understand C: This potentially dereferenced "wild" and null pointers before the (copy-and-pasted 🙈) typo was fixed, which means it's "undefined behavior", so might do surprising things, but more likely crash.

It affects the #epoll (on #Linux) and #eventports (on #Solaris / #illumos) backends. A quick smoke test on these platforms was done in swad 0.11 and didn't show any unexpected behavior. Only after preparing for the next release (that hopefully has multiple parallel event loops) by moving some static service data to thread-local storage, it suddenly failed on illumos, that's how I tracked down that embarrasing crap. 😞

I hope to complete a new version soon enough, so I don't have to do a "bugfix release" for it.

Just released: #swad 0.11 -- the session-less swad is done!

Swad is the "Simple Web Authentication Daemon", it adds cookie/form #authentication to your reverse #proxy, designed to work with #nginx' "auth_request". Several modules for checking credentials are included, one of which requires solving a crypto challenge like #Anubis does, to allow "bot-safe" guest logins. Swad is written in pure #C, compiles to a small (200-300kiB) binary, has minimal dependencies (zlib, OpenSSL/LibreSSL and optionally libpam) and should work on many #POSIX-alike systems (#FreeBSD tested a lot, #Linux and #illumos also tested)

This release is the first one not to require a server-side session (which consumes a significant amount of RAM on really busy sites), instead signed Json Web Tokens are now implemented. For now, they are signed using HMAC-SHA256 with a random key generated at startup. A future direction could be support for asymmetric keys (RSA, ED25519), which could open up new possibilities like having your reverse proxy pass the signed token to a backend application, which could then verify it, but still not forge it.

Read more, grab the latest .tar.xz, build and install it ... here: 😎

https://github.com/Zirias/swad

This is going nice so far, I can now correctly sign my #JWT (using #LibreSSL of course, so OpenSSL/LibreSSL will probably be an unconditional dependency for #swad in the next release)

Doing some first experiments, here's how a #JWT for #swad might look like, containing a custom property that has the "auth info" that's currently stored in the server-side #session ... 🤔

Now add a JOSE header, base64-encode and sign that beast...

Seems a first step is almost done, adding #JSON support to my #poser lib. This could be the foundation for #JWT support in #swad. 😎

Need to do more thorough testing I guess, but at least the two example documents from #rfc8259 work fine ... the test tool does a full #deserialization / #serialization roundtrip (with specific internal representations of the data types supported by JSON).

edit: Look at the "Longitude" value of the second object in the second example 😏 I only noticed myself right now, but of course that's the desired behavior.

First step towards implementing #JWT in #swad done, just committed a good 1000 LOC and now my #poser lib can do #JSON 😎
https://github.com/Zirias/poser/commit/7f1772e85c869d544f8a12099ed6545e163dc163

Hopefully, there will be another release of #swad soon!

Looking at my test results again, performance should be okay at least for moderately busy sites ... the 1000 requests per second I observed included actual logins, and I didn't even test whether it would also handle more (it probably would), the only issue was with resolving remote names (with that, around 30% of these requests failed because the thread pool was clogged with jobs all waiting for some DNS response), and the recommendation would be: just disable that feature if your site is a busy one.

But I'm really unhappy with RAM usage going up so much. Almost 100MiB resident set after seeing 1000 unique clients all attempting a login is a lot after all.

So, I'll try to move swad to a session-less design. It can't be fully stateless, a rate limiter will be needed, but maybe I can optimize a bit on that.

But the sessions could be replaced. They're currently used for two things:

* Store actual auth information. This could be stored in signed JWTs (json web tokens) on the client instead. I'm already starting to add JSON support to my poser lib 😉

* Store the random challenge for the #anubis-like proof-of-work checker. Could do the same as anubis here: Derive the challenge from request metadata instead, including a timestamp.

Will be quite some work, but could be doable.

Seems a first step is almost done, adding #JSON support to my #poser lib. This could be the foundation for #JWT support in #swad. 😎

Need to do more thorough testing I guess, but at least the two example documents from #rfc8259 work fine ... the test tool does a full #deserialization / #serialization roundtrip (with specific internal representations of the data types supported by JSON).

edit: Look at the "Longitude" value of the second object in the second example 😏 I only noticed myself right now, but of course that's the desired behavior.

Just released: #swad 0.10

https://github.com/Zirias/swad/releases/tag/v0.10

Swad is the "Simple Web Authentication Daemon". If you're looking for a way to add #authentication (and/or proof-of-work access as known from #anubis) to your #nginx reverse proxy -- without adding yet another reverse proxy -- swad could be for you! It's written in pure #C, has few external dependencies (just zlib, and optionally OpenSSL/Libressl and/or libpam) and compiles to a pretty small binary. It's designed for usage with nginx' 'auth_request'.

Swad is tested on #FreeBSD, some basic functionality tests were also done on #Linux and #illumos (descendant from #solaris). It should build and work on most #POSIX-alike systems.

This release mainly brings performance improvements and a few bugfixes. It's now stress-tested with Apache jmeter, verifying it can deal with at least 1000 requests per second on my personal (somewhat limited) FreeBSD host machine.

Hopefully, there will be another release of #swad soon!

Looking at my test results again, performance should be okay at least for moderately busy sites ... the 1000 requests per second I observed included actual logins, and I didn't even test whether it would also handle more (it probably would), the only issue was with resolving remote names (with that, around 30% of these requests failed because the thread pool was clogged with jobs all waiting for some DNS response), and the recommendation would be: just disable that feature if your site is a busy one.

But I'm really unhappy with RAM usage going up so much. Almost 100MiB resident set after seeing 1000 unique clients all attempting a login is a lot after all.

So, I'll try to move swad to a session-less design. It can't be fully stateless, a rate limiter will be needed, but maybe I can optimize a bit on that.

But the sessions could be replaced. They're currently used for two things:

* Store actual auth information. This could be stored in signed JWTs (json web tokens) on the client instead. I'm already starting to add JSON support to my poser lib 😉

* Store the random challenge for the #anubis-like proof-of-work checker. Could do the same as anubis here: Derive the challenge from request metadata instead, including a timestamp.

Will be quite some work, but could be doable.

Just released: #swad 0.10

https://github.com/Zirias/swad/releases/tag/v0.10

Swad is the "Simple Web Authentication Daemon". If you're looking for a way to add #authentication (and/or proof-of-work access as known from #anubis) to your #nginx reverse proxy -- without adding yet another reverse proxy -- swad could be for you! It's written in pure #C, has few external dependencies (just zlib, and optionally OpenSSL/Libressl and/or libpam) and compiles to a pretty small binary. It's designed for usage with nginx' 'auth_request'.

Swad is tested on #FreeBSD, some basic functionality tests were also done on #Linux and #illumos (descendant from #solaris). It should build and work on most #POSIX-alike systems.

This release mainly brings performance improvements and a few bugfixes. It's now stress-tested with Apache jmeter, verifying it can deal with at least 1000 requests per second on my personal (somewhat limited) FreeBSD host machine.

I now decided I'll at least aim for some middle grounds: Rework #swad so it only needs a (server-side) #session once a user is #authenticated!

This does have some implications, e.g. passing a redirect argument to the authentication endpoint won't work any more. But experimentation shows a workaround would be to use an "internal redirect" to the login endpoint in #nginx.

We'll see where I end up. Having sessions only for authenticated users should reduce the need for server-side RAM significantly, so I hope 😉

Got somewhere:

https://github.com/Zirias/swad/commit/1bbd1e90ff0623d972e8b71c881f590112a9668b

Now, no bot ever causes #swad to create a server-side session, at least from what I can observe in my logs -- these bots don't attempt any login!

I also disabled usage of CSRF tokens for the login form, which I forgot to mention in the commit message. They strictly require a session and are pointless on a login form anyways.

I now decided I'll at least aim for some middle grounds: Rework #swad so it only needs a (server-side) #session once a user is #authenticated!

This does have some implications, e.g. passing a redirect argument to the authentication endpoint won't work any more. But experimentation shows a workaround would be to use an "internal redirect" to the login endpoint in #nginx.

We'll see where I end up. Having sessions only for authenticated users should reduce the need for server-side RAM significantly, so I hope 😉

Good morning! ☕

Now that I can't find any other bugs in #swad any more, I'm thinking again about how I could improve it.

Would anyone consider deploying it on a busy site right now? Either as a replacement for #Anubis (proof-of-work against bots), or for simple non-federated #authentication, or maybe even both?

I'm currently not sure how well it would scale. The reason is the design with server-side sessions, which is simple and very light-weight "on the wire", but needs server-side RAM for each and every client. It's hard to guess how this would turn out on very busy sites.

So, I'm thinking about moving to a stateless design. The obvious technical choice for that would be to issue a signed #JWT (Json Web Token), just like Anubis does it as well. This would have a few consequences though:

* OpenSSL/LibreSSL would be a hard build dependency. Right now, it's only needed if the proof-of-work checker and/or TLS support is enabled.
* You'd need an X509 certificate in any case to operate swad, even without TLS, just for signing the JWTs.
* My current CSRF-protection would stop working (it's based on random tokens stored in the session). Probably not THAT bad, the login itself doesn't need it at all, and once logged in, the only action swad supports is logout, which then COULD be spoofed, but that's more an annoyance than a security threat... 🤔
* I would still need some server-side RAM for each and every client to implement the rate-limits for failed logins. At least, that's not as much RAM as currently.

Any thoughts? Should I work on going (almost) "stateless"?

Found and fixed two more bugs affecting only #TLS with #swad, so here's yet another "bugfix release":

https://github.com/Zirias/swad/releases/tag/v0.9

One of these bugs was always there and I never noticed (just ignoring intermediate certificates) because many clients cope well with this, but not all.

The other bug is yet another regression from earlier performance improvements. 😞

So, lots of releases these last days. I'll have to remember to do very thorough regression testing whenever "optimizing" things in existing code 🙈

In a nutshell: 0.8 was finally fine again without TLS, but if you need TLS, better use this new 0.9.

Adding what was missing for intermediate certificates, I had great fun with #OpenSSL#API again. I mean, it never gets old. First test gave me a nice crash of #swad. Because ....

Well, to use a certificate (type X509 *), you call SSL_CTX_use_certificate(). Docs say "On success the reference counter of the x is incremented." (where x means the certificate). Great, so, call X509_free() directly afterwards to ensure this certificate gets destroyed whenever the SSL context gets destroyed.

So, just call the same function again for the intermediate certificates? No ... but there's SSL_CTX_add_extra_chain_cert() which can be used multiple times. Nice, call it in a loop as long as I find additional certificates in the cert file, and X509_free() them all directly after adding.

And then observe the crash. Well, it's documented, the manpage for SSL_CTX_add_extra_chain_cert() tells:

"The x509 certificate provided to SSL_CTX_add_extra_chain_cert() will be freed by the library when the SSL_CTX is destroyed. An application should not free the x509 object."

So, clearly my fault not reading this before. Consistency in API design is so overrated. 🤪

Unfortunately, I had to do a bugfix release: #swad 0.8

Although I didn't observe any obvious misbehavior on my own installation for several days, I discovered two very relevant bugs just after release of 0.7 🤦‍♂️ -- one of them (only affecting #kqueue, for example on #FreeBSD) even critical because it could trigger "undefined behavior".

Both bugs are regressions from new (performance) improvements added, one from trying to queue as many writes as possible when sending HTTP responses, one from using kqueue to provide timers and signals.

See release notes for 0.8. Don't use 0.7. Sorry 🤪

https://github.com/Zirias/swad

Found and fixed two more bugs affecting only #TLS with #swad, so here's yet another "bugfix release":

https://github.com/Zirias/swad/releases/tag/v0.9

One of these bugs was always there and I never noticed (just ignoring intermediate certificates) because many clients cope well with this, but not all.

The other bug is yet another regression from earlier performance improvements. 😞

So, lots of releases these last days. I'll have to remember to do very thorough regression testing whenever "optimizing" things in existing code 🙈

In a nutshell: 0.8 was finally fine again without TLS, but if you need TLS, better use this new 0.9.

Unfortunately, I had to do a bugfix release: #swad 0.8

Although I didn't observe any obvious misbehavior on my own installation for several days, I discovered two very relevant bugs just after release of 0.7 🤦‍♂️ -- one of them (only affecting #kqueue, for example on #FreeBSD) even critical because it could trigger "undefined behavior".

Both bugs are regressions from new (performance) improvements added, one from trying to queue as many writes as possible when sending HTTP responses, one from using kqueue to provide timers and signals.

See release notes for 0.8. Don't use 0.7. Sorry 🤪

https://github.com/Zirias/swad

Now that #swad 0.7 is released, it's time to prepare a new release of #poser, my own lib supporting #services on #POSIX systems, following a #reactor with #threadpool design.

During development of swad, I moved poser from using strictly only POSIX APIs (with the scalability limits of e.g. #select) to auto-detected support for #kqueue, #epoll, #eventports, #signalfd and #timerfd (so now it could, in theory(!), "compete" with e.g. libevent). I also fixed quite some hidden bugs, and added more base functionality, like a #dictionary using nested hashtables internally, or #async tasks mimicking the async/await pattern known from e.g, #csharp. I also deprecated two features, the periodic and global "service tick" (superseded by individual timers) and the "resolve hosts" property of a "connection" (superseded by a separate resolve class).

I'll have to decide on a few things, e.g. whether I'll remove the deprecated stuff immediately and bump the major version of the "posercore" lib. I guess I'll do just that. I'd also like to add all the web-specific stuff (http 1.0/1.1 server) that's currently part of the swad code as a "poserweb" lib. This would get a major version of 0, indicating a generally unstable API/ABI as of now....

And then, I'd have to decide where certain utility classes belong to. The rate limiter is probably useful for things other than web, so it should probably go to core. What about url encoding/decoding, for example? 🤔

Stay tuned, something will come here, maybe helping you to write a nice service in plain #C 😎:

https://github.com/Zirias/poser

Just released: #swad 0.7! 😎

Swad is the "Simple Web Authentication Daemon". If you're looking for a solution to add cookie/form #authentication to your #nginx reverse proxy, or maybe even a #lightweight alternative to #Anubis which leaves the actual proxying to nginx, this might be for you! It is designed for use with nginx' auth_request, written in pure C, with minimal dependencies (zlib and, depending on build options, openssl/libressl and/or libpam), and compiles to a small binary (currently between 150kiB and less than 300kiB depending on compiler and target platform).

Swad should work on many #posix (and almost) systems. It's actually tested on #FreeBSD (in "production" use, but on a very low-traffic private site), and quick functionality tests also done on #Debian (#Linux) and #OpenIndiana (#Illumos, open-source #Solaris descendant).

As announced, this release doesn't bring any new features (in terms of WHAT it can do), but great improvements "under the hood", that should help performance at least on some platforms, see release notes for swad 0.7.

Read more, and download the .tar.xz (to build and install it 😆) here:
https://github.com/Zirias/swad

The next release of #swad will probably bring not a single new feature, but focus on improvements, especially regarding #performance. Support for using #kqueue (#FreeBSD et al) to handle #signals is a part of it (which is done and works). Still unsure whether I'll also add support for #Linux' #signalfd. Using kqueue also as a better backend for #timers is on the list.

Another hopefully quite relevant change is here:

https://github.com/Zirias/poser/commit/798f23547295f89fa0c751f0e707c3474b5c689c

In short, so far my #poser lib was always awaiting readiness notification (from kqueue, or #epoll on Linux, or select/poll for other platforms) before doing any read or write on a socket. This is the ideal approach for reads, because in the common case, a socket is NOT ready for reading ... our kernel must have received something from the remote end first. But for writes, it's not so ideal. The common case is that a socket IS ready to write (because there's space left in the kernel's send buffers). So, just try it, and only register for notifications if it ever fails, makes more sense. Avoids pointless waiting and pointless events, and e.g. with epoll, even unnecessary syscalls. 😉

I'm trying to add "genric" #signal handling to #poser. Ultimate goal is to provide a way for #swad to handle #SIGHUP, although signal handling must be done in poser's main event loop (signals are only ever unblocked while waiting for file descriptor events).

Okay, I could just add explicit handling for SIGHUP. But a generic solution would be nicer. Just for example, a consumer might be interested in #SIGINFO which doesn't even exist on all platforms ... 🤔

Now, #POSIX specs basically just say signal constants are "integer values". Not too helpful here. Is it safe to assume an upper bound for signal numbers on "real world" OS implementations, e.g. 64 like on #Linux? Should I check #NSIG and, if not defined, just define it to 64? 🙈

#C #coding #question