Felix Palmen :freebsd: :c64:
@zirias@mastodon.bsd.cafe · 4 months ago

More interesting progress trying to make #swad suitable for very busy sites!

I realized that #TLS (both with #OpenSSL and #LibreSSL) is a major bottleneck. With TLS enabled, I couldn't cross 3000 requests per second with somewhat acceptable response times (most below 500ms). Disabling TLS, I could really see the impact of a #lockfree queue as opposed to one protected by a #mutex. With the mutex, up to around 8000 req/s could be reached on the same hardware. And with a lockfree design, that quickly went beyond 10k req/s, but crashed. 😆

So I read some scientific papers 🙈 ... and redesigned a lot (*). And now it finally seems to work. My latest test reached a throughput of almost 25k req/s, with response times below 10ms for most requests! I really didn't expect to see this happen. 🤩 Maybe it could do even more; I didn't try yet.

Open issue: Can I do something about TLS? There must be some way to make it perform at least a bit better...

(*) edit: Here's the design I finally used, with a much simplified "dequeue" because the queues in question are guaranteed to have only a single consumer: https://dl.acm.org/doi/10.1145/248052.248106
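
For illustration only, here is a minimal sketch of such a queue in C11 atomics. It is not swad's actual code, just the general idea: multiple producers publish nodes with a single atomic exchange, and because only one thread ever consumes, the dequeue side needs no CAS loop at all. All names (mpscq, mpscq_node, ...) are invented for this example.

    #include <stdatomic.h>
    #include <stdlib.h>

    typedef struct mpscq_node {
        _Atomic(struct mpscq_node *) next;
        void *data;
    } mpscq_node;

    typedef struct mpscq {
        _Atomic(mpscq_node *) tail;  /* producers swap the new node in here */
        mpscq_node *head;            /* touched only by the single consumer */
        mpscq_node stub;             /* initial dummy node */
    } mpscq;

    static void mpscq_init(mpscq *q)
    {
        atomic_init(&q->stub.next, NULL);
        q->stub.data = NULL;
        atomic_init(&q->tail, &q->stub);
        q->head = &q->stub;
    }

    /* any number of producers: publish a node with one atomic exchange */
    static void mpscq_enqueue(mpscq *q, void *data)
    {
        mpscq_node *n = malloc(sizeof *n);      /* error handling omitted */
        atomic_init(&n->next, NULL);
        n->data = data;
        mpscq_node *prev = atomic_exchange_explicit(&q->tail, n,
                memory_order_acq_rel);
        atomic_store_explicit(&prev->next, n, memory_order_release);
    }

    /* single consumer: no CAS loop needed, just follow head->next */
    static void *mpscq_dequeue(mpscq *q)
    {
        mpscq_node *head = q->head;
        mpscq_node *next = atomic_load_explicit(&head->next,
                memory_order_acquire);
        if (!next) return NULL;      /* empty (or a producer mid-publish) */
        void *data = next->data;
        q->head = next;              /* the next node becomes the new dummy */
        if (head != &q->stub) free(head);
        return data;
    }

The consumer may briefly see an empty queue while a producer is between its exchange and the store to prev->next; it just returns NULL and retries later, which is acceptable for a worker queue.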

Response times in percentiles: 97% stay below 10ms!
Throughput curve of my latest stress test of swad (with ramp-up and ramp-down phases)
Felix Palmen :freebsd: :c64:
@zirias@mastodon.bsd.cafe · 5 months ago

Nice, #threadpool overhaul done. Removed two locks (#mutex) and two condition variables, replaced by a single lock and a single #semaphore. 😎 This simplifies the overall structure a lot, and it's probably safe to assume slightly better performance in contended situations as well. And so far, #valgrind's helgrind tool doesn't find anything to complain about. 🙂
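
Roughly the pattern, as a hedged sketch in plain POSIX threads (not poser's actual code, all names invented): one mutex guards the job list, and one counting semaphore tracks how many jobs are queued, so workers simply sleep in sem_wait until something arrives.

    #include <pthread.h>
    #include <semaphore.h>
    #include <stdlib.h>

    typedef struct job {
        struct job *next;
        void (*run)(void *arg);
        void *arg;
    } job;

    typedef struct pool {
        pthread_mutex_t lock;  /* the single lock, protects the job list */
        sem_t jobs;            /* the single semaphore, counts queued jobs */
        job *head, *tail;
    } pool;

    static void pool_submit(pool *p, job *j)
    {
        j->next = NULL;
        pthread_mutex_lock(&p->lock);
        if (p->tail) p->tail->next = j; else p->head = j;
        p->tail = j;
        pthread_mutex_unlock(&p->lock);
        sem_post(&p->jobs);    /* one increment per job, wakes one worker */
    }

    static void *worker(void *arg)
    {
        pool *p = arg;
        for (;;)                         /* shutdown handling omitted */
        {
            sem_wait(&p->jobs);          /* sleep until a job is available */
            pthread_mutex_lock(&p->lock);
            job *j = p->head;            /* never NULL here: sem count == list length */
            p->head = j->next;
            if (!p->head) p->tail = NULL;
            pthread_mutex_unlock(&p->lock);
            j->run(j->arg);
            free(j);                     /* assumes heap-allocated jobs */
        }
        return NULL;
    }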

Looking at the screenshot, I should probably make #swad default to two threads per CPU and expose the setting in the configuration file. When some thread jobs are expected to block, having more threads than CPUs is probably better.
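
Deriving such a default could look like the following hypothetical helper; sysconf with _SC_NPROCESSORS_ONLN is widely available (FreeBSD, Linux, macOS), and the configuration value shown is assumed, not an existing swad setting.

    #include <unistd.h>

    /* default worker count: two threads per CPU, unless a (hypothetical)
     * configured value overrides it */
    static int default_worker_threads(long configured)
    {
        if (configured > 0) return (int)configured;
        long ncpu = sysconf(_SC_NPROCESSORS_ONLN);  /* online CPUs */
        if (ncpu < 1) ncpu = 1;                     /* sysconf may return -1 */
        return (int)(2 * ncpu);
    }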

https://github.com/Zirias/poser/commit/995c27352615a65723fbd1833b2d36781cbeff4d

Process/thread details shown by top for a running swad instance:

- One main thread waiting for events on kqueue
- Nine threads waiting on a semaphore: eight are worker threads of the thread pool, one controls the PAM child process
- One child process waiting on a pipe