Discussion
Felix Palmen :freebsd: :c64:
@zirias@mastodon.bsd.cafe · 4 months ago

Solved! 🥳

This was a pretty "interesting" bug. Remember when I invented a way to implement #async / #await in #C for jobs running on a thread pool? Back then I said it only works when completion of the awaited task resumes execution on the same pool thread.

Trying to improve overall performance, I found the complex logic needed to decide which pool thread a job should run on to be a real deal-breaker. Having just one single MPMC queue with a single semaphore for all pool threads to wait on is a lot more efficient. But then, a job continued after an awaited task will resume on a "random" thread.
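
A hedged sketch of that single-queue design (the post names no identifiers, so every name below is made up for illustration): all pool threads block on the same semaphore and pop from the same shared queue, so no per-thread dispatch logic is needed.

```c
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical types and functions; the post doesn't name any. */
typedef struct Job Job;
extern sem_t jobs_available;             /* counts queued jobs */
Job *mpmc_dequeue(void);                 /* the single shared MPMC queue */
void run_job_until_await_or_done(Job *); /* partial or full execution */

static void *pool_thread(void *arg)
{
    (void)arg;
    for (;;) {
        sem_wait(&jobs_available);  /* all workers wait on ONE semaphore */
        Job *job = mpmc_dequeue();  /* any worker may pick up any job */
        run_job_until_await_or_done(job);
    }
    return NULL;
}
```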

In theory this works by making sure to restore the CORRECT context (the pool thread's original one) every time after executing a job, whether partially (up to the next await) or completely.
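
Sketched with POSIX ucontext(3), which the "#POSIX user #context #switching" below strongly suggests is the mechanism in use (the struct and function names are hypothetical):

```c
#include <ucontext.h>

/* Each pool thread keeps its own "home" context. A job runs inside
 * its own context; whenever it yields (await) or finishes, it must
 * swap back into the home context of the thread CURRENTLY running
 * it, not the thread it first started on. */
typedef struct PoolThread { ucontext_t home; } PoolThread;

static void run_on_this_thread(PoolThread *self, ucontext_t *jobctx)
{
    /* saves current registers into self->home, resumes the job */
    swapcontext(&self->home, jobctx);
    /* ...we get here again once the job awaits or completes */
}
```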

Only it didn't, at least here on #FreeBSD, and I finally understood the reason: I was using #TLS (thread-local storage) to find the context to restore.

Well, most architectures store a pointer to the current thread's metadata in a register. #POSIX user #context #switching saves and restores registers. I found a source claiming that the #Linux ( #glibc) implementation explicitly does NOT include the register holding the thread pointer. Obviously, #FreeBSD's implementation DOES include it. POSIX doesn't say anything about that.
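
To make the failure mode concrete, here's a sketch of the hazard and one way around it; this is my reading of the bug, not the actual code, and names are again hypothetical:

```c
#include <ucontext.h>

typedef struct PoolThread { ucontext_t home; } PoolThread;

/* HAZARDOUS (sketch of the bug): this TLS access may go through the
 * thread-pointer register. If swapcontext() restored that register
 * from a context saved on a DIFFERENT pool thread, the lookup hits
 * the wrong thread's storage. */
static _Thread_local PoolThread *current_pool_thread;

static void finish_job_via_tls(void)
{
    setcontext(&current_pool_thread->home);  /* wrong thread's TLS */
}

/* One way to avoid it (an assumption, not necessarily what swad
 * does): carry the pointer explicitly, so no TLS is touched while
 * running on the custom context. */
static void finish_job_explicit(PoolThread *self)
{
    setcontext(&self->home);
}
```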

In short, avoiding TLS accesses when running with a custom context solved the crash. 🤯

Felix Palmen :freebsd: :c64:
@zirias@mastodon.bsd.cafe replied · 4 months ago

I now added a #lockfree version of that MPMC job queue, which is picked when the system headers claim that pointers are lock-free. It doesn't give any measurable performance gain 😞. Of course the #semaphore needs to stay; the pool threads need something to wait on. But I think the reason I can't get more than 3000 requests per second with my #jmeter stress test for #swad is that the machine's CPU is now completely busy 🙈.
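
How do headers "claim" that? Presumably (my assumption, the post doesn't say) via C11's ATOMIC_POINTER_LOCK_FREE macro from <stdatomic.h>, which is 2 when atomic pointers are always lock-free on the target:

```c
#include <stdatomic.h>

/* ATOMIC_POINTER_LOCK_FREE: 2 = always lock-free, 1 = sometimes,
 * 0 = never. Hypothetical compile-time switch between the two
 * queue implementations (the header names are made up): */
#if defined(ATOMIC_POINTER_LOCK_FREE) && ATOMIC_POINTER_LOCK_FREE == 2
#  include "jobqueue_lockfree.h"
#else
#  include "jobqueue_locked.h"
#endif
```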

Need to look into actually saving CPU cycles for further optimizations I guess...

Felix Palmen :freebsd: :c64:
@zirias@mastodon.bsd.cafe replied · 4 months ago

I now experimented with different ideas for how to implement the #lockfree #queue for multiple producers and multiple consumers. Unsurprisingly, some ideas just didn't work. One deadlocked (okaaay ... so it wasn't lock-free), and I eventually gave up trying to understand why.

The "winner" so far is only "almost lockfree", but at least slightly improves performance. Throughput is the same as with the simple locked variant, but average response times are 10 to 20% quicker (although they deviate stronger for whatever reason). Well, that's committed for now:

https://github.com/Zirias/poser/commit/4f2f80a8266e4762e04ead2b802e7a7c1b55090b

#C11 #atomics
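
The commit above is the source of truth; for flavor only, here's what an "almost lock-free" bounded MPMC enqueue over #C11 atomics can look like, sketched in the style of Dmitry Vyukov's well-known queue (not poser's actual code; all names are invented):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One slot per queue position. The sequence counter encodes whose
 * turn it is: producers expect seq == pos, consumers seq == pos + 1.
 * Initialize slots[i].seq = i before use. */
typedef struct {
    atomic_size_t seq;
    void *data;
} Slot;

typedef struct {
    Slot *slots;         /* capacity must be a power of two */
    size_t mask;         /* capacity - 1 */
    atomic_size_t tail;  /* next enqueue position */
    atomic_size_t head;  /* next dequeue position */
} Mpmc;

static bool mpmc_enqueue(Mpmc *q, void *item)
{
    size_t pos = atomic_load_explicit(&q->tail, memory_order_relaxed);
    for (;;) {
        Slot *s = &q->slots[pos & q->mask];
        size_t seq = atomic_load_explicit(&s->seq, memory_order_acquire);
        intptr_t diff = (intptr_t)seq - (intptr_t)pos;
        if (diff == 0) {
            /* slot is free at this position: try to claim it */
            if (atomic_compare_exchange_weak_explicit(&q->tail, &pos,
                    pos + 1, memory_order_relaxed, memory_order_relaxed))
            {
                s->data = item;
                /* publish: consumers wait for seq == pos + 1 */
                atomic_store_explicit(&s->seq, pos + 1,
                        memory_order_release);
                return true;
            }
            /* CAS failed: pos now holds the current tail, retry */
        } else if (diff < 0) {
            return false;   /* queue full */
        } else {
            /* another producer claimed this slot; reload the tail */
            pos = atomic_load_explicit(&q->tail, memory_order_relaxed);
        }
    }
}
```

Dequeue is symmetric: consumers CAS head forward where seq == pos + 1 and release the slot by storing seq = pos + mask + 1. It's only "almost" lock-free because a producer stalled between claiming a slot and publishing it holds up the consumers behind it.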
