The basic premise of parking_lot (which comes from WebKit; you can read all about it) is so cool to me. In short, if you need a million locks, you don't need a million 32-bit words in memory, because at most NUM_THREADS of them will be contended at any time. Just maintain a concurrent hash table, keyed by lock address, that tracks which lock each waiting thread is blocked on. (Note: parking_lot still uses a small 8-bit atomic per lock.) I'm miffed I didn't come up with it myself, though it does seem to have limited applicability. I really enjoyed exploring the premise and looking for other places where it could come in handy.
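To make the premise concrete, here is a minimal sketch of the idea in Rust using only the standard library. It is not parking_lot's actual implementation or API: the names `ByteLock`, `park`, `unpark_all`, `TABLE`, and `run_demo` are made up for illustration, the table has a fixed 64 buckets, and each bucket uses a `Condvar` as a stand-in for a real queue of parked threads (so a wakeup rouses every sleeper in the bucket rather than exactly one waiter).

```rust
use std::sync::atomic::{AtomicU8, AtomicUsize, Ordering};
use std::sync::{Condvar, Mutex};
use std::thread;

const UNLOCKED: u8 = 0;
const LOCKED: u8 = 1;
const CONTENDED: u8 = 2; // locked, and some thread may be asleep waiting

// One bucket of the shared table. Assumption for this sketch: a Condvar
// replaces the explicit per-bucket queue of parked threads.
struct Bucket {
    guard: Mutex<()>,
    wakeup: Condvar,
}

const EMPTY_BUCKET: Bucket = Bucket { guard: Mutex::new(()), wakeup: Condvar::new() };
const NUM_BUCKETS: usize = 64;
// The table size is independent of how many locks exist.
static TABLE: [Bucket; NUM_BUCKETS] = [EMPTY_BUCKET; NUM_BUCKETS];

// Hash a lock's address to a bucket (top 6 bits of a multiplicative hash).
fn bucket_for(key: usize) -> &'static Bucket {
    let hash = (key as u64).wrapping_mul(0x9E37_79B9_7F4A_7C15) >> 58;
    &TABLE[hash as usize]
}

// Sleep on `key` if `should_sleep` still holds once the bucket is locked.
// May return early (spurious or unrelated wakeup), so callers must re-check.
fn park(key: usize, should_sleep: impl Fn() -> bool) {
    let bucket = bucket_for(key);
    let guard = bucket.guard.lock().unwrap();
    if should_sleep() {
        let _guard = bucket.wakeup.wait(guard).unwrap();
    }
}

// Wake every thread sleeping in the bucket that `key` hashes to.
fn unpark_all(key: usize) {
    let bucket = bucket_for(key);
    let _guard = bucket.guard.lock().unwrap();
    bucket.wakeup.notify_all();
}

// The lock itself is a single byte; all waiting state lives in TABLE.
pub struct ByteLock {
    state: AtomicU8,
}

impl ByteLock {
    pub const fn new() -> Self {
        ByteLock { state: AtomicU8::new(UNLOCKED) }
    }

    fn key(&self) -> usize {
        &self.state as *const AtomicU8 as usize
    }

    pub fn lock(&self) {
        // Fast path: uncontended acquire never touches the table.
        if self
            .state
            .compare_exchange(UNLOCKED, LOCKED, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
        {
            return;
        }
        // Slow path: mark the lock contended, then sleep until it is released.
        while self.state.swap(CONTENDED, Ordering::Acquire) != UNLOCKED {
            park(self.key(), || self.state.load(Ordering::Relaxed) == CONTENDED);
        }
    }

    pub fn unlock(&self) {
        // Only visit the table if someone may be sleeping on this lock.
        if self.state.swap(UNLOCKED, Ordering::Release) == CONTENDED {
            unpark_all(self.key());
        }
    }
}

// A million one-byte locks; several threads fight over one of them.
pub fn run_demo(threads: usize, iters: usize) -> usize {
    let locks: Vec<ByteLock> = (0..1_000_000).map(|_| ByteLock::new()).collect();
    let counter = AtomicUsize::new(0);
    let lock = &locks[123_456];
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..iters {
                    lock.lock();
                    // Deliberately a separate load and store: this only
                    // counts correctly if the lock gives mutual exclusion.
                    let v = counter.load(Ordering::Relaxed);
                    counter.store(v + 1, Ordering::Relaxed);
                    lock.unlock();
                }
            });
        }
    });
    counter.load(Ordering::Relaxed)
}
```

Here `std::mem::size_of::<ByteLock>()` is 1, so the million locks cost about a megabyte, while the table that handles waiting stays at 64 buckets no matter how many locks there are. Calling `run_demo(4, 10_000)` from a `main` function should return 40000 if the lock works as intended.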