are filesystems on linux just safer than the network subsystems or do filesystems just never expose any interface besides posix i/o so they have a much smaller and better-characterized attack surface?
ok i do see io_uring has a mode to link dependent i/o operations. i'm pretty sure the "zero-copy" claim is deeply misleading however and i've just found this incredible quote
As soon as an sqe is consumed by the kernel, the application is free to reuse that sqe entry. This is true even for cases where the kernel isn't completely done with a given sqe yet.
sqe means "submission queue entry". "sqe entry" has not been defined.
If the kernel does need to access it after the entry has been consumed, it will have made a stable copy of it.
so the kernel is copying!!! he admit it!!!!!
and in fact copying data from user space means the kernel can schedule independently of user space too!!! a ring buffer is a very specialized instrument for serializing data, and io_uring is not doing that: each individual sqe may have wildly different latency and touch wildly different subsystems.
furthermore, unlike getdents(), the ring size (number of entries) does not meaningfully limit the amount of data in flight. each operation may require additional internal buffering, because the kernel can't post the completion entry until the data has been copied to/from userspace.
oh my fucking god the kthread just goes to sleep after a while. this is why you don't implement an event loop in the kernel! this is why you use a goddamn synchronous syscall with a fixed-size buffer!!!
then it says "a userspace application has no way to know if the data it's going to fetch next is cached or not." completely backwards. a userspace application is the only party that can tell the kernel what data it's going to fetch next. the kernel is the one that manages the cache
oh and of course this is actually all missing the point that the page cache strongly couples "writing user data into the kernel" with "the data is now globally visible". i do not want this? please employ basic pipelining techniques?
why would you ever want the kernel to copy from the network directly into a userspace pointer which you just have to remember cannot be read until you pull the corresponding completion queue entry? and whose size is unknown except from that entry?
enough of this nonsense
Why this can happen isn't necessarily important, but it has an important side effect for the application.
strikes immense fear into my heart
the "synchronous syscall with buffer" approach lets you colocate request and response in the same thread, and blocking synchronously lets the kernel schedule other work while you wait
ok i think i just ideologically do not believe in the page cache. i spent years in the mines carefully pipelining my threads and buffers and it turns out the kernel forces everything through the page cache bottleneck
microkernels because the user can always manage the memory hierarchy better than you
ok now i'm free. except i wanna check out openbsd's memory management
oh neat thanks openbsd https://blog.pr4tt.com/2016/02/23/OpenBSD-Virtual-Memory/ didn't realize how the cpu managed the page table although that makes perfect sense. a virtual address from userspace is translated by the cpu, and the load is served from the physical address in RAM or cpu cache. the kernel manages memory mappings per-process via page tables, and the TLB, a cpu feature, caches those translations
hm so a "thread" is mostly memory context? but then it's less of an "OS thread" than a "cpu thread" imho