lmao that copy.fail was in linux/crypto/
Discussion
it's not a magic bullet but rather a bundle of strategies to enable deep trust in the code that runs in ring 0. an additional result is lowering the barrier to contributions. i suspect receptivity to forks would be a good thing, but also if the project needs to be forked, that could be a sign it's not microkerneling hard enough
grapheneos is much more ambitious and it remains secure partially by limiting hardware support, which further enables it to reduce the amount of trusted code. it also integrates cryptography into the boot process and in other ways generally understands that the operating system's purpose is to act as the trusted computing base.
linux acts like the os is supposed to have a constant stream of sick new features for really specific situations (splice and io_uring in particular), which constantly exposes it to vulns. if you need to run in kernel space to access hardware, you're going to have a conflict between feature support and security eventually. so i think interfaces that enable deep integration with hardware, without needing to pull code that runs in ring 0 into the kernel source tree, are a research goal for the future
@hipsterelectron well for the machines that have to interact in manners other than MMIO, the first idea that comes to mind is a register capability that allows one to set a boxed value that can then be forwarded to a register or something.
this also has a lot of practical utility because it sucks that you essentially can't ship a kernel module to users like other types of code. FUSE is an example of how to expose safe interfaces (although FUSE also necessarily doesn't interface with hardware. need to learn.....so much more about the motherboard to figure this out)
this was inspired by https://www.askbaize.com/blog/linux-compromises-broken-embargoes-and-the-shrinking-patch-window, which i didn't expect to explicitly mention splice and io_uring (both interfaces intentionally created to share memory between the kernel and userspace). this is the kind of thing imo that really can't be fixed as a one-off, and indicates that the kernel really shouldn't be managing i/o in the first place (gasp!)
the reason the kernel has to manage i/o is because the kernel is the only thing that can connect to the hardware, which is where i/o happens. the reason this pisses me off is because it also imposes the kernel's idea of how i/o scheduling should work, which application programmers (build tools, package managers) constantly have to work against
it's a well-known meme that databases have to tell the kernel to stop fucking with the pages they allocate and let the db software manage things directly, because the db works differently than the kind of software the page cache was made for. this isn't understood as the serious indictment it is of many os-based abstractions, which to my understanding date back to before unix
i have wanted less monolithic i/o APIs for years because they kept getting in the way of perf. in particular, most programming language async APIs are extremely detrimental to local i/o and are optimized for network traffic in a variety of ways. the kernel imposing very limited mechanisms for i/o scheduling makes it quite difficult to get "close to the metal", or on the other hand, to get "close to the user" by representing a precise set of application-level constraints
it's obviously a huge task to produce an interface that enables high-performance i/o without sharing the same memory space or even the same build system. it can be thought of as the difference between an existential and a universal quantifier: supporting some particular set of use cases vs. every use case the interface can express. but the result will mean highly specialized application domains don't have to work around assumptions made for other use cases, and highly-secure systems won't incur tradeoffs made for other systems with less stringent requirements
with the establishment of standards (sel4 being the closest to what i want to see), this shouldn't mean code becomes less portable. in fact, i think it should increase portability, as OS interfaces become less a product of the monolithic OS version and more a construction of userspace code that can be assembled independently of the kernel
the violence of a monolithic kernel is in needing to support every use case. with linux, this has led to compounding degrees of driver support, because the only way to support your hardware at all is to get it into the kernel. like the GPL(v2), it was an effective approach to coerce corporations into contributing code. but it achieved this coercion via monopoly, and eventually corporate use cases became preferred. this is not a problem for linus, who gets free labor and state-of-the-art research into his little project. but it becomes problematic for those who are at odds with corporate or government initiatives
@hipsterelectron meanwhile it's funny to see the extremely corporate Apple kernel go full circle, from NeXT slopping 4.2BSD on top of the Mach microkernel as a rush-to-market MVP 40 years ago to contemporary macOS/iOS carving out chunks to put them back into userspace daemons today
@joe @hipsterelectron I think there’d be an interesting essay in how much of the arc was driven by context switching overhead and other lower-level factors. Windows did a similar arc with NT 4 moving a ton of stuff into the kernel for performance and memory overhead reasons and then spending a decade moving stuff like printer drivers back out because they were a security disaster and processors were optimized a lot after the 486.
@joe to me it seems like the right architecture but it requires immense patience hence why it doesn't surprise me that linux used it and began to amass more functionality than e.g. hurd or others. my understanding too is that windows is a "microkernel" in some specific technical respects, but fails to use that to achieve isolation (because it reflects the segregated corporate organization of microsoft, and also the move fast break things corporate incentive model of microsoft)
@joe i will also note that OSXFUSE requiring an apple signature was used to extort twitter inc when we needed to make some changes necessary for our git virtual file system. so i will say that DRM is an alluring solution to this but without oversight can be used to harm security (which google and microsoft are imho much more guilty of). reducing the need for elevated permissions seems to be the path of righteousness. the problem then becomes a matter of API design
@hipsterelectron FSKit is first-party public API these days, and i believe MacFUSE has ported itself to run on top of it, so thankfully you don't need to pay any special tax to use it anymore (well, aside from the usual notarization/signing key fee if you want to get past the "unsigned binary" scare alerts)
@joe oh yes i was using it to delve into DRM as a construct more generally. i will be citing LZFSE in my compression work because it's remarkably well-written code. anything but ext4 on linux keeps suffering from weird errors and apple's investment in filesystems for users and not data centers is one of the examples of corporate investment in research that stays my hand from dismissing the whole thing outright
@joe will check out fskit that's very neat. i raised the lack of posix_getdents() (very recently added in 2024) when interviewing for the swift compiler team but really the posix VFS API is sorely lacking
@joe if apple ever makes a filesystem with guarantees on inode recycling i will gladly write non-portable code for that. posix will do that for pid recycling but not for the mutable database with zero atomicity guarantees we informally refer to as the filesystem
@joe oh my goodness sir i will almost definitely be stealing ideas from fskit this is remarkable
@hipsterelectron you might also be interested in NSFileProviderReplicatedExtension, which is the mechanism used for proxy files that get async fetched/generated, ostensibly by a cloud storage service but can also be used for other fun things https://developer.apple.com/documentation/fileprovider/nsfileproviderreplicatedextension
@joe there may be some potential similarities between cryptosystems and microkernel isolation. i think ARM MTE in grapheneos (and ios which failed to credit them) is an example of this. wish i could speak less vaguely on this, hardware is just so complex aaaaaa
@joe i think sel4 is probably something i wanna study closely (it's why i mention the military applications part above) bc i'm under the impression that's really where a lot of research into this has gone
@hipsterelectron *hurd noises intensify*
militaries globally contribute much legitimate work in cybersecurity. even the legitimate work still enables the militaries of the world to dictate which security architectures survive and remain extant. this is in fact the story of the so-called internet, which at every point makes technical decisions that enable surveillance and censorship, without accountability.
the internet is in fact not a reliable network—have you noticed this? a reliable network is not one which makes routing invisible, and demands you resend data it dropped. a reliable network makes routing transparent and user-defined (i am indeed describing source routing here), so that failures can be immediately triangulated, and the user can route around censorship imposed by the network
this is the kind of application-driven logic i seek from a microkernel more generally, and the kind of accountability structure i seek from a project when a vulnerability is discovered. this is the kind of "swappability" i work to achieve for stacks of software dependencies, and this is how ahierarchical composition can subvert and undermine monopolies
@hipsterelectron return to pickos. return to the database appliance OS