The UNIX system has been in wide use for over 20 years, and has helped to define many areas of computing.
it's really simple actually it's not really like an achievement or something but it's really really sick to have a data layout that's immediately readable, immediately appendable without blocking, and has unambiguous boundaries so splitting into subsets, or merging entries from multiple constituents, are all things you can do fearlessly and frequently
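to make the "unambiguous boundaries" part concrete, here's a minimal sketch, assuming a length-prefixed framing (4-byte big-endian length, then payload). the framing and the names `append_record`, `read_records`, and `merge_logs` are all made up for illustration, not taken from any real format, and the sketch says nothing about the non-blocking append part:

```python
import io
import struct

# hypothetical framing: 4-byte big-endian length, then the payload bytes
HEADER = struct.Struct(">I")


def append_record(buf, payload: bytes) -> None:
    """append one record; bytes already written are never touched again."""
    buf.write(HEADER.pack(len(payload)))
    buf.write(payload)


def read_records(data: bytes):
    """yield each payload in order, stopping at a truncated tail."""
    offset = 0
    while offset + HEADER.size <= len(data):
        (length,) = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(data):
            break  # partial write at the tail: ignore it
        yield data[start:end]
        offset = end


def merge_logs(*logs: bytes) -> bytes:
    """merging is plain concatenation, assuming each log ends on a record boundary."""
    return b"".join(logs)


if __name__ == "__main__":
    a, b = io.BytesIO(), io.BytesIO()
    append_record(a, b"one")
    append_record(a, b"two")
    append_record(b, b"three")
    print(list(read_records(merge_logs(a.getvalue(), b.getvalue()))))
```

since every record carries its own length, splitting is just picking an offset that falls on a boundary, and a reader can skip a half-written tail instead of misparsing it.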
a lot of the classic kernel data structures are specifically intended to do two things i really don't care about:
(1) maintain atomic read/write coherency from separate processes over a contiguous region of shared memory
(2) infer likely future i/o access patterns
(1) i simply refuse to accept as a valid behavior (referring to write()/read() atomicity and serialization for overlapping regions of the same physical page). i think that's literally just obvious UB if it was in the same address space?
it's a textbook math problem where the right answer is "not enough information"
there are some very neat data structures for mapping intervals of a contiguous region (the prof who worked on java at sun, jerry roth, loved red/black trees!) which will definitely be useful for virtual mappings. but in general i'm pretty confident that cross-process (in-kernel) control flow can and should be unidirectional message passing.
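a rough sketch of the interval-mapping idea, with one loud caveat: this swaps in a sorted list plus `bisect` instead of a red/black tree, so lookups are O(log n) but inserts are O(n). the `IntervalMap` class and its method names are hypothetical, just enough to show the "which mapping owns this address" query:

```python
import bisect


class IntervalMap:
    """non-overlapping half-open intervals [start, end), kept sorted by start."""

    def __init__(self):
        self._starts = []   # sorted interval starts, used for bisect
        self._entries = []  # parallel list of (start, end, value)

    def insert(self, start, end, value):
        if start >= end:
            raise ValueError("empty interval")
        i = bisect.bisect_left(self._starts, start)
        # only the immediate neighbors can overlap, since existing entries are disjoint
        if i > 0 and self._entries[i - 1][1] > start:
            raise ValueError("overlaps previous mapping")
        if i < len(self._entries) and self._entries[i][0] < end:
            raise ValueError("overlaps next mapping")
        self._starts.insert(i, start)
        self._entries.insert(i, (start, end, value))

    def lookup(self, addr):
        """return the value whose interval contains addr, or None."""
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0:
            start, end, value = self._entries[i]
            if addr < end:
                return value
        return None


if __name__ == "__main__":
    m = IntervalMap()
    m.insert(0x1000, 0x2000, "text")
    m.insert(0x3000, 0x4000, "heap")
    print(m.lookup(0x1800), m.lookup(0x2800))
```

a balanced tree gets you the same query with O(log n) inserts and removals, which matters once mappings churn.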
i read one really silly paper from some real hpc scientists who took snapshots of the unified (user+kernel) call stack over the course of process execution in an attempt to infer where to prefetch and where to drop from cache. they had so many numbers!!! their experimentation technique was honestly pretty cool!!!!
but i/o dependencies are usually extremely predictable!!! and much of the blocking i/o performed by compilers/linkers/interpreters is because EVERYONE still performs some form of linear path traversal (compilers with includes, linkers with libraries, interpreters with modules) before actually processing all that input!
not only is this just a classic source of non-reproducibility, it places vfs traversal and demand paging on the critical path of your executable's actual purpose!
scalac had by far the worst example of this i've ever seen. an odersky special. he hand-parsed classfiles (jvm bytecode, often with scala-specific data), and made use of some async/coroutine mechanism, so instead of any pipelining at all there's just this horrifying call stack jumping between the type checker and then back to find another class. i'm pretty sure he iterated linearly too instead of matching entries by name
i had to completely rip it out and rewrite it and that was one point where i actively felt scared and in over my head. but the concept of pipelining really stuck with me. in this case i'm referring not to multithreaded i/o loops, but identifying a data structure that can be efficiently queried, which you generate from a preprocessing step
luckily, sun microsystems had largely done the job there already
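a minimal sketch of the "generate a queryable structure in a preprocessing step" idea, assuming the simplest possible case: a list of search directories where the first match wins. `build_index` and `resolve` are hypothetical names, and this is not what scalac or any real compiler does (it ignores things like quote vs. angle includes and lookups relative to the including file):

```python
import os


def build_index(search_dirs):
    """walk each search dir once; map relative path -> first matching absolute path."""
    index = {}
    for root in search_dirs:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, root).replace(os.sep, "/")
                # earlier search dirs win, mirroring first-match-wins lookup order
                index.setdefault(rel, full)
    return index


def resolve(index, name):
    """constant-time lookup by name; no filesystem traversal on the critical path."""
    return index.get(name)


if __name__ == "__main__":
    idx = build_index(["."])
    print(len(idx), "files indexed")
```

all the directory walking happens once, up front, and every later query is a dict lookup by name instead of a linear probe through each search path.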
the linear path traversal in the critical path of your build process has two, maybe three steps:
- performing the highly domain-specific process to identify files (e.g. headers for -I) from what are essentially VFS query expressions
- paging in those inputs from file
- parsing/loading/interpreting/evaluating the input data
- parsing/loading/interpreting/evaluating the input data
note that the C preprocessor foils our attempt to neatly split these roles up, adding more i/o dependencies that must be interpreted in the context of a -I arg. was robert pike right???
of course not! this is the pre processor!!! and while we can schedule the preprocessor execution entirely in advance of the compiler, we also want to extract the precise input paths it read from, so we can calculate whether to invalidate build output if any of those inferred dependencies are modified!
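one hedged way to picture the "extract the paths it read, then invalidate" step: gcc and clang can write the headers the preprocessor touched as a make-style rule (the `-MD`/`-MF` flags), so a sketch can parse that and fingerprint the contents. the function names here are hypothetical, the parser assumes a single rule with no escaped spaces or colons in paths, and `read_bytes` is an injected reader so the example doesn't depend on real files:

```python
import hashlib


def parse_depfile(text):
    """parse one make-style rule 'target: dep dep ...' with backslash-newline continuations."""
    joined = text.replace("\\\n", " ")
    target, _, deps = joined.partition(":")
    return target.strip(), deps.split()


def fingerprint(paths, read_bytes):
    """hash path names and contents together, in sorted order so the result is stable."""
    h = hashlib.sha256()
    for path in sorted(paths):
        data = read_bytes(path)
        h.update(path.encode())
        h.update(b"\0")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def needs_rebuild(old_fingerprint, paths, read_bytes):
    """true if any inferred dependency changed since old_fingerprint was taken."""
    return fingerprint(paths, read_bytes) != old_fingerprint


if __name__ == "__main__":
    rule = "main.o: main.c foo.h \\\n  bar.h\n"
    print(parse_depfile(rule))
```

the catch is that the dependency list is only known after the preprocessor has run once, so the first build always has to discover it the slow way.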
this is going to be such a massive task lmao. i don't think a build tool should be constructing dependency graphs between reads and writes without actually being the OS.
i remember a microsoft eng spoke about an internal build system named "domino" once at a conference, which attempted to do exactly this (track task deps from read calls issued by a compiler). i remember thinking it seemed ridiculous at the time because of course you'd know that up front—but for the C preprocessor, you certainly can't
it seems like kind of a ridiculous thing to think about an os just for a build system,
but let's not get ahead of ourselves—it's also out of spite
i'm sure there are other applications too. like i'm def curious about cryptographic operations (especially establishing an isolation boundary around any key usage)
it's kind of exciting to challenge literally every computer interaction this way. like dns resolution: can we limit side channels that leak which sites we query?
god fuck i refuse to implement TLS. ok i'm gonna keep reading about data