The UNIX system has been in wide use for over 20 years, and has helped to define many areas of computing.
However, due to the potential for replicated storage of a new file, the create call needs two additional pieces of information - how many copies to store and where to store them.
you're never gonna believe what i emailed posix about earlier !!
except in my model replication is inferred by the system from the highly structured i/o agenda, which is provided at "compile time" before executing the task graph. so the extra context for me is instead a specific named "synchronization context" which generally corresponds to the virtual address space and visible filesystem contents for a set of tasks
(the sync context is itself a graph structure.)
Adding such information to the create call would change the system interface so instead defaults and per process state information is used, with system calls to modify them.
i'm not afraid of EEE
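rough sketch of what "defaults and per process state" could look like, since the quote is the whole design in one sentence: create() keeps its normal shape and the replication knobs live on the process, inherited on fork. every name here (ProcessState, set_replication, create) is made up by me for illustration, and the site-picking rule is a stand-in, not whatever the paper's system actually does.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessState:
    """Hypothetical per-process replication defaults (names are assumptions)."""
    ncopies: int = 1
    preferred_sites: list = field(default_factory=list)

    def fork(self) -> "ProcessState":
        # A child inherits its parent's defaults, like other per-process state.
        return ProcessState(self.ncopies, list(self.preferred_sites))


def set_replication(proc: ProcessState, ncopies: int, preferred_sites=()) -> None:
    """Stand-in for the 'system calls to modify them' mentioned in the quote."""
    if ncopies < 1:
        raise ValueError("need at least one copy")
    proc.ncopies = ncopies
    proc.preferred_sites = list(preferred_sites)


def create(proc: ProcessState, path: str, reachable_sites: list) -> dict:
    """create() takes no replication arguments; it reads them from the process.

    Placement policy here is invented: preferred sites first, then any other
    reachable site, truncated to the requested number of copies.
    """
    ordered = [s for s in proc.preferred_sites if s in reachable_sites]
    ordered += [s for s in reachable_sites if s not in ordered]
    return {"path": path, "sites": ordered[:proc.ncopies]}


if __name__ == "__main__":
    parent = ProcessState()
    set_replication(parent, 2, ["b"])
    child = parent.fork()
    print(create(child, "/tmp/x", ["a", "b", "c"]))
```

the point is just that the call signature the user sees never changes; only the ambient process state does.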
you could argue that sync context could be inferred from the i/o agenda too but:
- this is actually the only way my system supports state sync i.e. shared memory
- anonymous sync contexts are used to establish "chroots" for atomic i/o sequences
- if the user has their own ideas about scheduling, listen to them!!
the day win wang and i changed the world together at twitter inc was when we realized the infrastructure we'd constructed couldn't do context-specific locality. win's rsc scala compiler was DAG-scheduled and needed to share memory in a persistent local jvm. once it generated an outline to compile against, we could farm out embarrassingly parallel AOT-compiled scalac jobs (using the scoot RPC system from drew gassaway).
pants couldn't do that—yet. but we did it together. immediate 2x improvement before any further optimization. that's why compiler and build tool devs need to work together!!!
good thing too, cause it was like 3 days before this talk https://youtube.com/watch?v=87K4_v2IvBg it's not me giving it but there's a point where he shows the zipkin traces we constructed and you can just see the parallelism explode like a dubstep drop
This algorithm is localized in the code and may change as experience with replicated files grows.
also of course this project treats inodes like a real thing that has meaning
The storage site allocates an inode number from a pool which is local to that physical container of the filegroup. That is, to facilitate inode allocation and allow operation when not all sites are accessible, the entire inode space of a filegroup is partitioned so that each physical container for the filegroup has a collection of inode numbers that it can allocate.
ted ts'o screaming crying throwing up rn
When all the storage sites have seen the delete, the inode can be reallocated by the site which has control of that inode (i.e. the storage site of the original create).
so this is legitimately the reason you want inodes like internally right? to do localized resource indexing! and like ok, but:
- if that's the purpose of the inode, then don't fucking expose it to userspace? if i use an inode that way, it's not gonna be in my dirents!
- the user still deserves an external inode! generate it completely differently! recycle it in its own way!
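for anyone who wants the partitioned inode pool from the quotes above in concrete form, here's a minimal sketch. assumptions are all mine: an even split of the number space, the Container and partition names, and a plain free list. the only things taken from the quotes are that each physical container allocates from its own slice without talking to anyone, and that only the owning container gets to reallocate a number.

```python
class Container:
    """One physical container of a filegroup (hypothetical model)."""

    def __init__(self, name: str, inode_range: range):
        self.name = name
        self.owned = inode_range        # the slice this container controls
        self.free = list(inode_range)   # numbers currently available

    def allocate(self) -> int:
        # Purely local: no coordination with other containers is needed,
        # so this works even when other sites are unreachable.
        if not self.free:
            raise RuntimeError(f"{self.name}: local inode pool exhausted")
        return self.free.pop(0)

    def reclaim(self, ino: int) -> None:
        # Only the container that controls an inode number may reuse it.
        # (The "all storage sites have seen the delete" check is assumed
        # to have happened before this is called.)
        if ino not in self.owned:
            raise ValueError(f"{self.name} does not control inode {ino}")
        self.free.append(ino)


def partition(total: int, names: list) -> dict:
    """Split inode numbers 1..total evenly across containers (an assumption)."""
    share = total // len(names)
    return {
        n: Container(n, range(i * share + 1, (i + 1) * share + 1))
        for i, n in enumerate(names)
    }


if __name__ == "__main__":
    packs = partition(300, ["a", "b", "c"])
    print(packs["a"].allocate(), packs["b"].allocate(), packs["c"].allocate())
```

note that nothing in here needs the number to be visible to the user, which is kind of the whole complaint.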
Solutions to the number representation and byte ordering problems have not yet been implemented.
LMAO
holy shit they made a virtual chroot for subprocess execution. THESE MOTHERFUCKERS MADE DOCKER!
Basically the scheme consists of four parts:
I'M READY I'M FUCKING READY
so like not only does this clearly hint at some of the work we would do like 40 years later with spack to codify shared library ABI, you could even argue for the direct thread from this to my proposal to systemd to force upfront declaration of their dlopen() deps!!
a special kind of directory (hereafter referred to as a hidden directory)
nobody talks about this
Keep a per-process inherited context for these hidden directories.
this is specifically for ABI translation. this is pretty obviously the future
Give users and programs an escape mechanism to make hidden directories visible so they can be examined and specific entries manipulated.
this is literally something pouzin has literally said
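and since nobody talks about it, here's a toy version of the three quoted pieces: a hidden directory, a per-process context that picks an entry out of it, and an escape to see the directory itself. everything concrete is my assumption, not the paper's spec: the HiddenDir class, the trailing "@" as the escape marker, the machine-type tags, and the dict-as-filesystem.

```python
class HiddenDir(dict):
    """A directory that lookup normally slides through (hypothetical model)."""


def resolve(tree: dict, path: str, context: list):
    """Walk `path` through nested dicts.

    On reaching a HiddenDir, pick the first entry named in the process's
    inherited `context` instead of returning the directory. A trailing '@'
    on a component (an assumed escape syntax) makes the hidden directory
    visible so its entries can be examined or named directly.
    """
    node = tree
    for part in path.strip("/").split("/"):
        escape = part.endswith("@")
        name = part[:-1] if escape else part
        node = node[name]
        if isinstance(node, HiddenDir) and not escape:
            for tag in context:
                if tag in node:
                    node = node[tag]
                    break
            else:
                raise FileNotFoundError(f"no entry in {path} for {context}")
    return node


if __name__ == "__main__":
    fs = {"bin": {"who": HiddenDir({"vax": "who (vax build)",
                                    "370": "who (370 build)"})}}
    print(resolve(fs, "/bin/who", ["vax"]))       # context picks the entry
    print(resolve(fs, "/bin/who@/370", ["vax"]))  # escape, then name one entry
    print(sorted(resolve(fs, "/bin/who@", ["vax"])))
```

same path, different process context, different binary. that's the ABI translation trick in about twenty lines.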