Post · bonfire.cafe

@hipsterelectron@circumstances.run · 12 hours ago

The UNIX system has been in wide use for over 20 years, and has helped to define many areas of computing.

@hipsterelectron@circumstances.run · 9 hours ago

Their semantics in LOCUS are identical to those seen on a single machine Unix system, even when processes are resident on different machines in LOCUS.

every NFS mount ever spontaneously returns SIGBUS

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 9 hours ago

ok i absolutely need to find this dissertation they just cited

Walker, B.J., Issues of Network Transparency and File Replication in Distributed Systems: LOCUS, Ph.D. Dissertation, Computer Science Department, University of California, Los Angeles, 1983

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 9 hours ago

acm is sending me to a broken proquest link but i know phd dissertations are saved somewhere

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 9 hours ago

i know someone at georgia tech i could ask them. or i could try the library of congress LMAO

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 9 hours ago

literally this is the cite

Just providing these seemingly simple ipc facilities was non-trivial, however. Details of the implementation are given in [WALK83].

they're taunting me. i must find it

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 9 hours ago

LOCUS permits one to execute programs at any site in the network, subject to permission control, in a manner just as easy as executing the program locally.

see this is why DARPA wouldn't go for it. ARPAnet hates permissioned systems. they all hate that shit. vint cerf is a google vp and this is exactly how LLM scrapers act when you permission them out of your face

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 9 hours ago

The mechanism is entirely transparent, so that existing software can be executed either locally or remotely, with no change to that software.

P A N T S
A A
N    N
T        T
S            S

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 9 hours ago

The decision about where the new process is to execute is specified by information associated with the calling process.

trying to reach through the screen again because i finally found someone who cares

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 9 hours ago

"structured advice list" all they need to say is dependency graph now

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 8 hours ago

see here's the thing they knew they were building a goddamn operating system and that's one thing i didn't realize at the time

but their unix is obviously just my bazel

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 8 hours ago

yeah so they did absolutely do better than me 40 years earlier and that's on me to catch up

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 8 hours ago

The semantics of the available functions by which processes interact determines, to a large extent, the difficulty involved in supporting a transparent process facility.

posix reading my email screaming crying throwing up

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 8 hours ago

The most difficult part of these functions' semantics is their expectation of shared memory.

TRUTHNUKE!!!!!!!

prom™️

@promovicz@chaos.social · 8 hours ago

@hipsterelectron yup yup. immutability and object scoping, i propose. compilers that understand memory behavior. high-level dope!

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 8 hours ago

Virtually all processes read and write substantial amounts of data per system call.

prom™️

@promovicz@chaos.social · 8 hours ago

@hipsterelectron should we analyze/describe task resource behavior like on a rusty old mainframe? yes!

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 8 hours ago

so funny story

As a result, most collections of Unix processes designed to execute on a single machine run very well when distributed on LOCUS.

this with pants and bazel except bazel literally only works in the "cloud". and the pants capacity for recursive/monadic tasks (dynamic dependency generation e.g. resolving a lockfile) is specifically how we could do outline compiles with twitter rsc locally and then farm scalac out to finish the job. so you control it locally and get results locally, and the cloud is used as extra capacity instead of controlling your organizational capacity

so google really didn't like that

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 8 hours ago

literally these corps are nothing when we dream together. that was 2019 when google killed rust and pants because we showed them up and demonstrated their attempts to RCE every internet user with chrome and every software user with bazel were perhaps not such noble schemes to protect users against evil!

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 8 hours ago

perhaps it's strange that the surveillance company who graciously negotiates TLS certificates for you keeps describing network code as tricky and unsafe! perhaps when they graciously offer a DRM ✅ that requires you to leave your python packages alone in a room with their server before publishing to pypi that........you are not keeping your users safe with the ✅!

(the astral engineer who slanders zip files wrote the microsoft DRM and removed pgp keys from pypi.)

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 8 hours ago

astral got funding from google and chainguard and themselves contributed funds to one of the literally 3 separate rust2c "forks" of zstd. chainguard is the startup who developed the DRM ✅ technology

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 8 hours ago

unfortunately, one of chainguard's cryptographers (santiago) is pretty cool. i should reach out to him because his work on attestation graphs is also how you'd do signed builds if you wanted to use your own hardware

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 8 hours ago

which of course means process isolation guarantees..........and i happen to know an agender who's into that freaky shit

prom™️

@promovicz@chaos.social · 8 hours ago

@hipsterelectron i play with VM APIs, but “gender”.

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 8 hours ago

The new error types primarily concern cases where either the calling or called machine fails while the parent and child are still alive.

omg like gundam

When the child's machine fails, the parent receives an error signal. Additional information about the nature of the error is deposited in the parent's process structure, which can be interrogated via a new system call.

they did NOT defeat me here! i have saved lives with my signal handling tiered logging before

but they did put it in the OS, so i still gotta catch up
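
for reference, a rough single-machine analogue of "the parent receives an error signal, then interrogates the details". this is plain posix behavior seen through python's `subprocess`, not the LOCUS system call (which i haven't seen). the child that kills itself is a hypothetical stand-in for "the child's machine fails", and `describe_exit` is a made-up helper name.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a human-readable cause."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative code.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


# Hypothetical child that dies abruptly instead of exiting cleanly.
child = subprocess.run(
    [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
)

# The parent "interrogates" what happened after the fact.
cause = describe_exit(child.returncode)
print(cause)
```

the part LOCUS adds is doing this across machines, where "the child died" and "the child's machine died" are different failures.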

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 8 hours ago

For those data types which the system understands, automatic reconciliation is done.

easy mode. what's next

Otherwise, the problem is reported to a higher level; a database manager for example, who may itself be able to reconcile the inconsistencies.

can the database be my manager

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 8 hours ago

Eventually, if necessary, the user is notified and tools are provided by which he can interactively merge the copies.

do scientists say this? i know i say this and i know pouzin says this. do people say this? dr. jennings says this. do scientists still say this? i bet they do. i need to find them. i want to build the future with them

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 8 hours ago

4.1 Partitions

Partitions clearly are the primary source of difficulty in a replicated environment. Some authors have proposed that the problem can be avoided by having high enough connectivity that failures will not result in partitions. In practice, however, there are numerous ways that effective partitioning occurs.

oh MAN that's the coldest "in practice" i've ever seen

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 8 hours ago

Even when the hardware level is functioning, there are miriad ways that software levels cause messages not to be communicated; buffer lockups, synchronization errors, etc.

that's a really cute spelling of myriad. i like the y but somehow miriad feels less showy/extravagant and more focused/serious

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 8 hours ago

see this is why you can't ever pretend to separate engineers from "SRE" work:

In addition, there are maintenance and hardware failure scenarios that can result in file modification conflict even when two sites have never executed independently at the same time.

the engineer can theorize

For example, while site B is down, work is done on site A. Site A goes down before B comes up. When site A comes back up, an effective partition merge must be done.

the practitioner can actualize. junyer the RE2 maintainer and a great teacher of mine was an SRE before he ended up maintaining the code that powers google search
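
a minimal sketch of how the site A / site B scenario turns into a detectable conflict. it uses per-site update counters (version vectors), which is a common technique i'm swapping in for illustration; the quoted text doesn't say this is the mechanism. the assumption that B also writes after coming up is mine, and `compare` is a hypothetical helper.

```python
def compare(a: dict[str, int], b: dict[str, int]) -> str:
    """Compare two version vectors mapping site name -> update count."""
    sites = a.keys() | b.keys()
    a_ahead = any(a.get(s, 0) > b.get(s, 0) for s in sites)
    b_ahead = any(b.get(s, 0) > a.get(s, 0) for s in sites)
    if a_ahead and b_ahead:
        return "conflict"  # each copy has updates the other never saw
    if a_ahead:
        return "a dominates"
    if b_ahead:
        return "b dominates"
    return "equal"


# Site B is down while A updates the file once.
copy_on_a = {"A": 1}
# A goes down, B comes up with its stale copy and (assumption) updates it too.
copy_on_b = {"B": 1}

print(compare(copy_on_a, copy_on_b))  # conflict
print(compare(copy_on_a, {}))         # a dominates
```

the two sites never ran at the same time, but the vectors still disagree in both directions, so the merge has to be handled as a real partition merge.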

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 8 hours ago

this is exactly the kind of shit i would write

an immediate question is whether a data object, appearing in more than one partition, can be updated during partition.

i.e. "if google cloud goes down how fucked are you"

In our judgment, the answer must be yes.

JUDGEMENT! MUST BE! YES!

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 8 hours ago

this was also why i realized atomic memory models were solving a different problem space!

in many environments, the probability of conflicting updates is low. Actual intimate sharing is often not the rule.

and if it is......localize those intimate motherfuckers!!! give them room to explore each others' address spaces!!!!!

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 8 hours ago

i am quite surprised why they remain stuck in the mode of trying to adjudicate a best-effort answer to mutually conflicting modifications. like to me "conflicts" e.g. writing to the same location indicate that there is an unresolved user-level "conflict" between their scheduled tasks!

and this is where aaron turon's formal verification framework would say: the behavior is incorrect! it produces a data race!

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 8 hours ago

that's why i proposed the named hierarchy of sync contexts (up to global). it takes our environment isolation mechanism and gives the user a structured resource (with a lifetime) they can use to express a sequence of modifications that can be unambiguously verified for correctness.

what is correctness?

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 7 hours ago

in each sync context, modifications are built up as an ordered sequence of i/o operations, then explicitly committed. as these operations are evaluated during the blocking commit call, the only correctness requirement is that a named resource (file path) previously committed to the same context cannot match the name of a new resource.

note that this arises not as we engage in attempting to write a whole bunch of data, but when we allocate (e.g. open()) a resource handle. this should fail eagerly, and quickly, and unambiguously indicate a problem with user input, or with the input environment!
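
a minimal in-memory sketch of that rule, under my own assumptions: `SyncContext`, `create`, `commit`, and `NameCollision` are hypothetical names, "i/o operations" are reduced to whole-file writes, and there is no rollback if a commit fails partway through.

```python
class NameCollision(Exception):
    """Raised when a new resource reuses a name already committed to the context."""


class SyncContext:
    def __init__(self, name: str) -> None:
        self.name = name
        self._committed: dict[str, bytes] = {}
        self._pending: list[tuple[str, bytes]] = []  # ordered, not yet committed

    def create(self, path: str, data: bytes) -> None:
        # Fail eagerly at allocation time, before any data is written.
        self._check(path)
        self._pending.append((path, data))

    def commit(self) -> int:
        # Evaluate the queued operations in order; the name rule is the only check.
        count = 0
        for path, data in self._pending:
            self._check(path)
            self._committed[path] = data
            count += 1
        self._pending.clear()
        return count

    def _check(self, path: str) -> None:
        if path in self._committed:
            raise NameCollision(f"{path!r} already committed to context {self.name!r}")


ctx = SyncContext("build")
ctx.create("out/a.o", b"object code")
committed = ctx.commit()

try:
    ctx.create("out/a.o", b"different object code")
    collided = False
except NameCollision:
    collided = True
print(committed, collided)
```

the second `create` fails before any bytes move, which is the "fail eagerly, quickly, unambiguously" part.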

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 7 hours ago

OMG NOOOOOOO I JUST READ THE NEXT PAGE THIS IS A MIND MELD

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 7 hours ago

The second case is the one that gives rise to transactions. Here it is recognized that changes to sets of objects are related.

LOCALITY GANG!!!!!

Reconciliation of differing versions of an object must be coordinated with other objects and the operations on those objects which occurred during partition.

name collision indicates a model failure—it can't be expected to be resolved by just choosing one version!

In addition, LOCUS provides a full nested transaction facility for those cases where the user wishes to bind a set of events together.

"full nested transaction facility" is exactly what people writing build processes have needed for DECADES

Case specific merge strategies have been developed.

🤩🤩🤩🤩 that's me rn
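
a toy sketch of what "nested" buys a build process: a child transaction can abort without disturbing its parent, and only a committed child's writes become visible to the parent. this is not LOCUS's implementation; the `Transaction` class and the keys are hypothetical, and everything lives in memory.

```python
class Transaction:
    def __init__(self, parent=None):
        self.parent = parent
        self.writes = {}
        self.state = "open"

    def begin(self):
        """Start a child transaction nested inside this one."""
        return Transaction(parent=self)

    def write(self, key, value):
        self.writes[key] = value

    def read(self, key):
        # Look in this transaction first, then walk up through the ancestors.
        txn = self
        while txn is not None:
            if key in txn.writes:
                return txn.writes[key]
            txn = txn.parent
        raise KeyError(key)

    def commit(self):
        # A child's writes only become visible to its parent on commit.
        if self.parent is not None:
            self.parent.writes.update(self.writes)
        self.state = "committed"

    def abort(self):
        # Discard this transaction's writes; the parent is untouched.
        self.writes.clear()
        self.state = "aborted"


root = Transaction()
root.write("lockfile", "v1")

attempt = root.begin()
attempt.write("lockfile", "v2")
attempt.abort()  # resolution failed, the parent still sees v1

compile_step = root.begin()
compile_step.write("classes", "compiled")
compile_step.commit()

print(root.read("lockfile"), root.read("classes"))
```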

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 7 hours ago

emailing posix subject "full nested transaction facility" link to this pdf send

(not really; i aborted the transaction. see how broadly this can be applied?)

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 7 hours ago

ok so i was vaguely thinking earlier that like "this network partitioning stuff is more intense than my purely-local use case" but another cited paper on mutual inconsistency detection had this really clever line: https://www.cs.purdue.edu/homes/bb/cs542-11Spr/Parker_TSE83.pdf


d@nny disc@ mc²

@hipsterelectron@circumstances.run · 7 hours ago

The most typical response is to enforce consistency by permitting files to be accessed only in one partition.

this is precisely what i was imposing when describing name conflicts as a model error!

Unfortunately, effective implementation of this policy can often result in the files being accessible in zero partitions!

in our case, this could mean...not persisting to disk before a crash! in the posix email, i described a hierarchy, where syncing to a named context must occur before persisting. that was wrong!

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 7 hours ago

oh no.......oh dear........what if........what if this means i do need to persist to disk like the other filesystems i derided so eagerly

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 7 hours ago

i think if i end up with a really thoughtful set of heterogeneous interacting processes managing custom-built data structures both in-memory and on-disk like zfs does..........maybe i can accept that. jvm bytecode is easily the best fucking IR humanity has ever achieved

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 7 hours ago

jar files are zip files because sun microsystems understands that the most powerful journaled filesystem......is the one you carry with you in your heart every day

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 7 hours ago

i hope google succeeds in getting python to switch off zips to .tar.zsts so i can roll out my Zip File From the Future with tree hashing for fast splitting and merging along with the merkle-damgård length extension proof of concept

but i think they won't, because the zip index is too useful. zip files are literally just tarballs with an index. undefeatable
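
a small illustration of why the index matters, using python's standard `zipfile` module on an in-memory archive (the entry names are made up). the central directory gives you every entry's name and offset up front, so one entry can be read by name without scanning the ones before it.

```python
import io
import zipfile

# Build a tiny archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a/one.txt", "first")
    zf.writestr("b/two.txt", "second")

with zipfile.ZipFile(buf) as zf:
    # The index: entry name -> byte offset of that entry's local header.
    index = {info.filename: info.header_offset for info in zf.infolist()}
    # Random access by name, no linear walk over earlier entries.
    data = zf.read("b/two.txt")

print(index)
print(data)
```

a tarball gives you the same entries but you have to walk it front to back to find one.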

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 7 hours ago

the future part is that my zip can be made to allow either leading or trailing bytes, so it can avoid clobbering its own index when doing appends, but still supports self extracting executables and still foils length extension attacks

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 6 hours ago

it's really simple actually it's not really like an achievement or something but it's really really sick to have a data layout that's immediately readable, immediately appendable without blocking, and has unambiguous boundaries so splitting into subsets, or merging entries from multiple constituents, are all things you can do fearlessly and frequently

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 6 hours ago

a lot of the classic kernel data structures are specifically intended to do two things i really don't care about:
(1) maintain atomic read/write coherency from separate processes over a contiguous region of shared memory
(2) infer likely future i/o access patterns

(1) i simply refuse to accept as a valid behavior (referring to write()/read() atomicity and serialization for overlapping regions of the same physical page). i think that's literally just obvious UB if it was in the same address space?

it's a textbook math problem where the right answer is "not enough information"

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 6 hours ago

there are some very neat data structures for mapping intervals of a contiguous region (the prof who worked on java at sun jerry roth loved red/black trees!) which will definitely be useful for virtual mappings. but in general i'm pretty confident that cross-process (in-kernel) control flow can and should be unidirectional message passing.

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 6 hours ago

i read one really silly paper from some real hpc scientists who took snapshots of the unified (user+kernel) call stack over the course of process execution in an attempt to infer where to prefetch and where to drop from cache. they had so many numbers!!! their experimentation technique was honestly pretty cool!!!!

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 6 hours ago

but i/o dependencies are usually extremely predictable!!! and much of the blocking i/o performed by compilers/modules/interpreters is because EVERYONE still performs some form of linear path traversal (compilers with includes, linkers with libraries, interpreters with modules) before actually processing all that input!

not only is this just a classic source of non-reproducibility, it places vfs traversal and demand paging on the critical path of your executable's actual purpose!

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 6 hours ago

scalac had by far the worst example of this i've ever seen. an odersky special. he hand-parsed classfiles (jvm bytecode, often with scala-specific data), and made use of some async/coroutine mechanism, so instead of any pipelining at all there's just this horrifying call stack jumping between the type checker and then back to find another class. i'm pretty sure he iterated linearly too instead of matching entries by name

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 5 hours ago

i had to completely rip it out and rewrite it and that was one point where i actively felt scared and in over my head. but the concept of pipelining really stuck with me. in this case i'm referring not to multithreaded i/o loops, but identifying a data structure that can be efficiently queried, which you generate from a preprocessing step

luckily, sun microsystems had largely done the job there already
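
a rough sketch of the difference, for flat header names only (nested paths like `sys/types.h` are out of scope here). `resolve_linear` and `build_index` are hypothetical helpers, and the temp directories stand in for `-I` search dirs.

```python
import os
import tempfile


def resolve_linear(name, search_dirs):
    """What most tools do: probe each directory in order on every lookup."""
    for d in search_dirs:
        candidate = os.path.join(d, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def build_index(search_dirs):
    """Preprocessing step: scan each directory once; earlier directories win."""
    index = {}
    for d in search_dirs:
        with os.scandir(d) as entries:
            for entry in entries:
                if entry.is_file():
                    index.setdefault(entry.name, entry.path)
    return index


with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    for directory, name in [(first, "a.h"), (second, "a.h"), (second, "b.h")]:
        with open(os.path.join(directory, name), "w") as f:
            f.write("// header\n")

    dirs = [first, second]
    index = build_index(dirs)
    names = ["a.h", "b.h", "missing.h"]
    linear_hits = [resolve_linear(n, dirs) for n in names]
    index_hits = [index.get(n) for n in names]

print(linear_hits == index_hits)
```

both give the same answers, but the index does its filesystem traversal once, up front, instead of on the critical path of every lookup.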

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 5 hours ago

the linear path traversal in the critical path of your build process has two, maybe three steps:

(1) performing the highly domain-specific process to identify files (e.g. headers for -I) from what are essentially VFS query expressions
(2) paging in those inputs from file
(3) parsing/loading/interpreting/evaluating the input data

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 5 hours ago

note that the C preprocessor foils our attempt to neatly split these roles up, adding more i/o dependencies that must be interpreted in the context of a -I arg. was robert pike right???

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 5 hours ago

of course not! this is the pre processor!!! and while we can schedule the preprocessor execution entirely in advance of the compiler, we also want to extract the precise input paths it read from, so we can calculate whether to invalidate build output if any of those inferred dependencies are modified!
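
one existing way to get those paths: gcc and clang can write a make-style dependency file (`-MD -MF <file>`) listing every file the preprocessor read. below is a hedged sketch of consuming one. the depfile text and file contents are made up, `parse_depfile` and `fingerprint` are hypothetical helpers, and the parser ignores escaped spaces and multiple rules.

```python
import hashlib


def parse_depfile(text):
    """Parse a single 'target: dep dep ...' rule with backslash continuations."""
    joined = text.replace("\\\n", " ")
    target, _, deps = joined.partition(":")
    return target.strip(), deps.split()


def fingerprint(deps, read):
    """Hash the names and contents of all inputs; changes when any input changes."""
    h = hashlib.sha256()
    for path in sorted(deps):
        h.update(path.encode())
        h.update(b"\0")
        h.update(hashlib.sha256(read(path)).digest())
    return h.hexdigest()


# Made-up depfile, shaped like what a compiler would emit for main.c.
depfile = "main.o: main.c include/util.h \\\n  include/config.h\n"
target, deps = parse_depfile(depfile)

# A dict stands in for the filesystem so the sketch runs anywhere.
files = {
    "main.c": b"int main(void) { return 0; }\n",
    "include/util.h": b"#pragma once\n",
    "include/config.h": b"#define DEBUG 0\n",
    "unrelated.h": b"\n",
}
before = fingerprint(deps, files.__getitem__)

files["include/config.h"] = b"#define DEBUG 1\n"
after = fingerprint(deps, files.__getitem__)

print(target, deps)
print("invalidate" if before != after else "up to date")
```

the catch is that the dependency list is only known after the preprocessor has run once, so the first build can't know it up front.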

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 5 hours ago

this is going to be such a massive task lmao. i don't think a build tool should be constructing dependency graphs between reads and writes without actually being the OS.

i remember a microsoft eng spoke about an internal build system named "domino" once at a conference, which attempted to do exactly this (track task deps from read calls issued by a compiler). i remember thinking it seemed ridiculous at the time because of course you'd know that up front—but for the C preprocessor, you certainly can't

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 5 hours ago

it seems like kind of a ridiculous thing to think about an os just for a build system,

but let's not get ahead of ourselves—it's also out of spite

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 5 hours ago

i'm sure there are other applications too. like i'm def curious about cryptographic operations (especially establishing an isolation boundary around any key usage)

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 5 hours ago

it's kind of exciting to challenge literally every computer interaction this way. like dns resolution: can we limit side channels that leak which sites we query?