Post · bonfire.cafe

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 10 hours ago

The UNIX system has been in wide use for over 20 years, and has helped to define many areas of computing.

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 6 hours ago

i think if i end up with a really thoughtful set of heterogeneous interacting processes managing custom-built data structures both in-memory and on-disk like zfs does..........maybe i can accept that. jvm bytecode is easily the best fucking IR humanity has ever achieved

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 6 hours ago

jar files are zip files because sun microsystems understands that the most powerful journaled filesystem......is the one you carry with you in your heart every day

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 6 hours ago

i hope google succeeds in getting python to switch off zips to .tar.zsts so i can roll out my Zip File From the Future with tree hashing for fast splitting and merging along with the merkle-damgård length extension proof of concept

but i think they won't, because the zip index is too useful. zip files are literally just tarballs with an index. undefeatable
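
a minimal sketch of why the index matters, using python's stdlib `zipfile` on an in-memory archive. the entry names are made up for illustration. it only shows that opening a zip reads the central directory, after which any entry can be fetched by name without walking the ones before it:

```python
import io
import zipfile

# build a small archive in memory; entry names are hypothetical
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a/first.txt", "hello " * 100)
    zf.writestr("b/second.txt", "world " * 100)

with zipfile.ZipFile(buf) as zf:
    # the central directory gives name -> (offset of local header, compressed size)
    index = {
        info.filename: (info.header_offset, info.compress_size)
        for info in zf.infolist()
    }
    # random access: fetch one entry by name, no scan over earlier entries
    second = zf.read("b/second.txt")
```

a tarball would need a linear pass over every header to answer the same lookup.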

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 6 hours ago

the future part is that my zip can be made to allow either leading or trailing bytes, so it can avoid clobbering its own index when doing appends, but still supports self extracting executables and still foils length extension attacks
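
for reference, a small sketch of the leading-bytes half as stock zip already does it (this is not the Zip File From the Future, just the baseline it has to stay compatible with). the stub contents are hypothetical. python's `zipfile` locates the index from the end of the file, so bytes prepended in front of the archive don't stop it from being read:

```python
import io
import zipfile

# an ordinary archive
payload = io.BytesIO()
with zipfile.ZipFile(payload, "w") as zf:
    zf.writestr("hello.txt", "hi")

# hypothetical self-extractor stub glued in front of the archive
stub = b"#!/bin/sh\necho 'hypothetical self-extractor stub'\nexit 0\n"
combined = io.BytesIO(stub + payload.getvalue())

# the reader finds the end-of-central-directory record from the tail of the file
with zipfile.ZipFile(combined) as zf:
    names = zf.namelist()
    data = zf.read("hello.txt")
```

the trailing-bytes and append-without-clobbering parts are not something this sketch attempts.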

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 5 hours ago

it's really simple actually it's not really like an achievement or something but it's really really sick to have a data layout that's immediately readable, immediately appendable without blocking, and has unambiguous boundaries so splitting into subsets, or merging entries from multiple constituents, are all things you can do fearlessly and frequently

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 5 hours ago

a lot of the classic kernel data structures are specifically intended to do two things i really don't care about:
(1) maintain atomic read/write coherency from separate processes over a contiguous region of shared memory
(2) infer likely future i/o access patterns

(1) i simply refuse to accept as a valid behavior (referring to write()/read() atomicity and serialization for overlapping regions of the same physical page). i think that's literally just obvious UB if it was in the same address space?

it's a textbook math problem where the right answer is "not enough information"

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 5 hours ago

there are some very neat data structures for mapping intervals of a contiguous region (jerry roth, the prof who worked on java at sun, loved red/black trees!) which will definitely be useful for virtual mappings. but in general i'm pretty confident that cross-process (in-kernel) control flow can and should be unidirectional message passing.
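
a rough sketch of the interval-mapping idea. to keep it short this swaps the red/black tree for a sorted list plus `bisect`, which gives the same lookups but slower inserts. the class name, the half-open `[start, end)` convention, and the region labels in the usage below are all assumptions made for illustration:

```python
import bisect

class IntervalMap:
    """Maps non-overlapping half-open ranges [start, end) to values."""

    def __init__(self):
        self._starts = []   # sorted start offsets
        self._entries = []  # (start, end, value), parallel to _starts

    def insert(self, start, end, value):
        if start >= end:
            raise ValueError("empty interval")
        i = bisect.bisect_left(self._starts, start)
        # reject overlap with the neighbor on either side
        if i > 0 and self._entries[i - 1][1] > start:
            raise ValueError("overlaps previous mapping")
        if i < len(self._entries) and self._entries[i][0] < end:
            raise ValueError("overlaps next mapping")
        self._starts.insert(i, start)
        self._entries.insert(i, (start, end, value))

    def lookup(self, addr):
        # rightmost interval whose start is <= addr
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0:
            start, end, value = self._entries[i]
            if addr < end:
                return value
        return None
```

usage would look like `m.insert(0x1000, 0x2000, "text")` followed by `m.lookup(0x1800)`.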

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 5 hours ago

i read one really silly paper from some real hpc scientists who took snapshots of the unified (user+kernel) call stack over the course of process execution in an attempt to infer where to prefetch and where to drop from cache. they had so many numbers!!! their experimentation technique was honestly pretty cool!!!!

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 5 hours ago

but i/o dependencies are usually extremely predictable!!! and much of the blocking i/o performed by compilers/modules/interpreters is because EVERYONE still performs some form of linear path traversal (compilers with includes, linkers with libraries, interpreters with modules) before actually processing all that input!

not only is this just a classic source of non-reproducibility, it places vfs traversal and demand paging on the critical path of your executable's actual purpose!

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 5 hours ago

scalac had by far the worst example of this i've ever seen. an odersky special. he hand-parsed classfiles (jvm bytecode, often with scala-specific data), and made use of some async/coroutine mechanism, so instead of any pipelining at all there's just this horrifying call stack jumping between the type checker and then back to find another class. i'm pretty sure he iterated linearly too instead of matching entries by name

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 4 hours ago

i had to completely rip it out and rewrite it and that was one point where i actively felt scared and in over my head. but the concept of pipelining really stuck with me. in this case i'm referring not to multithreaded i/o loops, but identifying a data structure that can be efficiently queried, which you generate from a preprocessing step

luckily, sun microsystems had largely done the job there already
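
a hedged sketch of that kind of preprocessing step, assuming "the job sun already did" refers to the jar/zip index (that reading is a guess). the helper names and the in-memory jars are hypothetical. the classpath is indexed once, and every later class lookup is a dict hit instead of a linear walk:

```python
import io
import zipfile

def make_jar(entries):
    """Build an in-memory jar (zip) from a {name: bytes} dict; a stand-in for real jars."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in entries.items():
            zf.writestr(name, data)
    return buf

def build_classpath_index(jars):
    """Preprocessing: one pass over each jar's central directory.

    The first jar on the classpath wins, mirroring the usual shadowing order.
    """
    index = {}
    for position, jar in enumerate(jars):
        with zipfile.ZipFile(jar) as zf:
            for name in zf.namelist():
                if name.endswith(".class"):
                    index.setdefault(name, position)
    return index

def load_class(jars, index, name):
    """Query step: look the entry up by name, then read only that entry."""
    position = index.get(name)
    if position is None:
        return None
    with zipfile.ZipFile(jars[position]) as zf:
        return zf.read(name)
```

the type checker would then only ever call `load_class`, never iterate the classpath itself.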

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 4 hours ago

the linear path traversal in the critical path of your build process has two, maybe three steps:

(1) performing the highly domain-specific process to identify files (e.g. headers for -I) from what are essentially VFS query expressions
(2) paging in those inputs from file
(3) parsing/loading/interpreting/evaluating the input data
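
the three steps above can be sketched as separate stages. this is a toy, assuming a plain first-match search order like -I; the function names and file names are made up, and "parsing" is just a line count standing in for real work:

```python
import os
import tempfile

def resolve(names, search_dirs):
    """Step 1: turn names into concrete paths; first match in search order wins."""
    resolved = {}
    for name in names:
        for d in search_dirs:
            candidate = os.path.join(d, name)
            if os.path.isfile(candidate):
                resolved[name] = candidate
                break
    return resolved

def page_in(resolved):
    """Step 2: read every resolved input, with no parsing interleaved."""
    contents = {}
    for name, path in resolved.items():
        with open(path, "rb") as f:
            contents[name] = f.read()
    return contents

def parse(contents):
    """Step 3: stand-in for real parsing; counts newlines per input."""
    return {name: data.count(b"\n") for name, data in contents.items()}

with tempfile.TemporaryDirectory() as root:
    first, second = os.path.join(root, "first"), os.path.join(root, "second")
    os.mkdir(first)
    os.mkdir(second)
    with open(os.path.join(first, "a.h"), "w") as f:
        f.write("int a;\n")
    with open(os.path.join(second, "a.h"), "w") as f:
        f.write("shadowed\n\n")
    with open(os.path.join(second, "b.h"), "w") as f:
        f.write("int b;\nint c;\n")
    resolved = resolve(["a.h", "b.h", "missing.h"], [first, second])
    line_counts = parse(page_in(resolved))
```

once the stages are split like this, step 1 can run (and be cached) before anything on the critical path starts.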

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 4 hours ago

note that the C preprocessor foils our attempt to neatly split these roles up, adding more i/o dependencies that must be interpreted in the context of a -I arg. was robert pike right???

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 4 hours ago

of course not! this is the pre processor!!! and while we can schedule the preprocessor execution entirely in advance of the compiler, we also want to extract the precise input paths it read from, so we can calculate whether to invalidate build output if any of those inferred dependencies are modified!
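
a small sketch of the invalidation half, assuming the list of paths the preprocessor actually read has already been extracted somehow (how to get that list is exactly the hard part and is not shown). the function name and the header file are hypothetical:

```python
import hashlib
import os
import tempfile

def fingerprint(paths):
    """Hash the (path, content) pairs of every input that was actually read."""
    h = hashlib.sha256()
    for path in sorted(paths):
        with open(path, "rb") as f:
            data = f.read()
        # length-prefix each field so the boundaries are unambiguous
        for field in (path.encode(), data):
            h.update(len(field).to_bytes(8, "big"))
            h.update(field)
    return h.hexdigest()

with tempfile.TemporaryDirectory() as root:
    header = os.path.join(root, "config.h")  # hypothetical discovered input
    with open(header, "w") as f:
        f.write("#define LEVEL 1\n")
    recorded = fingerprint([header])

    # nothing changed: the build output is still valid
    unchanged = fingerprint([header]) == recorded

    # an inferred dependency was modified: invalidate
    with open(header, "w") as f:
        f.write("#define LEVEL 2\n")
    stale = fingerprint([header]) != recorded
```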

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 4 hours ago

this is going to be such a massive task lmao. i don't think a build tool should be constructing dependency graphs between reads and writes without actually being the OS.

i remember a microsoft eng spoke about an internal build system named "domino" once at a conference, which attempted to do exactly this (track task deps from read calls issued by a compiler). i remember thinking it seemed ridiculous at the time because of course you'd know that up front—but for the C preprocessor, you certainly can't

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 4 hours ago

it seems like kind of a ridiculous thing to think about an os just for a build system,

but let's not get ahead of ourselves—it's also out of spite

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 4 hours ago

i'm sure there are other applications too. like i'm def curious about cryptographic operations (especially establishing an isolation boundary around any key usage)

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 4 hours ago

it's kind of exciting to challenge literally every computer interaction this way. like dns resolution: can we limit side channels that leak which sites we query?

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 4 hours ago

god fuck i refuse to implement TLS. ok i'm gonna keep reading about data

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 4 hours ago

"wow, the internet is amazing! i learn so many things!"

reality: in order to see your friends, you must accept the existence of TLS 1.3

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 4 hours ago

thinking about the IETF reminds me that this insane vaporware idea to build a safe kernel is not terrible

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 4 hours ago

the partitioning paper throwing the deepest shade on xerox parc lmao:

Disk Toting: In this approach, employed at Xerox Parc and other installations where very intelligent terminals are linked via a network

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 4 hours ago

oh wow i was about to criticize the idea of "version history" of a shared resource but this is very obviously a fascinating case study for a version control system. they're calling it a "version conflict" for mutually inconsistent changes. @SRAZKVT https://www.cs.purdue.edu/homes/bb/cs542-11Spr/Parker_TSE83.pdf

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 4 hours ago

example given of a bank account balance being simultaneously withdrawn, and if the result runs below 0 we have a conflict. i have no clue how to characterize correctness for a file conflict outside of heuristics (and i'm curious if we can say something meaningful without them)

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 4 hours ago

YESSSSSS WE GOT A GRAPH THEORIST OVER HERE!!!!!

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 3 hours ago

easily the least understandable definition of a graph i've ever read. deciding to move on

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 3 hours ago

i do not believe anyone would decide not to label the edges with the specific change each edge induced and instead stuff it into "node labels" but whatever i'm over it

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 3 hours ago

oh and there's just more info not written in the graph and then the author says the graph doesn't actually represent conflicts correctly. ok stop wasting my time? anyway now we get to version vectors
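
for orientation, a minimal sketch of version vectors as they are commonly described: one update counter per site, compared elementwise. the function names and return strings are made up, and this is not a transcription of the paper's own algorithm:

```python
def compare(a, b):
    """Compare two version vectors (dicts of site -> update count).

    Returns "equal", "a_dominates", "b_dominates", or "conflict".
    """
    sites = set(a) | set(b)
    a_ahead = any(a.get(s, 0) > b.get(s, 0) for s in sites)
    b_ahead = any(b.get(s, 0) > a.get(s, 0) for s in sites)
    if a_ahead and b_ahead:
        # each copy saw an update the other did not: mutually inconsistent
        return "conflict"
    if a_ahead:
        return "a_dominates"
    if b_ahead:
        return "b_dominates"
    return "equal"

def merge(a, b):
    """Elementwise max: the vector for a copy that has seen both histories."""
    return {s: max(a.get(s, 0), b.get(s, 0)) for s in set(a) | set(b)}
```

note there are no timestamps anywhere in this: only counts of updates per site. any bookkeeping a real system does after reconciling a conflict is left out.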

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 3 hours ago

wtf. first off, why does he keep mentioning timestamps? second off, he raised a strawman about update logs for some reason. he dies if this vector thing isn't good

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 3 hours ago

oh that's it ok waste of time

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 3 hours ago

the LOCUS paper likes it but i don't rly care about version history in this case. i'm glad i remembered version control systems are cool though because that's another way i can defeat linus

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 3 hours ago

OOOH WOW OMG

The LOCUS recovery and merge philosophy is hierarchically organized. The basic system is responsible for detecting all conflicts.

this is kind of why i like the idea of making data scoped by default, because i don't think most files need to know about each other or consider a generic conflict system. all of these seem to be solutions to a problem of "how can we ensure global coherence" when a computer generally isn't about global coherence? not the way i use it

For those data types that it manages, including internal system data as well as file system directories, automatic merge is done by the system.

i do absolutely fuck with having semantic understanding of the structures the filesystem/vcs manages

If the system is not responsible for a given file type, it reflects the problem up to a higher level; to a recovery/merge manager if one exists for the given file type.

a recovery manager, maybe? but knowledge of merging seems like an ahierarchical property. i still like the idea of "merge manager"

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 3 hours ago

To develop a merge procedure for any data type, including directories, it is necessary to evaluate the operations which can be applied to that data type.

yes! this is why i'm infatuated with my fractal zip!

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 3 hours ago

they mention doing directory merge in the background and i'm like yes!
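
a toy sketch of what a merge procedure derived from the operations on a directory might look like. this is an assumption-heavy illustration, not the LOCUS algorithm: a directory is modeled as a dict of name -> file id, and only the "add entry" operation is considered:

```python
def merge_directories(left, right):
    """Merge two replicas of a directory, each a dict of name -> file id.

    Only the "add entry" operation is modeled; deletions would need
    tombstones or something similar and are deliberately left out.
    """
    merged, conflicts = {}, {}
    for name in sorted(set(left) | set(right)):
        ids = {d[name] for d in (left, right) if name in d}
        if len(ids) == 1:
            # present on one side only, or identical on both: safe to merge
            merged[name] = ids.pop()
        else:
            # same name bound to different files: a name conflict for someone to review
            conflicts[name] = sorted(ids)
    return merged, conflicts
```

because nothing here blocks on anything else, it is the kind of merge that could plausibly run in the background.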

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 3 hours ago

Automatic reconciliation of user mailboxes is important in the LOCUS replication system, since notification of name conflicts in files is done by sending the user electronic mail.

this seems like a mistake. if name conflicts are associated with the user who owns them, imo they absolutely should have a separate structure to record name conflicts under review. email is super complex

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 3 hours ago

love love this method of resolving conflicts by giving users tools to override or work around. sign of Giving A Shit About Users:

In any case, files with unresolved conflicts are marked so normal attempts to access them fail, although that control may be overridden.

could be annoying but seems like an effective compromise

A trivial tool is provided by which the user may rename each version of the conflicted file and make each one a normal file again.

LOVE keeping absolutely every version like this. it codifies the situation where "i can't fuck with this rn but i can't risk losing anything" which i SO relate to

Then the standard set of application programs can be used to compare and merge the files.

ediff bae !!

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 3 hours ago

ooooh so i totally think they made the wrong choice here

The system strives to insulate the users from reconfigurations, providing continuing operation with only negligible delay.

yeah ok but that's just an SLA. it's not "insulating users" that's just doing your job

Requiring user programs to deal with reconfiguration would shift the network costs from the operating system to the applications programs.

(1) HUGE leap from "users" to "user programs" lmao
(2) is it "user" or "application" programs? if the users own files they might care whether they're being rehomed lol

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 3 hours ago

wish they explained the significance of a virtual circuit!!! that's another pouzin neologism which was immediately twisted out of recognition

Network information is kept internally in both a high-level status table and a collection of virtual circuits,

wish this was explained

The virtual circuits deliver messages from site A to site B (the virtual circuits connect sites, not processes) in the order they are sent. If a message is lost, the circuit is closed. The mechanism defends the local site from the slow operation of a foreign site.

this is so curious for so many reasons! maintaining message ordering is interesting too but i wonder if it could be muxed?
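
a sketch of the quoted behavior from the receiving side: in-order delivery, and the circuit closes as soon as a message is known to be missing. the per-message sequence number is an assumption (the paper excerpt doesn't say how loss is detected), and the class names are made up:

```python
class CircuitClosed(Exception):
    pass

class VirtualCircuit:
    """Receiver side of a site-to-site circuit with per-message sequence numbers."""

    def __init__(self):
        self.expected = 0
        self.open = True
        self.delivered = []

    def receive(self, seq, payload):
        if not self.open:
            raise CircuitClosed("circuit already closed")
        if seq != self.expected:
            # a gap means a message was lost: close instead of waiting or reordering
            self.open = False
            raise CircuitClosed(f"expected seq {self.expected}, got {seq}")
        self.expected += 1
        self.delivered.append(payload)
```

on the muxing question: payloads could carry a `(process, data)` pair so several processes share one circuit, but then they also share its ordering and its fate when it closes.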

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 3 hours ago

i'm rly curious about this site-to-site communication that wraps individual messages from users bc that is how a ddos-resistant anonymity catenet that hides traffic might work

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 3 hours ago

oh i'm so sad they just drop handles to remote resources when the link breaks! this is where a dependency graph could make sense since progress through it can be halted and resumed!

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 3 hours ago

i do like the idea of recovery manager though. one thing we didn't attempt to do in the pants task graph was any form of retry, or any codification of special conditions around a subgraph. why? because someone else didn't like it. sigh!

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 2 hours ago

i do kind of like the idea of a named sync context being the locus of a recovery condition, so that if any tasks syncing to the context fail, the context can free its pages. this would absolutely propagate sideways (tasks declaring two contexts would blow up if one does) and upwards (if the sync context can't recover, goes up to its parent context)
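
a very rough sketch of that propagation, with every name hypothetical. `recover` is an assumed hook that returns True when the context handled the failure itself, and "freeing pages" is just a flag. sideways here means a context's failure marks all of its tasks failed; it does not then cascade into those tasks' other contexts, which is a choice the sketch makes, not something settled above:

```python
class SyncContext:
    def __init__(self, name, parent=None, recover=None):
        self.name = name
        self.parent = parent
        self.recover = recover  # hypothetical hook: returns True if recovery succeeded
        self.tasks = []
        self.failed = False
        self.pages_freed = False

    def fail(self):
        if self.failed:
            return
        self.failed = True
        self.pages_freed = True  # stand-in for actually freeing the context's pages
        # sideways: every task declared against this context blows up
        for task in self.tasks:
            task.failed = True
        # upwards: if this context can't recover, hand the failure to its parent
        recovered = self.recover is not None and self.recover(self)
        if not recovered and self.parent is not None:
            self.parent.fail()

class Task:
    def __init__(self, name, contexts):
        self.name = name
        self.contexts = list(contexts)
        self.failed = False
        for ctx in self.contexts:
            ctx.tasks.append(self)

    def report_failure(self):
        self.failed = True
        for ctx in self.contexts:
            ctx.fail()
```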

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 2 hours ago

hmmmm should a sync context also then be the locus of declaring tasks and their dependencies? that answers another question we never addressed in pants i.e. making the loading of rules into one of the operations of the system as opposed to loading everything in advance

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 2 hours ago

i'm so upset how this wonderful paper is concluding ugh!!! they keep saying "transparent" to mean "opaque" and then they say their biggest problem was not being able to pretend a networked file was local hard enough. i don't believe that for a second. it's not "a failure of transparency" if viewing a remote file is spotty. the failure is that you're not allowing the user to explicitly relocate that file!!!

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 2 hours ago

We found that the primary motivation for remote execution was load balancing.

completely uninterested in your framing of your users' "motivation" (why are users not even named???) and this is also a ridiculous thing to state without describing their experience of local load whatsoever

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 2 hours ago

anyway that shit was wild

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 2 hours ago

seeing "remote execution" in a 1981 paper and knowing UCLA did it better than us in most ways was both exciting and scary

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 2 hours ago

the google remote execution API is such fucking shit lmao

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 2 hours ago

back to the BSD hagiography

Because 4.2BSD included many new facilities, it suffered a loss of performance compared to 4.1BSD, partly because of the introduction of symbolic links.

blaming two separate things for performance degradation lmao

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 2 hours ago

Some pernicious bugs had been introduced, particularly in the TCP protocol implementation.

TCP is a fucking joke. the pernicious bug is TCP

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 2 hours ago

Others, such as TCP/IP subnet and routing support, had not been specified soon enough by outside parties for them to be incorporated in the 4.2BSD release.

"outside parties" ??? this is still darpaslop and cerf does not let others specify his protocols for him

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 2 hours ago

Commercial systems usually maintain backward compatibility for many releases, so as not to make existing applications obsolete.

complete falsehood. that's what we did at pants and what we do for spack because we are government workers and we achieve guarantees for our users

Maintaining compatibility is increasingly difficult, however,

skill issue, should have used spack

so most research systems maintain little or no backward compatibility.

debatable. very funny how rapidly this guy jumps between commerce and research lol

d@nny disc@ mc²

@hipsterelectron@circumstances.run · 2 hours ago

The implementation changes between 4.2BSD and 4.3BSD generally were not visible to users, but they were numerous.

i would think performance and bug fixes are user-visible, but whatever

For example, the developers made changes to improve support for multiple network-protocol families, such as XEROX NS, in addition to TCP/IP.

if i ever find myself wanting to feel bad i absolutely want to read cerf's code because i wanna know if he gives off vibes that would ring a bell