The UNIX system has been in wide use for over 20 years, and has helped to define many areas of computing.
The new error types primarily concern cases where either the calling or called machine fails while the parent and child are still alive.
omg like gundam
When the child's machine fails, the parent receives an error signal. Additional information about the nature of the error is deposited in the parent's process structure, which can be interrogated via a new system call.
they did NOT defeat me here! i have saved lives with my signal handling tiered logging before
but they did put it in the OS, so i still gotta catch up
For those data types which the system understands, automatic reconciliation is done.
easy mode. what's next
Otherwise, the problem is reported to a higher level; a database manager for example, who may itself be able to reconcile the inconsistencies.
can the database be my manager
Eventually, if necessary, the user is notified and tools are provided by which he can interactively merge the copies.
do scientists say this? i know i say this and i know pouzin says this. do people say this? dr. jennings says this. do scientists still say this? i bet they do. i need to find them. i want to build the future with them
4.1 Partitions
Partitions clearly are the primary source of difficulty in a replicated environment. Some authors have proposed that the problem can be avoided by having high enough connectivity that failures will not result in partitions. In practice, however, there are numerous ways that effective partitioning occurs.
oh MAN that's the coldest "in practice" i've ever seen
Even when the hardware level is functioning, there are miriad ways that software levels cause messages not to be communicated; buffer lockups, synchronization errors, etc.
that's a really cute spelling of myriad. i like the y but somehow miriad feels less showy/extravagant and more focused/serious
see this is why you can't ever pretend to separate engineers from "SRE" work:
In addition, there are maintenance and hardware failure scenarios that can result in file modification conflict even when two sites have never executed independently at the same time.
the engineer can theorize
For example, while site B is down, work is done on site A. Site A goes down before B comes up. When site A comes back up, an effective partition merge must be done.
the practitioner can actualize. junyer the RE2 maintainer and a great teacher of mine was an SRE before he ended up maintaining the code that powers google search
this is exactly the kind of shit i would write
an immediate question is whether a data object, appearing in more than one partition, can be updated during partition.
i.e. "if google cloud goes down how fucked are you"
In our judgment, the answer must be yes.
JUDGEMENT! MUST BE! YES!
this was also why i realized atomic memory models were solving a different problem space!
in many environments, the probability of conflicting updates is low. Actual intimate sharing is often not the rule.
and if it is......localize those intimate motherfuckers!!! give them room to explore each others' address spaces!!!!!
i am quite surprised that they remain stuck in the mode of trying to adjudicate a best-effort answer to mutually conflicting modifications. like, to me, "conflicts" (e.g. writing to the same location) indicate that there is an unresolved user-level "conflict" between their scheduled tasks!
and this is where aaron turon's formal verification framework would say: the behavior is incorrect! it produces a data race!
that's why i proposed the named hierarchy of sync contexts (up to global). it takes our environment isolation mechanism and gives the user a structured resource (with a lifetime) they can use to express a sequence of modifications that can be unambiguously verified for correctness.
what is correctness?
in each sync context, modifications are built up as an ordered sequence of i/o operations, then explicitly committed. as these operations are evaluated during the blocking commit call, the only correctness requirement is that a named resource (file path) previously committed to the same context cannot match the name of a new resource.
note that this arises not while we're writing a whole bunch of data, but when we allocate (e.g. open()) a resource handle. this should fail eagerly, and quickly, and unambiguously indicate a problem with user input, or with the input environment!
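here's a rough sketch of that rule in python, just to make it concrete. everything in it is hypothetical: `SyncContext`, `Handle`, `NameCollision`, `child()` and `full_name()` are names made up for illustration, not an existing api, and it keeps everything in memory instead of touching a real filesystem. it also assumes that a name still pending in a context counts as taken, since it would collide at commit time anyway.

```python
class NameCollision(Exception):
    """A new resource name matches one already claimed in the same context."""


class Handle:
    """Buffers writes for one named resource until the context commits."""

    def __init__(self, path):
        self.path = path
        self.chunks = []

    def write(self, data: bytes):
        self.chunks.append(data)


class SyncContext:
    """Hypothetical sync context; every name here is made up for illustration."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # enclosing context; None for the global one
        self.pending = []         # handles in the order they were opened
        self.committed = {}       # path -> bytes, filled in by commit()

    def child(self, name):
        """Create a nested context with its own namespace and lifetime."""
        return SyncContext(name, parent=self)

    def full_name(self):
        """Hierarchical name of this context, e.g. 'global/build'."""
        if self.parent is None:
            return self.name
        return f"{self.parent.full_name()}/{self.name}"

    def open(self, path):
        """Allocate a handle; fail eagerly if the name is already taken here."""
        # Assumption: names still pending in this context count as taken too,
        # since they would collide during commit anyway.
        taken = path in self.committed or any(h.path == path for h in self.pending)
        if taken:
            raise NameCollision(f"{path!r} already exists in {self.full_name()!r}")
        handle = Handle(path)
        self.pending.append(handle)
        return handle

    def commit(self):
        """Apply the queued operations in order, then clear the queue."""
        for handle in self.pending:
            self.committed[handle.path] = b"".join(handle.chunks)
        self.pending.clear()
```

the point of the sketch: the collision check lives in `open()`, so a duplicate name blows up before any data is written, and the same path in a different context (say the parent) is not a conflict at all.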
OMG NOOOOOOO I JUST READ THE NEXT PAGE THIS IS A MIND MELD
The second case is the one that gives rise to transactions. Here it is recognized that changes to sets of objects are related.
LOCALITY GANG!!!!!
Reconciliation of differing versions of an object must be coordinated with other objects and the operations on those objects which occurred during partition.
name collision indicates a model failure—it can't be expected to be resolved by just choosing one version!
In addition, LOCUS provides a full nested transaction facility for those cases where the user wishes to bind a set of events together.
"full nested transaction facility" is exactly what people writing build processes have needed for DECADES
Case specific merge strategies have been developed.
🤩🤩🤩🤩 that's me rn
emailing posix subject "full nested transaction facility" link to this pdf send
(not really; i aborted the transaction. see how broadly this can be applied?)
ok so i was vaguely thinking earlier that like "this network partitioning stuff is more intense than my purely-local use case" but another cited paper on mutual inconsistency detection had this really clever line: https://www.cs.purdue.edu/homes/bb/cs542-11Spr/Parker_TSE83.pdf