Post by @hipsterelectron@circumstances.run

@hipsterelectron@circumstances.run 2 months ago

i feel that the grammar of a programming language is among the least appropriate of all possible facets of its behavior to start off with. why on earth would i care about your preferred tokens to represent concepts which have not yet been defined

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

omgggg NO THEY FUCKING DIDN'T MAKE THEIR RUNTIME BEHAVIOR A PRODUCT OF THEIR COMPILE-TIME SEMANTICS https://smlfamily.github.io/sml97-defn.pdf

Since signature expressions are mostly dealt with in the static semantics, the dynamic semantics need only take limited account of them.

this is so unserious. the static semantics don't even exist to me yet. i can't believe someone would write a document that claims to describe behavior without an explicit lowering process

View (PDF)

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

like this is what you would need if you want someone to interop their FFI with your language

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

Hitherto, the semantic rules have not exposed the interactive nature of the language.

i feel this assumes a great deal. this seems more of the semantic rules of a particular CLI program provided with the default distribution?

During an ML session the user can type in a phrase, more precisely a phrase of the form topdec as defined in Figure 8, page 15.

"during an ML session" omg
"the user can type": that is not meaningful without describing a whole lot more interactions with the OS

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

In practice, ML implementations may provide a directive as a form of top-level declaration for including programs from files rather than directly from the terminal.

FILES??? TERMINAL????

~107 more replies

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

So far, for simplicity, we have used the same notation B to stand for both a static and a dynamic basis, and this has been possible because we have never needed to discuss static and dynamic semantics at the same time.

i understood compilation to be the translation from an IR (static semantics) into the ABI (dynamic semantics)

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

C Appendix: The Initial Static Basis
In this appendix (and the next) we define a minimal initial basis for execution. Richer bases may be provided by libraries.

for "execution" means something different to the authors than it does to me

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

omg

At the same time, imperative features were important for practical reasons; no-one had experience of large useful programs written
in a pure functional style. In particular, an exception-raising mechanism was highly desirable for the natural presentation of tactics.

these are still a matter of grammar to me. "imperative features" is an interface to the programmer imho

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

The full definition of this first version of ML was included in a book [19] which describes LCF, the proof system which ML was designed to support.

literally omfg why didn't you send me there FIRST?????

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

Other early influences were the applicative languages already in use in Artificial Intelligence.

i am not the expected audience

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

i keep reading to find when i'm gonna find some discussion of semantics. hasn't happened yet

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

the reason i fell into this trap in the first place because i wanted to understand what "C formalized in HOL" was on about https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-453.pdf

C also combines a number of interesting features on the theoretical front, making it additionally interesting as a subject of study.

this is not something i wanna hear from someone claiming to have formalized it

For example, C’s expressions both are side-effecting and have very under-specified evaluation orders. If these semantic features were the main area of interest in studying a language, then it would clearly be easier to construct a simple calculus that included these features and little else.

this is ridiculous. obviously these semantic features are not ideal when attempting to write code that runs e.g. in ring 0. yet people do it (and this is meaningfully outside the C standard). the UB becomes defined thanks to our friends who write the compiler. is it worth attempting to standardize ring 0 properties?

However, we prefer to attack as much of C as possible all at once. As Milner and Tofte point out in the commentary on the definition of SML [MT90], this study of languages in their entirety has its own grounds for interest, and we further feel that our study of C gives us a possible
application in the area of software verification.

they didn't even mention a single concrete implementation until the appendices

View (PDF)

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

This style of definition was used in the definition of Standard ML by Milner, Tofte and Harper [MTH90]. This example, one of the most famous formal language definitions, is a clear demonstration that a large language can be formalised in this manner.

i'm getting the impression that the seL4 HOL C semantics may not be as useful as it's being let on lmao

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

Precisely because the standard’s definition of C is not formal, we can never hope to prove our formal definition consistent with it.

actively violent and evil thing to say

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

At best we can hope that our definition comes to be seen as correct by the community of people concerned with C’s definition and standardisation.

this is now becoming kind of worrying. a formal semantics can be matched to the behavior from a compiler and our friends in the compiler and in our CPU architecture manuals can describe whether it matches the "formalization"

Such a community can perform very useful error-checking.

how does anyone write this stuff

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

In addition, if used as the basis for software tools that do not necessarily require a deep understanding of its details, a formal semantics may come to be accepted as correct simply because of what it has made possible in the pragmatic domain.

this is FUCKED! a formal semantics is not something you can bully people into accepting. jfc

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

A denotational semantics defines an appropriate mathematical space as a model for a language, and maps the language’s syntax into that space in a way that is compositional. This property requires that the semantics of a syntactic phrase be a function of the semantics of the phrase’s syntactic sub-components.

so "denotation semantics" is a made up interpretation that conforms to some fuckboy's idea of aesthetically pleasing. see i'm learning so much

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

it keeps going. now he's claiming to be the first to have invented the C abstract machine (operational semantics)

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

OMG

It is central to our thesis that the semantics of C is so complicated that it can only be usefully manipulated in the context of a theorem prover.

THE C STANDARD IS WRITTEN BY HUMANS? FOR HUMANS?

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

this is also pretty worrying because he dismissed earlier ever conforming with the C standard, and seL4 literally just asserts that its C code conforms to the model

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

like this was not some newfangled thing people started doing recently! people writing code that needs to validate nontrivial properties generally do it by actually getting their hands dirty and doing the work to link the compiler's internal semantics to the representation made in HOL or whatever

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

Without mechanical support, reasoning with a big semantics is error-prone, and it can be hard to be confident that one’s proofs are actually correct.

does he..........how does he think c compilers work

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

Using a theorem prover means that we are confident that all of the results we have proved are correct.

"correct"

Having used the theorem prover HOL [GM93], we are particularly confident, as this system, following the example of its ancestor system LCF [GMW79], uses the strong type system of ML to guarantee that values of type theorem are only produced in ways that are logically sound.

that's it. that's your persuasive essay???

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

he keeps mentioning like "yeah these theorems take a lot of effort to prove.......and often they're completely unusable too" like sir have you considered that things being difficult might indicate that you need to find a semantics engine that doesn't hate you

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

cambridge still batting 100% on being actively evil people who just write whatever they want on official letterhead

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

https://trustworthy.systems/publications/papers/Tuch%3Aphd.pdf

this one is hosted on the seL4 site

Systems impose on languages many abstraction breaking requirements

"systems" lmao

and are not usually considered amenable to implementation in higher-level languages like Java and ML.

yeah cause the JVM abstract machine is specifically built to be this fucked up carnival ride. you could do it if you specifically forked the JVM. hate this lack of precision from ppl who are so loud about "formalism"

For example, zero-copy I/O and address translation are crucial features

zero-copy IO and address translation are extremely different things. zero-copy IO doesn't even make sense in ring 0 and is not in fact a "crucial feature". it's not even a language feature!

and programmers demand the freedom to control data structure layout [87],

you can "control data structure layout" in any language that lets you address bytes which i think is literally all of them. C struct layout is actually rly annoying because you can't let the compiler help you at all

in particular when optimising the cache and TLB footprint that is typically opaque in such languages.

those aren't your data structures those are the CPU's and that's ring 0 again, not a language feature

Inside the research community there are recent promising efforts at harnessing the gains of the last three decades of programming language research [8, 22, 29, 37, 46, 68, 89],

guy who knows nothing about anything he just said: "i represent the 'research community' and we will exterminate your kind"

with an emphasis on types and static checking, when implementing systems.

this guy grew into the rust evangelism strike force

However, these advances are yet to be popularised in industry

guy who thinks "systems" are an industry-specific thing

and still face enormous scepticism from systems implementors who are highly obsessed
with efficiency, sometimes to the extreme where clock cycles are the metric of choice.

this fucking guy!!!!! clock cycles can actually be counted reliably lmao. THIS is what seL4 is standing behind

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

Even today, it is easy to violate the C type system by its cast mechanism and through address arithmetic.

guy who thinks C's type system is being violated through casting and address arithmetic. you know those have concrete semantics right

The programmer is given, intentionally, access to low-level bit and byte representations of values in memory.

again, that's literally every language

There are no checks on array bounds when indexing — this would violate C’s design philosophy.

the guy who is telling you with a straight face that he totally formalized C semantics for high-assurance ring 0 scenarios is now telling you he finds the language detestable

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

god it would be so cool if rust gave a shit about correctness

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

C does not have garbage collection and the programmer is responsible for allocation and deallocation of memory through library calls.

"library calls" why would you declare that you don't know the semantics at all

A systems implementor may even develop his or her own memory allocator that replaces this already low-level interface, enabling direct management of the physical memory in a system.

THIS IS THE GUY WHO IS CLAIMING HE KNOWS WHAT SEMANTICS ARE!

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

Unfortunately, systems code is by no means strictly conforming and we could say by definition requires the ability to violate the standard’s strict rules on how memory can be accessed.

i am literally going to go find the C standard right now because the model of globally addressable memory space is i'm pretty sure the one thing that's not violated

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

like personally i think someone (not this guy) could make a pretty effective case for having correctly represented the semantics of C in ring 0 in a theorem prover even if they didn't link it to precise lines of C code through a model in the compiler,,,,

but if i was ever gonna say anything like "high-assurance" or "secure" i would actually do the work to link my semantic model to the one in the compiler and the CPU/RAM. and i would bully c standards people into accepting it

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

As a result, when describing type safety with respect to a C program in this thesis, we refer to a looser notion,

bruh. don't say things like that

where we may require expressions that designate a memory object to have a type corresponding to the expected value stored in memory.

he should have said "type" to clarify that that was gonna be the subject of debate. but this guy represents the "research community" so i bet he thinks his type is Correct

Program fragments can be type-safe if all their expressions have this property and later we formalise what is meant by the expected value’s type.

"type-safe". usually in cryptography we don't invoke generic informal terminology when we want people to take us seriously

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

Memory management code tracks the free memory that can be allocated and also sometimes the memory that has been allocated.

he just keeps going??????? here i'll translate:

"the free memory that can be allocated": sometimes non-micro kernels like linux maintain free lists of unmapped physical pages so that moving the sbrk can be made very fast if not completely atomic
"and also sometimes the memory that has been allocated": i suspect this is referring to a process's virtual address mapping, but maybe it's referring to an in-kernel allocator

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

This is commonly done through pointer-linked data structures,

why are we still saying "pointer" when we're in ring 0???? that's a physical address buddy

and this use of what are also called mutable inductively-defined data structures

no citation here is so disrespectful lmao

is the cause of a great degree of the difficulty in reasoning about such code formally.

i'm sorry you're having difficulty maybe it's time to give it up???

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

This difficulty, a direct consequence of the use of indirection,

how are you still negging the reader like this

can be broken down as the aliasing [14] and frame [61] problems.

oh my GOD!!!!! ok so these fucking citations my god

[14] this is literally about virtual memory conforming to the C standard https://eis.mdx.ac.uk/staffpages/r_bornat/papers/MPC2000.pdf

The final difficulty is the complexity of the proofs: not only do we have to reason formally about sets, sequences, graphs and trees, we
have to make sure that the locality of assignment operations is reflected in the treatment of assertions about the heap.

EVEN THAT PAPER'S AUTHOR IS TELLING HIM TO DO HIS FUCKING JOB LOL

For all of these reasons, Hoare logic isn’t widely used to verify pointer programs. Yet most low-level and all object-oriented programs use heap pointers freely. If we wish to prove properties of the kind of programs that actually get written and used, we shall have to deal with pointer programs on a regular basis.

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

literally nothing will prepare you for [61]

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

[61] McCarthy and P. Hayes. Some philosophical problems from the
standpoint of artificial intelligence. In D. Michie and B. Meltzer, editors,
Machine Intelligence 4, pages 463–502. Edinburgh University Press,
1969.

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

only possible alternative is he mistyped the reference address making a crucial point in his own phd thesis

[62] F. Mehta and T. Nipkow. Proving pointer programs in higher-order
logic. Information and Computation, 199(1-2):200–227, 2005.

and yes, it still assumes the heap. even though if you're managing physical memory. you do not have a heap

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

god fuck and even this example is literally impossible

For an example of aliasing, consider a program with two pointer variables int * p and int * q and the following triple:
{| True |} ∗p = 37 ; ∗q = 42 ; {| ∗p = ? |}

not only has he just said "triple" without a citation like that's a well-known thing, this is the problem with it:

We are unable to ascertain the value pointed to by p as it may refer to the same location as q.

so you're telling me this C code:

int * p;
int * q;
*p = 37;
*q = 42;

demonstrates a classic aliasing problem....................does this guy even know about restrict or the concept of Undefined Behavior

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

turns out a "hoare triple" is the fuckboy term for "precondition and postcondition" when you think you're the first to ever create an abstract machine for the C programming language

he never cites anything about a Hoare triple either

Hoare triples, where a block of code is preceded by a pre-condition and followed by a post-condition, have already appeared in §1.1.1.

he did just refer you, with a hyperlinked section heading, to the line above where he just says "triple"

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

The aliasing problem is much worse for inductively-defined data structures, where it is possible that structural invariants can be violated, and where we need more sophisticated recursive predicates to stipulate aliasing conditions.

this problem is so bad he has no example

These predicates appear in specifications, invariants and proofs, and their discovery is often a time consuming trial-and-error process.

sorry that you hate your job?

The aliasing situation becomes untenable when code is type-unsafe and we are forced to seek improved methods.

if the C compiler can unambiguously interpret it then maybe the C compiler is a better proof framework than isabelle/HOL

If instead we had a variable float * p:
{| True |} ∗p = 3 .14 ; ∗q = 42 ; {| ∗p = ? |}
then not only do we have to consider aliasing between pointers of different types, but also the potential for p to be pointing inside the encoding of ∗q and vice versa.

i don't know what language you're verifying but C has semantics here

We talk about this phenomenon as inter-type aliasing.

who's we

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

While specifications may mention some state that is affected by the intended behaviour of a program, it is hard to capture the state that is not changed.

literally a nonsense sentence

In the above example, a client verification that also dereferences a pointer r, not mentioned in the specification, has no information on its value after execution of the code fragment.

meaningless

This limits reusability and hence scalability of verifications.

the C compiler is a better proof framework than isabelle/HOL

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

now he's explaining why he trusts machine output more than anyone

Formal reasoning with the usual process of pen-and-paper mathematical proof neither scales as we would like in software verification nor has the expected degree of rigour.

you know humans made isabelle/HOL too right

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

"scales" i think he literally just means deskilling here. he's literally saying actually formal verification is more reliable than engineers you pay money to

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

now he's trying to mansplain "algorithmic verification" as if model checking in TLA+ https://lamport.azurewebsites.net/tla/high-level-view.html isn't actually extremely widely used in this exact space already

Research in this area has produced impressive results in recent years with improvements in the underlying theory and increased available computing power.

theory and computing power are the only things that produce impressive results. no i can't give you a citation it's not like this is my dissertation

They can catch increasingly wider classes of programmer errors

this is still a fucking hateful way to talk about formal verification

and even guarantee the absence of certain types of bugs.

such a half-assed phd thesis. do you even believe in what you're saying

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

Proof creation has a high cost associated with it.

yeah you keep saying it sucks. aren't you supposed to fix things like that

To give an idea, it has been the author’s experience [93, 94, 96] that, in an interactive theorem prover, verifying the functional correctness of C code can require between one and two orders of magnitude more proof steps than line count,

(a) yeah maybe cause you keep trying to verify functional correctness guarantees about heap allocations in ring 0
(b) "line count" appeals to the VC class. it doesn't work on programmers

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

literally everyone go read aaron turon's paper on weak atomic memory orderings right now https://plv.mpi-sws.org/gps/paper.pdf yes there's a coq proof but that's not what a paper is for!!!

check out this future work section:

However, the C11 model allows programmers to freely mix memory orderings, and ideally program logics should support such mixed reasoning as well.

literally it's that easy to give a shit about making people safe and providing powerful robust guarantees. this is why rust used to be good

Early investigation suggests that the C11 model has some corner cases when mixing memory orderings that may obstruct compositional reasoning principles.

i get nerd sniped every time i read that line lmao

View (PDF)

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

We believe the extra generality of GPS is important because it enables us to verify a wider class of weak memory programs, including those whose observable behavior is not SC. The circular buffer and Michael-Scott queue are good examples of this (see the appendix [1]). Singh et al. [35] argue that one should not expose the high-level programmer to such non-SC data structures, but GPS shows that in fact it is possible to reason sensibly and modularly about them.

keep in mind seL4 doesn't even represent concurrent behavior at all in their 2009 proof (as far as i can tell), even though concurrency semantics are a feature on any system with a motherboard. and aaron turon shows actually we can make it easier to write complex code with formal verification

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

honestly i should totally mess around with coq semantics for my ring buffer from hell

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

oh yeah the other guy was telling me how much it sucks to verify stuff

and in a single person year, we may be limited to verifying no more than something in the order of 1000 lines-of-code (LoC). Little data exists for how proof-based projects scale, but it is unlikely to be linear.

still just ridiculous that this guy is still talking "verification" without doing the work in the c compiler. i know people do that

The economics of verification have two significant consequences.

so bleak to talk about your research focus like this

First, the range of systems we can hope to verify is limited, but is still large enough to be practically interesting.

that's literally not a "consequence" why would you invoke proof jargon incorrectly lmao

Modern microkernels, with implementations around 10,000 LoC are hopefully within the realm of possibility.

you were just a moment ago saying int * p; int * q; was beyond your abilities

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

Verification of such systems can bring significant improvements to the reliability of the entire software stack, as above the microkernel layer hardware protection domains limit the impact any incorrectly behaving software has on the trusted computing base [83].

microkernels don't consider it their problem to provide any sort of correctness guarantees except for their own behavior, so this is just a lie.
the MMU isolation is from the CPU, not the microkernel

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

i will literally die mad about how casually they mentioned fucking shared memory pages are a replacement for sequenced writes https://trustworthy.systems/publications/nicta_full_text/8988.pdf

In original L4, “long” messages could specify multiple buffers in a single IPC invocation to amortise the hardware mode- and context-switch costs.

a single crumb of structured I/O

While long IPC provides functionality that cannot be emulated

literally the actual criterion for minimality

Shared buffers can avoid any explicit copying be-
tween address spaces

"microkernel layer hardware protection domains" cmon

View (PDF)

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

The result was significant kernel complexity, with many tricky corner cases that risked bugs in the implementation.

i thought that's why we used formal verification? that's why microkernels were worth the cost of proofs?

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

after reading all this my impression continues to be that microkernels don't do enough isolation at all!!! i even dug up build systems a la carte https://www.microsoft.com/en-us/research/wp-content/uploads/2018/03/build-systems-final.pdf where simon peyton-jones tried to pull this same shit about build systems

View (PDF)

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

i do actually appreciate that seL4 has a lot of use in single-core embedded applications where you're typically not just greedy for i/o like me and the purpose of an OS actually aligns reasonably well with the atomic i/o APIs

d@nny disc@

@hipsterelectron@circumstances.run 2 months ago

For seL4 there are even stronger reasons for staying away from supporting long messages: The formal verification approach explicitly avoided any concurrency in the kernel [Klein et al. 2009], and nested exceptions introduce a degree of concurrency.

i also very specifically want to avoid introducing subtle concurrency bugs but i'm doing that by expanding "isolation" beyond the MMU and expanding named "synchronization contexts" to structure literally all the externally-visible state changes like i/o

i absolutely don't think i could do seL4 better, and i'm not planning to inject tons of confusing and poorly-documented semantics like linux