@SRAZKVT you can use my fractal zip format for this purpose actually (one of the design goals was representing a dataset with changes over time). that's actually a really fascinating corollary of using it to represent atomic transitions between reproducible filesystem states....................i did not realize that VCS is precisely a formulation of atomic filesystem transactions. need to think about this further
this is a subset of "P vs NP". examples of computational hardness assumptions used for asymmetric crypto:
- prime factorization: used for RSA, the first full cryptosystem featuring public and private keys, or a "keypair".
- unfortunately, this tends to require much larger keys than more modern techniques. and the company RSA was the one corporation pushing DUAL_EC_DRBG, the NSA crypto backdoor
i prefer
- discrete log: this is what diffie-hellman key exchange uses, and it's so simple it feels like a magic trick.
- a version of this is the basis for elliptic curve cryptography, which is everyone's favorite because the keys can be much much shorter (256 bits provides equivalent-or-better strength than 4096 bits for RSA).
curve25519 is what i recommend using for asymmetric crypto in general. it's a standard and there are no parameters to choose between
merkle referred to these computational tricks as "trapdoor" problems, because they can be efficiently computed in one direction, but not the other way.
you might recall this mechanic being similar to the goal of a hash function--however, hash functions (checksums) aren't counting upon the non-invertibility in the same way (recall information theoretic vs complexity theoretic hardness). non-invertibility of a hash function is described as pre-image resistance, and it's one of multiple qualities for a hash function to satisfy.
computational hardness assumptions for asymmetric crypto are theoretically susceptible to the mythical quantum computer. but physicists have never conclusively proven that the mathematical model they made up for quantum computing is actually how quantum physics works.
that's the thing about math: it's not real, and you can just make shit up and have a great time doing math. but cryptography concerns itself with applied math, i.e. math implemented using physics. and while elliptic curves and finite-field arithmetic can be implemented in physics with semiconductors, quantum computing has never been demonstrated to exist in any form.
this is why i don't like physicists.
but! there are some cool math tricks you can play around with that were inspired by the mythical quantum computer! and these generally involve lattice theory.
lattice techniques for asymmetric crypto are mostly still very new and experimental, and haven't yet reached the maturity of elliptic curve techniques. in particular, they tend to be incredibly slow, with massive key sizes. this reduces security!
the most practical lattice approach atm is kyber. and it's pretty neat!
one interesting way you can play around with lattice methods is to perform a key agreement process that merges the security of ECC (elliptic curve crypto) and lattices. signal did this for a while: https://signal.org/blog/pqxdh/
unfortunately, signal has more recently taken on a new cryptographer who keeps saying really odd shit. and he decided to move away from double ratchet and go all-in on lattice methods: https://eprint.iacr.org/2025/078.pdf.
if you scroll to page fucking SIXTY you'll see him admit that moving away from the elliptic curve crypto of the double ratchet protocol to FULL QUANTUM actually lost protection against adversarial randomness, in order to be safe from the quantum boogeyman
in fact, double ratchet was actually the first protocol to demonstrate resistance against adversarial randomness. you should read this 2020 paper on ratcheting!
in fact, this paper actually introduced adversarial randomness for the first time, discovering it as a specific property of the double ratchet cryptosystem.
isn't it cool how we can discover new theory from applied math? i love that type of shit
anyway, the reason this matters is because DUAL_EC_DRBG was a previous attempt by the NSA to break randomness generators. and since the OS mediates access to your hardware, the OS also mediates randomness.
so, in short, you also have to trust your OS to generate random bits correctly, although some specific cryptographic techniques like double ratchet can be more resilient in specific ways.
and this brings us back to trusting the OS!
asymmetric cryptography also covers signatures. these can be defined in many distinct ways, because the kind of thing you want a signature to achieve may be domain-specific or situational. but in general, a signature is a way to use secret data (from asymmetric crypto) to authenticate the result of a separate computation.
one of the standard ways to do this is to use the secret data (private key) to encrypt the result of your computation, in a way that can be inverted with the corresponding public key. this constitutes a form of "cryptographic proof".
what does an asymmetric keypair "prove"?
keep in mind that diffie-hellman (DH) secrets are an example of asymmetric crypto, but don't introduce the standard "keypair" framework. DH secrets can also be used as a a form of "proof", but not the same way as a keypair!
this sounds pedantic, but it's important to be thoughtful about words like "proof" (which have human-level meanings), and whether/how the mathematical guarantees actually translate into human-meaningful assurances.
for example, if a private key is shared among a group of users, the math is no less powerful, but the human interpretation of the mathematical result is different in a specific way!
in particular, you can't distinguish between members of the group (less powerful guarantee), but members of the group also gain a degree of anonymity when using the shared key!
effective cryptographic engineering is therefore a deeply sociological pursuit. the mathematical guarantees are unambiguous, but their human interpretation is not!
so i'm now going to simplify all of this into the context of the linux kernel module signing. recall that the stated goal was "reproducibility". let's see how we can translate that human assurance into mathematical guarantees.
we'll narrow signature schemes to this use case:
- the computation we want to "prove" is representable by serializing the state of the kernel build system directory tree.
- a filesystem tree can be unambiguously serialized by walking in order from the root, iterating directory entries in byte-lexicographic order, then writing each entry's name, mode bits, and any file data (a symlink is represented with the target as "file data").
in fact, this ends up being very similar to how tar or zip archives work, because these were intended as generic serialization formats for filesystem trees!
making these unambiguous invokes a whole separate discussion around what a checksum ends up "proving". but i do not have time to get into tree hashing of zip archives at this time or hash collision attacks on .tar.zst files--for now, we assume:
- checksums don't collide,
- the filesystem can be globally write-locked while recursively iterating,
- we can "stop" and "restart" the kernel build process at arbitrary points