@SRAZKVT you can actually use my fractal zip format for this purpose (one of the design goals was representing a dataset with changes over time). that's a really fascinating corollary of using it to represent atomic transitions between reproducible filesystem states... i did not realize that a VCS is precisely a formulation of atomic filesystem transactions. need to think about this further
but surely i'm mistaken. let's check lwn again:
"If the signing keys are publicly available for use in recreating the build, malicious actors could also sign modified loadable modules with them."
that is indeed exactly what module-signing.rst warned us about!
"If they aren't publicly available, the build can't be reproduced."
you see what this means right? "reproduced" doesn't mean "reproduced". "reproduced" means "malicious actors can also sign modified loadable modules".
why do i refer to CONFIG_MODULE_HASHES as a backdoor? because it overrides signature checking by coming before it:
diff --git a/kernel/module/main.c b/kernel/module/main.c
index effe1db02973d4f60ff6cbc0d3b5241a3576fa3e..094ace81d795711b56d12a2abc75ea35449c8300 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -3218,6 +3218,12 @@ static int module_integrity_check(struct load_info *info, int flags)
 {
 	int err = 0;
+	if (IS_ENABLED(CONFIG_MODULE_HASHES)) {
+		err = module_hash_check(info, flags);
+		if (!err)
+			return 0;
+	}
+
 	if (IS_ENABLED(CONFIG_MODULE_SIG))
 		err = module_sig_check(info, flags);
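a toy model of that control flow in python, to make the ordering concrete. this is not kernel code: the boolean inputs stand in for module_hash_check() and module_sig_check(), and the return convention (0 accepts, negative rejects) is an assumption for illustration.

```python
# Toy model of the patched module_integrity_check(); not kernel code.
# Assumption: 0 means "accept", a negative value means "reject".

def module_integrity_check(hash_ok: bool, sig_ok: bool,
                           hashes_enabled: bool = True,
                           sig_enabled: bool = True) -> int:
    err = 0
    if hashes_enabled:
        # stands in for module_hash_check(info, flags)
        err = 0 if hash_ok else -1
        if err == 0:
            # early return: the signature is never examined
            return 0
    if sig_enabled:
        # stands in for module_sig_check(info, flags)
        err = 0 if sig_ok else -1
    return err
```

in this model, a module whose hash is on the list is accepted even when its signature would be rejected: `module_integrity_check(hash_ok=True, sig_ok=False)` returns 0.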
this is all trying to dance around two very subtle points that require a very specialized technical understanding of cryptography to infer:
- the signing keys are secret because downstream distro packagers and/or corporate sysadmins are the malicious actors which module signatures protect against!
- more importantly, cryptographic signatures are just unspoofable checksums!
the fact that they're not "reproducible" is because they use secret data (the private key) to stop "malicious actors" from generating new checksums for "modified loadable modules"!
claiming a cryptographic signature is "nonreproducible" is a non sequitur--module signatures are literally just a list of module checksums. it's the exact same fucking thing, except there is an additional cryptographic proof that modules haven't been modified since they left the custody of the key owner.
lwn, kpcyrd, Thomas Weißschuh, and everyone associated with module hashing for "reproducibility" are either completely unaware of how cryptography works (and therefore should not be trusted with crypto), or they are lying in order to backdoor linux users (and therefore should not be trusted with crypto)
so here's the answer:
- we build the kernel, then build the modules, then checksum the modules. this gets us checksums for the filesystem tree just before we introduce the secret data.
- we check that the module checksums match the ones the kernel maintainers signed, by verifying the published signatures against their public key. this is a stronger form of build reproducibility!
[in fact, this alone should be sufficient, because the kernel build process should be able to delay module signing until the very end. but for completeness, let's walk through how tree hashing lets us swap out a specific intermediate change and verify the result is correct.]
so our problem now can be decomposed into three stages:
(a) the filesystem state of the kernel build tree right before generating signatures. this can be checksummed in any way.
(b) the state after adding module signatures, represented as a (normalized) filesystem delta from (a) (git can generate this).
(c) the state once the kernel build process has run to completion. generate a normalized delta for the filesystem state change from (b) to (c) with git diff.
the key insight here: unless signatures are copied into more than one place, the delta from (b) to (c) should not depend upon any secret data. so, reproducibility is ensured by:
- matching the checksum of the filesystem state at point (a).
- matching the module checksums against the upstream signatures, verified with the maintainer public keys.
- matching the checksum of the filesystem delta from (b) to (c)!
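a minimal sketch of "checksum the filesystem state" for point (a), in python. the function name `tree_checksum` and the choice of sha256 are mine, not anything from the kernel build. it assumes the tree holds only regular files and directories, and it deliberately ignores modes, timestamps, and empty directories.

```python
import hashlib
import os

def tree_checksum(root: str) -> str:
    """Hash every file under root in a fixed order (hypothetical helper)."""
    h = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so the traversal order is fixed
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/").encode()
            with open(path, "rb") as f:
                data = f.read()
            # length-prefix each field so path and content bytes
            # cannot be confused with one another
            h.update(len(rel).to_bytes(8, "big"))
            h.update(rel)
            h.update(len(data).to_bytes(8, "big"))
            h.update(data)
    return h.hexdigest()
```

two build trees with the same relative paths and the same bytes get the same digest no matter what order the files were created in. the same function can be pointed at a directory of normalized deltas for the (b) to (c) check.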
that actually still doesn't require any tree hashing either! but even if we absolutely cannot be assed to split the kernel build process into discrete a/b/c phases, or if the signature data from (b) influences the filesystem delta from (b) to (c) (e.g. if the signatures are copied into a text file), we can still make this shit 100000% reproducible, without any cryptography at all!
how? by simply erasing the signatures! take the upstream kernel build tree filesystem state, then replace any signature data with an equivalent length of zero bits (i.e. zero out the signatures). calculate the resulting checksum from upstream! do the same thing for downstream! YOUR CHECKSUMS WILL MATCH!
of course, this requires identifying the precise regions of data corresponding to signatures. but the kernel already knows this, because it has to read from those exact regions in order to validate module signatures upon load!
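a sketch of the zeroing step for a single module file, in python. it assumes the appended-signature layout as i understand it from scripts/sign-file: signature bytes, then a 12-byte struct module_signature whose last field is a big-endian sig_len, then the magic string "~Module signature appended~\n". the helper names `zero_signature` and `zeroed_checksum` are made up for this example.

```python
import hashlib
import struct

MAGIC = b"~Module signature appended~\n"
# Assumed layout of struct module_signature: five u8 fields,
# three padding bytes, then a big-endian u32 sig_len (12 bytes total).
TRAILER = struct.Struct(">BBBBB3xI")

def zero_signature(module: bytes) -> bytes:
    """Return the module with its signature bytes replaced by zeros."""
    if not module.endswith(MAGIC):
        return module  # unsigned module: nothing to erase
    info_end = len(module) - len(MAGIC)
    info_start = info_end - TRAILER.size
    if info_start < 0:
        raise ValueError("truncated signature trailer")
    *_, sig_len = TRAILER.unpack(module[info_start:info_end])
    sig_start = info_start - sig_len
    if sig_start < 0:
        raise ValueError("sig_len larger than the file")
    # same length, all zero bytes; the trailer and magic stay as they are
    return module[:sig_start] + bytes(sig_len) + module[info_start:]

def zeroed_checksum(module: bytes) -> str:
    return hashlib.sha256(zero_signature(module)).hexdigest()
```

two copies of the same module signed with different keys get the same `zeroed_checksum` as long as the signatures have the same length. if the lengths differ, the trailer's sig_len differs too, and stripping the whole signature block instead of zeroing it would be the fallback.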
i believe the longer-term answer to "reproducible builds" involves OS-level support for filesystem checkpointing, per-process isolation of i/o state, and a deterministic ordering along with transactional semantics for propagating a series of i/o operations as an atomic filesystem delta.
which is to say: reproducible builds require reproducible process executions. and that requires per-process isolation of filesystem state.
@hipsterelectron plan9 has per process filesystems ig
@SRAZKVT KeyKOS kinda does
@SRAZKVT omg ugh NOBODY ever tries the literal only thing i want for perf optimization https://doc.cat-v.org/plan_9/4th_edition/papers/fs/
"The file system server processes prevent deadlock in the buffers by always locking parent and child directory entries in that order. Since the entire directory structure is a hierarchy, this makes the locking well-ordered, preventing deadlock. The major problem in the locking strategy is that locks are at a block level and there are many directory entries in a single block. There are unnecessary lock conflicts in the directory blocks. When one of these directory blocks is tied up accessing the very slow WORM, then all I/O to dozens of unrelated directories is blocked."
@SRAZKVT literally i'm so upset bc:
- making my writes visible to other processes should absolutely happen in an atomic transaction
- persisting my writes to disk is (1) a completely different fucking thing than IPC (2) should also happen atomically
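the closest thing current POSIX systems offer for a single path, sketched in python: rename makes the write visible atomically, and fsync is the separate step that makes it durable. `publish_atomically` is a made-up helper name, and this assumes a POSIX filesystem where the temp file and the target live in the same directory.

```python
import os
import tempfile

def publish_atomically(path: str, data: bytes, durable: bool = False) -> None:
    """Make data visible at path in one step; optionally persist it too."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            if durable:
                f.flush()
                os.fsync(f.fileno())  # persistence: file contents reach disk
        # visibility: readers see either the old file or the new one
        os.replace(tmp, path)
    except Exception:
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    if durable:
        # persistence of the rename itself requires syncing the directory
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

note how the two concerns only line up by convention here: nothing ties the fsync calls to the rename as one transaction, and there is no equivalent for a change that spans several paths.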
@SRAZKVT literally nobody has ever asked filesystems to act like a lock-free OS-global hash table. that's a ConcurrentHashMap, not a "filesystem"
@hipsterelectron well there's a reason why ska, navi, mercurial, git, and i all use it as a hashmap
@SRAZKVT do git/ska/navi/hg/you map pages with O_DIRECT for that?
@SRAZKVT thanks for identifying this, that's definitely worth supporting (direct block writes) but i feel like that would make more sense to expose as a completely separate resource from the standard filesystem tree.
hmmmmmm actually, i'm not sure about that! if i want transactional semantics across file paths, i also want transactional semantics within a single file path. the pattern of explicit resource request => blocking commit syscall to establish ordered transaction boundaries should be able to apply to changes within a single file too.
and if my goal is to synchronize changes to disk as a transaction (as opposed to just IPC propagation), then i should have to specify that when i request a sync context to expose to a process (e.g. within a subprocess spawn call). cc @miss_rodent