@SRAZKVT you can use my fractal zip format for this purpose actually (one of the design goals was representing a dataset with changes over time). that's actually a really fascinating corollary of using it to represent atomic transitions between reproducible filesystem states....................i did not realize that VCS is precisely a formulation of atomic filesystem transactions. need to think about this further
(in fact, these are all things the linux operating system kernel should be able to let us do with a filesystem tree, but that's yet another topic for a separate time)
so: we can take a filesystem tree like the kernel build directory, and we can convert that to a checksum. the "reproducible builds" organization (despite lack of OS-level support for atomicity) defines "reproducibility" in these terms:
- if the kernel build directory starts at checksum C_0,
- and after the build process, the kernel build directory matches the expected checksum C_1,
then the build process is "reproducible". it produces C_1 from C_0 (after executing some vaguely-defined process or processes). this is repeatable if there is sufficient information available to produce C_1 from C_0
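to make the C_0 -> C_1 definition concrete, here's a minimal sketch of "convert a filesystem tree to a checksum". the names `tree_checksum` and `is_reproducible` are made up for illustration, and the choice to hash only relative paths and file contents (ignoring timestamps, ownership, and empty directories) is my assumption, not anything the reproducible builds project specifies:

```python
import hashlib
import os


def tree_checksum(root: str) -> str:
    """Hash relative paths and file contents in a fixed order.

    Assumption for this sketch: mtime, uid/gid and empty directories
    are deliberately not part of the checksum.
    """
    h = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place sort makes the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            with open(path, "rb") as f:
                data = f.read()
            # length-prefix each field so different trees cannot
            # produce the same byte stream by concatenation
            for field in (rel.encode(), data):
                h.update(len(field).to_bytes(8, "big"))
                h.update(field)
    return h.hexdigest()


def is_reproducible(build, tree: str, expected_c1: str) -> bool:
    """Run a build step (any callable taking the tree path) and compare to C_1."""
    build(tree)
    return tree_checksum(tree) == expected_c1
```

note that everything interesting is hidden in what you decide to feed into the hash. that decision is the "sufficient information" part.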
i say "if there is sufficient information", but the reproducible builds evangelism strike force team doesn't think that way. their modus operandi is to repeatedly neg maintainers to make extremely confusing and subtle modifications to their release process for all their users.
it would only be necessary to change the build process for all users if the reproducible builds squadron believed very very deeply that the maintainer's build output is the ground truth for everyone else to reproduce.
.........which brings us to the issue at hand for the kernel.
for a representative example of how the reproducible builds evangelism strike force approaches maintainers, consider this post from an arch linux package maintainer to the bug report mailing list for gnu automake: https://lists.gnu.org/archive/html/bug-automake/2025-08/msg00000.html
In Arch Linux our automake package includes
/usr/share/doc/automake/amhello-1.0.tar.gz. When we rebuild this package using our rebuilder to check for reproduciblity the uid/gid and timestamps are not normalized
- the arch linux package build system orchestrates the build process,
- the arch linux automake package decides to include extraneous test data in the output,
- the arch linux "reproducibility" checker does not automatically zero out fields that are known to induce non-matching checksums,
.........so arch linux files a "bug" against automake.
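for the record, here's roughly what "zero out the fields on the checker side" could look like. this is a sketch using python's standard `tarfile` and `gzip` modules. `make_tar_gz` and `normalized_digest` are hypothetical helpers, and this is not how the arch linux rebuilder actually works; it only shows that two tarballs differing in uid/gid and timestamps can be compared without touching upstream:

```python
import gzip
import hashlib
import io
import tarfile


def make_tar_gz(files: dict, uid: int, mtime: int) -> bytes:
    """Build a .tar.gz in memory carrying the given (non-normalized) metadata."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w") as tar:
        for name in sorted(files):
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.uid = info.gid = uid
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(files[name]))
    out = io.BytesIO()
    # the gzip header stores its own timestamp as well
    with gzip.GzipFile(fileobj=out, mode="wb", mtime=mtime) as gz:
        gz.write(raw.getvalue())
    return out.getvalue()


def normalized_digest(tar_gz: bytes) -> str:
    """Hash member name, mode and contents; ignore uid/gid and every timestamp."""
    h = hashlib.sha256()
    with tarfile.open(fileobj=io.BytesIO(tar_gz), mode="r:gz") as tar:
        for info in sorted(tar.getmembers(), key=lambda m: m.name):
            f = tar.extractfile(info)  # None for non-regular members
            data = f.read() if f is not None else b""
            for field in (info.name.encode(), oct(info.mode).encode(), data):
                h.update(len(field).to_bytes(8, "big"))
                h.update(field)
    return h.hexdigest()
```

two archives built with different uid and mtime have different raw sha256 sums but the same `normalized_digest`. which fields count as "known to induce non-matching checksums" is a policy choice the checker gets to make.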
to remove all doubt, another arch linux maintainer follows up: https://lists.gnu.org/archive/html/bug-automake/2025-11/msg00007.html
You don't need to worry about the value, this variable is meant to be set externally. From the reproducible-builds.org documentation, this is suggested for shell scripts on GNU systems:
(note the username kpcyrd here. he'll be coming up again soon.)
this is a very specific set of build process requirements, particular to the arch linux packaging system, which our friendly neighborhood distro maintainer is able to specify in precise detail.
and this is filed as a bug upstream, because the reproducible builds evangelism strike force requires "reproducibility" in the form of a code injection API to achieve a chosen-plaintext attack.
anyway, this guy's website is named "vulns" and his work is sponsored by google and the linux foundation https://vulns.xyz/2021/07/disagreeing-rebuilders/
[we will return to the kernel now. i promise this was necessary]
let's refresh our memory on the module signature problem statement: https://lwn.net/Articles/1012946/
This mechanism, which checks module integrity based on hashes computed at build time instead of using cryptographic signatures
this is where we can finally start to describe why that justification makes absolutely no fucking sense!
a signature is essentially (in the textbook RSA picture) just the result of encrypting a checksum with a private key, so anyone can decrypt with the corresponding public key to obtain the checksum.
it's true that there are other ways with more constraints, but the standard methods like EdDSA quite literally accept an arbitrary cryptographic hash function (checksum) as a parameter.
that checksum is in fact exactly the information we need for "reproducibility"! and we very much do want to ensure we have the exact same checksum as was generated by the upstream maintainer--that's what the public key verification achieves!
the cryptographic "proof" resulting from a private key-based checksum signature is very specifically the human assurance that the human owner of that private key must have generated the corresponding checksum!
if we accept that a cryptographic checksum is "proof" of reproducibility, then a signature scheme is strictly more powerful--proof of reproducible output, and proof that the output checksum was not modified after being generated by the holder of the private key!
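here's the "signature = checksum + private key" picture as a toy. this is textbook RSA with no padding and a deliberately tiny modulus built from two Mersenne primes, so it is insecure and is not what the kernel or EdDSA actually do. all names (`sign`, `recover_checksum`, `verify`) and all parameters are my own assumptions for illustration. the only point is that the verifier recovers exactly the checksum that a "reproducibility" check would compare, plus evidence about who produced it:

```python
import hashlib
import math

# toy parameters: far too small for real use, chosen only so the
# example runs with nothing but the standard library
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                      # public modulus
E = 65537                      # public exponent
PHI = (P - 1) * (Q - 1)
assert math.gcd(E, PHI) == 1
D = pow(E, -1, PHI)            # private exponent (needs Python 3.8+)


def checksum(data: bytes) -> int:
    """SHA-256 of the data, reduced mod N so the toy modulus can carry it."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """'Encrypt' the checksum with the private exponent."""
    return pow(checksum(data), D, N)


def recover_checksum(signature: int) -> int:
    """Anyone holding the public key (N, E) can get the checksum back."""
    return pow(signature, E, N)


def verify(data: bytes, signature: int) -> bool:
    """Same comparison a bare checksum check does, but bound to the key holder."""
    return recover_checksum(signature) == checksum(data)
```

`verify` is a checksum comparison with one extra step in front of it. dropping the signature keeps the comparison and throws away the part that says who generated the expected value.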
let's take a look at that patch series now.
lwn complains about AI scrapers when you try to access their locally hosted copy of diffs from LKML. https://web.archive.org/web/20250409044448/https://lwn.net/ml/all/20250120-module-hashes-v2-2-ba1184e27b7f@weissschuh.net/
(why not just link to LKML if they're getting scraped so hard? if you ask questions like this you will not like the answers you find)
this diff [2/6 in the patchset] adds a new config option that disables the existing config option to enforce signature checking. real Kconfig heads will understand that this config dependency is equivalent to an override mechanism. so you can disable module signing even if the user config requires it.
that's not even the cryptographic part yet, just an extra build system backdoor. the cryptographic claims are next.
this is the magnum opus of the reproducible builds evangelism strike force: https://web.archive.org/web/20250408191140/https://lwn.net/ml/all/20250120-module-hashes-v2-6-ba1184e27b7f@weissschuh.net/. let's evaluate these claims:
The current signature-based module integrity checking has some drawbacks in combination with reproducible builds:
drawbacks in combination? that makes it sound like reproducible builds are a simple config setting. that would be nice, right? if reproducible builds had precise semantics? and they fucked off and stopped bothering everyone else?
Either the module signing key is generated at build time, which makes the build unreproducible,
we're going to examine this claim in more detail presently. but first we absolutely need to highlight the rest of this sentence:
or a static key is used, which precludes rebuilds by third parties and makes the whole build and packaging process much more complicated.
i cannot possibly express the violent feelings within me upon reading this statement:
- a "static key" refers to "literally a normal key, the way it worked before".
- does it "preclude rebuilds by third parties"? (we will evaluate this below.)
- "makes the whole build and packaging process much more complicated" -- again, this is literally the way it works right now.
so the reproducible builds evangelism strike force get to whine about how complicated it is to make the build reproducible. if you hate your job then maybe choose a different line of work?
but [6/6] in this patchset has so much more to show us! here is the reproducible build squadron's best and brightest, making things less complicated:
diff --git a/Documentation/kbuild/reproducible-builds.rst b/Documentation/kbuild/reproducible-builds.rst
index f2dcc39044e66ddd165646e0b51ccb0209aca7dd..6a742ad745113a9267223b33810dbc7218c47d4c 100644
--- a/Documentation/kbuild/reproducible-builds.rst
+++ b/Documentation/kbuild/reproducible-builds.rst
@@ -79,7 +79,10 @@ generate a different temporary key for each build, resulting in the
modules being unreproducible. However, including a signing key with
your source would presumably defeat the purpose of signing modules.
-One approach to this is to divide up the build process so that the
+Instead ``CONFIG_MODULE_HASHES`` can be used to embed a static list
+of valid modules to load.
+
+Another approach to this is to divide up the build process so that the
unreproducible parts can be treated as sources:
1. Generate a persistent signing key. Add the certificate for the key
so, instead of forcing our brave and noble reproducible builds advocates to suffer the cruel and unusual punishment of "splitting up the build process", we now have the much less complex alternative of "adding a backdoor in Kconfig that short-circuits signature checking at runtime"
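for comparison, this is about all that "embed a static list of valid modules to load" amounts to conceptually. it's a sketch, not the code from the patch series; `build_allowlist` and `may_load` are hypothetical names and the use of sha256 is my assumption:

```python
import hashlib


def build_allowlist(modules: dict) -> frozenset:
    """Build time: hash every module blob and freeze the resulting set."""
    return frozenset(hashlib.sha256(blob).digest() for blob in modules.values())


def may_load(blob: bytes, allowlist: frozenset) -> bool:
    """Load time: membership test only; no key is involved in this check."""
    return hashlib.sha256(blob).digest() in allowlist
```

the check says "these bytes were in the list". it says nothing about who wrote the list, so whatever trust it carries has to come from however the list itself got embedded and verified.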