this is actually quite fascinating https://github.com/pantsbuild/pants/blob/71d93e3563af17a01f22c123cb86058714474baa/src/python/pants/util/fileutil.py#L40
build systems tend to have overwhelmingly powerful i/o orchestration capabilities that sit in plain sight. part of the trick is evaluating inputs not just by, e.g., size in bytes, but also by recognizing that certain parts of the build matter much more to certain business processes. this is how you get 30x speedups
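a minimal sketch of that weighting idea, assuming a hypothetical per-target `importance` factor (nothing like it appears in the linked file; `Target`, `priority`, and `order_by_priority` are made-up names for illustration). raw size is the naive signal, and the weight lets a small but business-critical target jump ahead of a big one nobody is waiting on:

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    size_bytes: int          # raw input size: the naive signal
    importance: float = 1.0  # hypothetical business weight; 1.0 means "default"


def priority(target: Target) -> float:
    """Weight raw size by how much a business process cares about this target."""
    return target.size_bytes * target.importance


def order_by_priority(targets: list[Target]) -> list[str]:
    """Highest-priority targets first (dependencies are ignored here)."""
    return [t.name for t in sorted(targets, key=priority, reverse=True)]


targets = [
    Target("docs", size_bytes=50_000),
    Target("checkout-service", size_bytes=20_000, importance=5.0),
    Target("internal-tool", size_bytes=30_000, importance=0.5),
]
print(order_by_priority(targets))  # ['checkout-service', 'docs', 'internal-tool']
```

the smallest-by-bytes target ends up first because its weight dominates; pure size ordering would have put `docs` first.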
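```python
import random
def create_size_estimators():
    """Create a dict of name to a function that returns an estimated size for a given target.
    The estimated size is used to build the largest targets first (subject to dependency constraints).
    Choose 'random' to choose random sizes for each target, which may be useful for distributed
    builds.
    :returns: Dict of a name to a function that returns an estimated size.
    """
    def line_count(filename):
        with open(filename, 'rb') as fh:
            return sum(1 for _ in fh)  # count lines without loading the whole file
    # partial sketch (assumed); the linked file has the real set of estimators
    return {
        'linecount': lambda srcs: sum(line_count(src) for src in srcs),
        'random': lambda srcs: random.randint(0, 10000),
    }
```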
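the docstring says the estimate is used to build the largest targets first, subject to dependency constraints. one way to do that, sketched here under my own assumptions (this is not pants' actual scheduler, and `largest_first_order` plus its `sizes`/`deps` arguments are hypothetical), is Kahn's topological sort with a max-heap of ready targets keyed by estimated size:

```python
import heapq


def largest_first_order(sizes: dict[str, int], deps: dict[str, list[str]]) -> list[str]:
    """Topologically order targets, always picking the largest ready target next.

    sizes: target name -> estimated size (e.g. output of a size estimator)
    deps:  target name -> names of targets it depends on (built first);
           every name is assumed to appear in `sizes`
    """
    # Count unbuilt dependencies per target and record reverse edges.
    remaining = {t: 0 for t in sizes}
    dependents: dict[str, list[str]] = {t: [] for t in sizes}
    for target, target_deps in deps.items():
        for dep in target_deps:
            remaining[target] += 1
            dependents[dep].append(target)

    # Max-heap via negated sizes; equal sizes fall back to name order.
    ready = [(-sizes[t], t) for t, n in remaining.items() if n == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, target = heapq.heappop(ready)
        order.append(target)
        # Building this target may unblock its dependents.
        for dependent in dependents[target]:
            remaining[dependent] -= 1
            if remaining[dependent] == 0:
                heapq.heappush(ready, (-sizes[dependent], dependent))
    if len(order) != len(sizes):
        raise ValueError("dependency cycle detected")
    return order


print(largest_first_order({"lib": 10, "app": 500, "tests": 300}, {"app": ["lib"]}))
# ['tests', 'lib', 'app']
```

note that `app` is the largest target but still waits for the tiny `lib`: the dependency constraint wins over size, which is exactly the "subject to" caveat in the docstring.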
estimation and testing are really key skills here