my emacs is completely fucked
this distinction is actually like the most interesting part of my directory traversal work: using file descriptors and inodes to confidently read and write and have a clear concept of progress and completion, instead of path strings which are fundamentally racy and cannot be deduplicated
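a minimal sketch of the dedup half of this, assuming a unix system and only rust's standard library. `FileId`, `open_with_id` and `count_unique` are hypothetical names, not from any crate. three different path strings that name the same file collapse to one entry once you key on what `fstat()` of the open descriptor reports. the key includes the device id because inode numbers are only meaningful within one filesystem. every descriptor also stays open until the count is done, because an inode that's still open can't be freed and handed to some new file.

```rust
use std::collections::HashSet;
use std::fs::{self, File};
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::{Path, PathBuf};

/// Hypothetical identity key: (device id, inode number).
/// Inode numbers alone repeat across filesystems, so the device is part of the key.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct FileId {
    dev: u64,
    ino: u64,
}

/// Open the path once and stat the *descriptor* (File::metadata is fstat(2)),
/// so the id describes the file we hold, not whatever the string resolves to later.
fn open_with_id(path: &Path) -> io::Result<(File, FileId)> {
    let file = File::open(path)?;
    let meta = file.metadata()?;
    Ok((file, FileId { dev: meta.dev(), ino: meta.ino() }))
}

/// Count distinct underlying files among `paths`, holding every descriptor
/// open until the end so no inode can be freed and reused mid-count.
fn count_unique(paths: &[PathBuf]) -> io::Result<usize> {
    let mut seen = HashSet::new();
    let mut held = Vec::new();
    for p in paths {
        let (file, id) = open_with_id(p)?;
        seen.insert(id);
        held.push(file);
    }
    Ok(seen.len())
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join(format!("fileid-demo-{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    let original = dir.join("a.txt");
    let alias = dir.join("b.txt");
    fs::write(&original, b"hello")?;
    let _ = fs::remove_file(&alias);
    fs::hard_link(&original, &alias)?;

    // Three different path strings naming one file.
    let paths = vec![original, alias, dir.join(".").join("a.txt")];
    println!("paths: {}, unique files: {}", paths.len(), count_unique(&paths)?);
    fs::remove_dir_all(&dir)?;
    Ok(())
}
```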
so essentially the filesystem is where fucking everything happens all at once. the NASDAQ trading floor. people biking to work. kids walking to school in their outfits. it's new york, baby! it all happens here!
(1) interactions with many other entities they don't control
(2) with incredibly nondeterministic latency for every request
but most importantly, you're often writing code that runs on your users' laptops and touches their files. that's a different degree of closeness and responsibility that you have to bear on top of figuring out all these situations that don't have protocols to provide easy answers.
ignore crate in pants for almost two years before we amicably parted ways because pants needed more control over that traversal. this was supposed to be something i could contribute to @ethersync, which, like pants, has to concern itself not just with a single filesystem crawl but also with correctly maintaining state over time. this mutable state is where things get really dicey.
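one way to make "maintaining state over time" concrete, as a sketch only: diff two snapshots keyed by file identity instead of by path. everything here (`FileId` as a `(dev, ino)` tuple, the `Change` enum, `diff`) is hypothetical and is not how pants or ethersync actually do it, and the ids in `main` are made up. the comment on `diff` points at exactly where it stops being trustworthy.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical identity key: (device id, inode number) as reported by stat(2).
type FileId = (u64, u64);

#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Change {
    Added(PathBuf),
    Removed(PathBuf),
    Moved { from: PathBuf, to: PathBuf },
}

/// Compare two snapshots (identity -> path) taken at different times.
/// Matching on identity instead of path lets a rename show up as a move
/// rather than an unrelated remove + add. Caveat: inode numbers can be reused
/// once a file is deleted, so an id present in both snapshots is NOT proof
/// that it is the same file. This is the part that gets dicey.
fn diff(old: &HashMap<FileId, PathBuf>, new: &HashMap<FileId, PathBuf>) -> Vec<Change> {
    let mut changes = Vec::new();
    for (id, old_path) in old {
        match new.get(id) {
            None => changes.push(Change::Removed(old_path.clone())),
            Some(new_path) if new_path != old_path => changes.push(Change::Moved {
                from: old_path.clone(),
                to: new_path.clone(),
            }),
            Some(_) => {} // same identity, same path: unchanged
        }
    }
    for (id, new_path) in new {
        if !old.contains_key(id) {
            changes.push(Change::Added(new_path.clone()));
        }
    }
    changes.sort(); // HashMap iteration order is arbitrary; sort for stable output
    changes
}

fn main() {
    let old: HashMap<FileId, PathBuf> = HashMap::from([
        ((1, 10), PathBuf::from("src/lib.rs")),
        ((1, 11), PathBuf::from("notes.txt")),
    ]);
    let new: HashMap<FileId, PathBuf> = HashMap::from([
        ((1, 10), PathBuf::from("src/main.rs")),
        ((1, 12), PathBuf::from("README.md")),
    ]);
    for change in diff(&old, &new) {
        println!("{:?}", change);
    }
}
```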
the literally fractal complexity that results from every directory potentially having an entirely new set of entries is genuinely just very difficult to solve.
contrast this with a problem space that can be solved: https://github.com/zip-rs/zip2/pull/236. zip files tell you up front what contents they have, so you can plan and schedule your work around that. this ability to schedule "offline" is exactly what the filesystem doesn't give you, and it's something i haven't been able to translate to the async executor model (as opposed to one with explicit threading).
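to illustrate what planning up front buys you, here's a sketch of one textbook offline scheduling heuristic, longest-processing-time-first (LPT), applied to a list of entry sizes that's fully known before any work starts. this is a swapped-in technique, not what the linked zip2 PR does, and the sizes are hypothetical. the point is that the filesystem never hands you an equivalent of `sizes` until you've already done the traversal.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Greedy LPT assignment of a work list whose full contents are known in
/// advance (e.g. entry sizes from a zip central directory). Returns, for each
/// worker, the indices of the entries it should process.
fn schedule(sizes: &[u64], workers: usize) -> Vec<Vec<usize>> {
    assert!(workers > 0);
    let mut order: Vec<usize> = (0..sizes.len()).collect();
    // Largest entries first, so a big one never lands on a worker at the very end.
    order.sort_by(|&a, &b| sizes[b].cmp(&sizes[a]));
    // Min-heap of (current load, worker index): always give work to the least-loaded worker.
    let mut heap: BinaryHeap<Reverse<(u64, usize)>> =
        (0..workers).map(|w| Reverse((0, w))).collect();
    let mut plan = vec![Vec::new(); workers];
    for i in order {
        let Reverse((load, w)) = heap.pop().expect("workers > 0");
        plan[w].push(i);
        heap.push(Reverse((load + sizes[i], w)));
    }
    plan
}

fn main() {
    // Hypothetical uncompressed entry sizes.
    let sizes = [700, 300, 300, 200, 200, 100];
    for (w, entries) in schedule(&sizes, 2).iter().enumerate() {
        let total: u64 = entries.iter().map(|&i| sizes[i]).sum();
        println!("worker {}: entries {:?}, {} bytes", w, entries, total);
    }
}
```

with these sizes both workers end up with 900 bytes each, and that balance is decided before a single byte is read.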
ignore crate) or from the entire process. differentiating that from erroneous behavior is extremely difficult and i spent probably several days thinking about that PR before trying to solve it on my own.
pread() will succeed. so while path strings are a really tempting way to reduce the problem to an abstraction, they can be, and often are, a complete lie. just yesterday i started getting errors in my shell because i'd unlinked the directory i was in, and even though i added the directory back later (so my shell was still displaying a valid path), the filesystem does not take IOUs like that >=[
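a minimal sketch of the `pread()` point, assuming linux or another unix and rust's standard library (`read_after_unlink` is a hypothetical helper). once the file is open, unlinking its only path doesn't stop reads through the descriptor: `FileExt::read_exact_at` goes through `pread()` on the fd and never looks at the path again.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::os::unix::fs::FileExt;

/// Write `contents` to a fresh temp file, unlink its path, then read the data
/// back through the still-open descriptor with pread(2) (`read_exact_at`).
fn read_after_unlink(contents: &[u8]) -> io::Result<Vec<u8>> {
    let path = std::env::temp_dir().join(format!("pread-demo-{}.txt", std::process::id()));
    let mut file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open(&path)?;
    file.write_all(contents)?;

    // Remove the only name for the file: the path string is now a dangling IOU...
    fs::remove_file(&path)?;
    assert!(!path.exists());

    // ...but the descriptor still refers to the inode, so the read succeeds.
    let mut buf = vec![0u8; contents.len()];
    file.read_exact_at(&mut buf, 0)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    let data = read_after_unlink(b"still here")?;
    println!("{}", String::from_utf8_lossy(&data));
    Ok(())
}
```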
ignore crate (and ripgrep more generally) does that is clearly pessimal is not differentiating symlinks from other types of paths, so they traverse the same entries twice. i suspect ripgrep keeps track of more persistent identifiers like inodes when it actually reads the files, but i think it's the wrong abstraction, and it leads to more than just deadlocking if you're relying on it for correctness like pants does.
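a sketch of the symlink-aware alternative, assuming unix and std only (`walk_dedup` is a hypothetical name, not part of the `ignore` crate). it looks at each entry with `symlink_metadata()` first so symlinks are told apart from everything else, resolves them with `metadata()`, and skips anything whose `(dev, ino)` was already visited, which also stops symlink loops. it's still path-based and therefore still racy; it only fixes the double traversal.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::os::unix::fs::{symlink, MetadataExt};
use std::path::{Path, PathBuf};

/// Walk `root`, following symlinks, but visit each underlying file or
/// directory at most once, keyed by (device, inode). Returns visited paths.
fn walk_dedup(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut seen: HashSet<(u64, u64)> = HashSet::new();
    let mut visited = Vec::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(path) = stack.pop() {
        // Inspect the link itself first, so symlinks are distinguishable...
        let link_meta = fs::symlink_metadata(&path)?;
        // ...then resolve it; a dangling symlink is skipped rather than fatal.
        let meta = if link_meta.file_type().is_symlink() {
            match fs::metadata(&path) {
                Ok(m) => m,
                Err(_) => continue,
            }
        } else {
            link_meta
        };
        if !seen.insert((meta.dev(), meta.ino())) {
            continue; // already reached through another path (or a loop)
        }
        visited.push(path.clone());
        if meta.is_dir() {
            for entry in fs::read_dir(&path)? {
                stack.push(entry?.path());
            }
        }
    }
    Ok(visited)
}

fn main() -> io::Result<()> {
    // Demo tree: a.txt, link -> a.txt, sub/up -> root (a symlink loop).
    let root = std::env::temp_dir().join(format!("walk-demo-{}", std::process::id()));
    let _ = fs::remove_dir_all(&root);
    fs::create_dir_all(root.join("sub"))?;
    fs::write(root.join("a.txt"), b"x")?;
    symlink(root.join("a.txt"), root.join("link"))?;
    symlink(&root, root.join("sub").join("up"))?;
    for path in walk_dedup(&root)? {
        println!("{}", path.display());
    }
    fs::remove_dir_all(&root)?;
    Ok(())
}
```

in the demo tree, the root, `a.txt` and `sub` each get visited once; `link` and `sub/up` are recognized as already-seen identities instead of being walked again.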
and even if inodes were unique within a single filesystem, there is almost always more than one filesystem on each physical device (i can think of at least the EFI boot partition on my laptop), and those filesystems absolutely do not take steps to avoid reusing the same inode numbers, because that's for their internal bookkeeping (and it's what the OS requests of them).
really this is something the OS should be handling and it's extremely surprising that no one has done it. i was going to mention process id (pid) reuse as a similar example but i don't think there's really any good use for pid uniqueness, whereas with files and directories there is imho a very distinct concept of mutable state which is the whole point of the VFS layer!