my emacs is completely fucked
the docs just called me out for my use of the term "path" right as i said that above!!!! bc the stuff emacs works on is largely concrete files as opposed to search paths (and in fact i totally agree with their distinction here)
this distinction is actually like the most interesting part of my directory traversal work: using file descriptors and inodes to confidently read and write and have a clear concept of progress and completion, instead of path strings which are fundamentally racy and cannot be deduplicated
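a minimal sketch of the fd-and-inode idea, assuming a unix target. `FileId`, `open_with_id` and `count_distinct` are hypothetical names made up for illustration, not anything from the traversal work itself. the point is only that identity comes from asking the open descriptor (fstat) rather than trusting the path string, so hard links and symlinks to the same file collapse into one entry:

```rust
use std::collections::HashSet;
use std::fs::File;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Identity of a file as the kernel sees it: (device id, inode number).
/// Hypothetical helper type; unix-only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct FileId {
    dev: u64,
    ino: u64,
}

/// Open `path`, then ask the open descriptor (not the path string) what it is.
/// `File::open` follows symlinks, so a symlink reports its target's identity.
pub fn open_with_id(path: &Path) -> io::Result<(File, FileId)> {
    let file = File::open(path)?;
    let meta = file.metadata()?; // fstat() on the descriptor we already hold
    let id = FileId { dev: meta.dev(), ino: meta.ino() };
    Ok((file, id))
}

/// Count how many distinct files sit behind a list of path strings.
/// Assumes none of the files are deleted while this runs, since an inode
/// number can be reused once its file is gone and no descriptor holds it.
pub fn count_distinct(paths: &[&Path]) -> io::Result<usize> {
    let mut seen = HashSet::new();
    for path in paths {
        let (_file, id) = open_with_id(path)?;
        seen.insert(id);
    }
    Ok(seen.len())
}
```

three different path strings (a file, a hard link to it, and a symlink to it) come out as one `FileId` here, which is exactly the deduplication that comparing path strings can't give you.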
so essentially the filesystem is where fucking everything happens all at once. the NASDAQ trading floor. people biking to work. kids walking to school in their outfits. it's new york, baby! it all happens here!
(1) interactions with many other entities they don't control
(2) with incredibly nondeterministic latency for every request
but most importantly, you're often writing code that runs on your users' laptops and touches their files. that's a different degree of closeness and responsibility that you have to bear on top of figuring out all these situations that don't have protocols to provide easy answers.
ignore crate in pants for almost two years before we amicably parted ways because pants needed more control over that traversal. this was supposed to be something i could contribute to @ethersync which like pants has to concern itself not just with a single filesystem crawl but also correctly maintaining state over time. this mutable state is where things get really dicey.
the literally fractal complexity that results from every directory potentially having an entirely new set of entries is genuinely just very difficult to solve.
contrast to a problem space that can be solved: https://github.com/zip-rs/zip2/pull/236. zip files tell you up front what contents they have and you can plan and schedule your work around this. this ability to schedule "offline" is what the filesystem gives you, and it's something i haven't been able to translate to the async executor model vs one with explicit threading.
ignore crate) or from the entire process. differentiating that from erroneous behavior is extremely difficult and i spent probably several days thinking about that PR before trying to solve it on my own.
pread() will succeed.
so while path strings are a really tempting way to reduce the problem to an abstraction, they can be and often are a complete lie. just yesterday i started getting errors in my shell because i'd unlinked the directory i was in, and even though i added the directory back later (so my shell was still displaying a valid path), the filesystem does not take IOUs like that >=[
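a small sketch of the same situation with a regular file instead of a shell's working directory, assuming a unix target and an ordinary local filesystem. `unlink_then_recreate` is a hypothetical name for illustration. it unlinks a file, recreates it at the same path, and shows that the old descriptor still reads the old bytes while the path now names a different inode:

```rust
use std::fs::{self, File};
use std::io;
use std::os::unix::fs::{FileExt, MetadataExt};
use std::path::Path;

/// Returns (bytes read through the old handle, whether `path` now names a
/// different inode than the one we opened).
pub fn unlink_then_recreate(path: &Path) -> io::Result<(Vec<u8>, bool)> {
    fs::write(path, b"old")?;
    let old = File::open(path)?;
    let old_ino = old.metadata()?.ino();

    fs::remove_file(path)?; // unlink: the name is gone, the open file is not
    fs::write(path, b"new!")?; // same path string, brand new file

    // pread() on the descriptor we kept: still the original contents.
    let mut buf = [0u8; 3];
    old.read_exact_at(&mut buf, 0)?;

    // stat() on the path string: a different file entirely. The old inode
    // is still in use by `old`, so the new file is assumed to get a new one.
    let new_ino = fs::metadata(path)?.ino();
    Ok((buf.to_vec(), new_ino != old_ino))
}
```

the path string looks identical before and after, but only the descriptor tells you which file you're actually holding.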
ignore crate (and ripgrep more generally) does that is clearly pessimal is not differentiating symlinks from other types of paths, so they traverse the same output twice. i suspect ripgrep keeps track of more persistent identifiers like inodes when it actually reads the files, but i think it's the wrong abstraction that leads to more than just deadlocking if you're relying on it for correctness like pants
ignore.
so this should still be plausible on windows too, but POSIX literally just standardized last year the posix_getdents() libc function call https://pubs.opengroup.org/onlinepubs/9799919799/functions/posix_getdents.html. and this is really fucking sick for a variety of reasons but mostly it codifies what everyone already agreed on and that is POSIX's job
readdir() interface (one at a time) in terms of getdents() now. so it's not doing anything tricky and it's well aligned with the hardware.
but the really "rusty" thing i was able to get from this was that instead of mixing all your directory workers together like ripgrep's thread pool (andrew gallant didn't actually write the deadlocking code there), you can explicitly assign workers to own specific memory regions which they employ for their own tasks (directory handles, file handles, and symlink handling). and i used Pin<...> a good amount which is a very confusing but very thoughtful interface that is also what rust uses for async primitives to ensure safety across threads etc
this is the path library that actually supports a robust binary representation and that the stdlib should have had in the first place https://codeberg.org/cosmicexplorer/deep-link/src/branch/main/d-major/pathing/src/lib.rs
did you know std::path::Path objects with different string values will compare as equal? i think that's thoroughly unacceptable along with the way (as with this problem in general) you can't parse the Path into a representation that maintains the kind of performance or safety guarantees you want. you just have to keep asking it questions which is a lot of extra busywork at best.
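a quick illustration of the equality claim using only the stdlib. `equal_but_spelled_differently` is a hypothetical helper name. `Path` compares by components, and the components iterator is documented to drop repeated separators, trailing separators, and non-leading `.` segments:

```rust
use std::path::Path;

/// True when two spellings differ as strings but compare equal as `Path`s.
pub fn equal_but_spelled_differently(a: &str, b: &str) -> bool {
    a != b && Path::new(a) == Path::new(b)
}
```

with this, `("a/b", "a//b")`, `("a/b", "a/b/")` and `("a/b", "a/./b")` all come back true, while `("a/b", "a/../a/b")` comes back false because `..` is not normalized away. so equality is neither byte equality nor "refers to the same file", it's something in between.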
io::Result<...> to wrap an underlying syscall to the OS. i agree that the logic makes sense (and python's wonderful pathlib has this exact same Path.exists() method), but in fact the filesystem is partially an API and protocol problem that i think we can solve.
and i consider the Path::exists() in the rust stdlib to be an abdication of what rust can help us all to solve collectively, so i'm very hopeful i can file an actual RFC later this year (still need help with windows though)
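for reference, a small sketch of what the bool-returning `Path::exists()` hides, next to the stdlib's own `Path::try_exists()` (stable since rust 1.63). `exists_both_ways` is a hypothetical helper name, and the claim about an interior NUL byte producing an error is an assumption about unix targets:

```rust
use std::io;
use std::path::Path;

/// Ask the same question twice: once with the error thrown away,
/// once with it preserved.
pub fn exists_both_ways(path: &Path) -> (bool, io::Result<bool>) {
    (path.exists(), path.try_exists())
}
```

for a path that simply isn't there, both agree: `false` and `Ok(false)`. for a path the OS can't even look up (for example one with an interior NUL byte, or one behind a directory you can't search), `exists()` still just says `false`, while `try_exists()` returns the `Err` so the caller can tell "missing" apart from "couldn't find out".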