I spent some time the other day thinking about zippers and structural editors and I wanted to share some of the ideas, because they were interesting, so this is going to be a thread about that
Discussion
the observation to make is that if we view a text insertion zipper as a normal tree overtype zipper on a string, rather than a true insertion zipper, we can analyze insertion and overtype as follows:
a string-with-insertion-cursor, such as
fo|o
corresponds to the tree zipper
"o" @ "fo_"
where the LHS of @ is the thing in focus, the RHS is the one-hold context (with an _ representing the hole)
the act of typing a character, such as 't' should produce the string-with-insertion-cursor
fot|o
which is
"o" @ "fot_"
which is equivalent to updating the zipper to
"to" @ "fo_"
and then navigating to the next recursive tree position
but when we view these tree contexts through McBride's tube view, where a one hole context is a stack of frames of local subtrees, we get this:
fo|o
is the tube-y zipper
"o" @ [ 'o':_, 'f':_ ]
and
fot|o
is
"o" @ [ 't':_, 'o':_, 'f':_ ]
from this perspective, we can describe the insertion operation as follows:
1. we have a whole subtree "o" to operate on
2. we want to insert a single character 't'
3. we find the unique list/string constructor that has a single char arg and a single list arg: cons (`:`)
4. we partially apply cons (`:`) to the single char 't', and form a one hole zipper 't':_
5. we push that onto the tube-y context stack
now, we could view key presses as complex commands, which specifically mean that precise thing. and that's fine. but i want to generalize this and think about something a bit more eh.. rich?
i phrased the above sequence of operations the way i did because i want to consider the possibility that our tree structure isn't as simple as lists
instead, lets imagine that its some arbitrary tree type that has many constructors some of them with char args
for instance, i dunno, something like
data Tree = Z | O Char Tree | T Char Tree Tree
now when we press a single key, we look at step 3 and say, well, there is a unique constructor `O` that takes a char and a recursive tree, yes, indeed, so we can push `O 't' _` to the stack
but there's ALSO a constructor `T` that takes two recursive trees
we obviously can't use that right? i mean, which subtree gets the hole? and what do we put in for the other subtree?
well if we have a way of representing an as of yet un-filled portion of a tree (also called a "hole" but this type a different sort, so i'll just call it a "gap" for clarity), we can say this:
we can also add either of
T 'f' _ ?
and
T 'f' ? _
to the stack as well
here i'm using ? to represent gaps that must be filled in eventually
so here we have a situation where a single key press can reasonably suggest edits that "insert" three different kinds of tree nodes:
an O node, that's like a string insertion
a T node with a gap on the right
a T node with a gap on the left
from a UI perspective, we have an ambiguity here, in terms of what is intended by the user with the single keypress
this could be resolved with different modes selected prior to the keypress, or by offering up the different options with one selected as a default
the idea of using gaps is not all that peculiar, fwiw
dependently typed programming languages have them (called holes, sheds, and typed holes depending on language)
so do some text editors, for instance in the form of TextMate's snippet tab stops
but i want to go even further afield and consider the idea that we might not treat the keypresses as characters-as-data instead, but rather characters-as-pretty-printing
in the setting of a string, or some string-like object (some tree with only char-like or string-like data inside of it), there's no real interesting thing here i guess
but in an abstract syntax tree for a little language, there's something more to be said
not all characters are created equal in a concrete syntax
the character 'x' might start an identifier like `x_position`, while the character '<' might start a pair like `<1,2>`
so when the user presses a key, it could have different roles depending on the concrete syntax
if we magically could know how a piece of an AST would pretty print, we might be able to determine this computationally, even
if we knew that `Pair m n` pretty prints as "<" ++ pretty m ++ "," ++ pretty n ++ ">"
we could say, aha, the user pressed `<` so a good candidate for what to push to the context is `Pair _ ?` or `Pair ? _`
practically speaking, this would look something like so:
suppose you have the program
\p -> fst p
and we have an insert cursor that's "before" `fst p`, which i'll indicate with surrounding brackets:
\p -> [fst p]
we type < and we might get
\p -> <[fst p], ?>
probably the way we get the magic of knowing the pretty printing is we have concrete grammar and we treat this as related to Earley parsing, which has a similar kind of structure
except Earley parsing will just try things to output a parse tree, whereas this requires that we have some kind of partial/localized parsing, where part of the parse agenda is already determined because of the context
it might actually be more precisely seen as left-corner parsing rather than general Earley parsing, but maybe that's too left biased, and instead we need a more general non-directional chart parse or some other kind of thing, so that you could type `>` and get a similar effect
i think this general view of things makes it somewhat easy to see how to approach structured editing, and how we can have an Insert cursor for trees. where the Overtype cursor would simply replace the current thing in focus, Insert would push a context frame to the stack
which frame we pick would depend on the type in question
how we deal with multiplicity of options, how we deal with underspecification of the tree, these are all UI/UX issues rather than programming issues
I know that there's a long history of work on structured editors, most of which i haven't read. some more recent work by @neurocy and others on Hazel's editor metatheory is quite interesting. i don't know how it relates to what i've described tho
here are some relevant Hazel links tho:
Toward Semantic Foundations for Program Editors
https://arxiv.org/pdf/1703.08694
Hazelnut: A Bidirectionally Typed Structure Editor Calculus
https://www.youtube.com/watch?v=99SRVnRA9Ec&list=PLechWD1Ts4l9oBGin5tXE7wM6KsF1AtvS&index=2
i think my general vibe on the Hazel approach is that it's too type-theoretic. and while I love type theory, I also love parsers, and I feel like the root of these issues is parsing-related rather than type theory related, even if type theory plays a role
in particular, i think there's a bunch of work on good parsing-with-error-recovery that has been done and needs to be done, especially from the perspective of making these things into generic tools rather than one off custom theories for specific languages
i also think we need a stronger focus on grammars as concrete objects of programming, rather than grammars as latent things
Parsec/parser combinators are cool but there's no concrete grammar to manipulate
TreeSitter is a terrible tool
there's a missing component
also, unrelated to structure editors, there's also a strong value to zippers in general UIs
in particular, when you have any kind of nested UI elements, and you have to do some kind of focus-related UI actions, for instance in a TUI you might have a focused UI element,..