So which algorithms for optimizing its NFA (Nondeterministic Finite Automaton) grammars (to fit in the cache hierarchy) would suit a computer that can perform concurrent graph traversals?
First off: Garbage-collecting dead states would be cheap!
As would "determinizing" an NFA, that is merging states reached from the same edges so the machine needs.
Building upon that Brzozowski's Algorithm suggests determinizing an NFA & its inverse then un-inverting it to optimize a NFA.
2/3!