
The regular expression language is relatively small and restricted, so not all possible string processing tasks can be done using regular expressions.

in fact, a regex by itself can only perform matching; everything beyond that comes from the API wrapped around it. and matching html with regex is not very far from the complex stateful substitutions that are commonplace with classical regex APIs.
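to make that concrete, here's a minimal python sketch. the `number_todos` helper and the TODO-numbering task are made up for illustration. the pattern only locates each match, and the "state" that makes the substitution stateful is an ordinary counter handed to `re.sub` through a callable.

```python
import re
from itertools import count

def number_todos(text: str) -> str:
    """Append a running index to every TODO marker (hypothetical helper).

    The regex only finds each 'TODO'; the counter that makes the
    substitution stateful lives in ordinary Python code.
    """
    counter = count(1)  # state kept outside the pattern, fresh per call
    # re.sub calls the lambda once per match, left to right
    return re.sub(r'\bTODO\b', lambda m: f'{m.group()}({next(counter)})', text)

print(number_todos('TODO: parse args\nTODO: write tests'))
```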

Python supports several of Perl’s extensions and adds an extension syntax to Perl’s extension syntax. If the first character after the question mark is a P, you know that it’s an extension that’s specific to Python.

this is useful info i wasn't aware of, and it's also illustrative of how innovation works in the regex ecosystem
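for reference, the two python-specific P forms i know of are named groups `(?P<name>...)` and named backreferences `(?P=name)`. a small sketch using both to find a doubled word (the sample sentence is just an example):

```python
import re

# (?P<word>...) defines a named group; (?P=word) matches the same text again.
doubled_word = re.compile(r'\b(?P<word>\w+)\s+(?P=word)\b')

m = doubled_word.search('Paris in the the spring')
print(m.group())        # → the the
print(m.group('word'))  # → the
```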

loled at this example though

import re

InternalDate = re.compile(r'INTERNALDATE "'
        r'(?P<day>[ 123][0-9])-(?P<mon>[A-Z][a-z][a-z])-'
        r'(?P<year>[0-9][0-9][0-9][0-9])'
        r' (?P<hour>[0-9][0-9]):(?P<min>[0-9][0-9]):(?P<sec>[0-9][0-9])'
        r' (?P<zonen>[-+])(?P<zoneh>[0-9][0-9])(?P<zonem>[0-9][0-9])'
        r'"')

It’s obviously much easier to retrieve m.group('zonem'), instead of having to remember to retrieve group 9.

sir, i don't know about you, but that is line noise to me, and my dyslexia agrees
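for what it's worth, `re.VERBOSE` is one way to make this less like line noise: unescaped whitespace outside character classes is ignored and `#` starts a comment, so the pattern can be laid out field by field. below is a sketch of the same InternalDate fields in that style. i've swapped the repeated `[0-9][0-9]` classes for `{2}`/`{4}` quantifiers, and the sample timestamp is made up. whether it actually reads better is subjective.

```python
import re

# Same fields as the InternalDate pattern, written with re.VERBOSE.
# '\ ' is an escaped (literal) space; the space inside [ 123] is kept
# because whitespace in a character class is not ignored.
INTERNAL_DATE = re.compile(r'''
    INTERNALDATE\ "
    (?P<day>[ 123][0-9]) - (?P<mon>[A-Z][a-z]{2}) - (?P<year>[0-9]{4})  # 17-Jul-1996
    \ (?P<hour>[0-9]{2}) : (?P<min>[0-9]{2}) : (?P<sec>[0-9]{2})        # 02:44:25
    \ (?P<zonen>[-+]) (?P<zoneh>[0-9]{2}) (?P<zonem>[0-9]{2})          # -0700
    "
''', re.VERBOSE)

m = INTERNAL_DATE.match('INTERNALDATE "17-Jul-1996 02:44:25 -0700"')
print(m.group('zonem'), m.group(9))  # same group, by name and by number
print(m.groupdict())
```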

https://docs.python.org/3/howto/regex.html#use-string-methods

Strings have several methods for performing operations with fixed strings and they’re usually much faster, because the implementation is a single small C loop that’s been optimized for the purpose, instead of the large, more generalized regular expression engine.

emacs doesn't try very hard, but it does try to identify literal patterns and delegate them to faster implementations. my issue is that the mental framework this teaches students is to use things because they're faster, not because they're more explicit and easier to maintain. maybe the author believes speed is the only argument people will listen to, but i think it's the wrong approach for an introduction to regular expressions.
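one angle on the explicitness argument that doesn't lean on speed: with a string method, a fixed string means exactly what it says, whereas the same literal passed to `re` is parsed as a pattern and has to be escaped. a small sketch (the filename is made up):

```python
import re

path = 'archive.tar.gz'

# str.replace: '.' is just a period.
print(path.replace('.', '_'))              # → archive_tar_gz

# re.sub parses '.' as "any character", so every character gets replaced.
print(re.sub('.', '_', path))

# The literal has to be escaped to say what was meant.
print(re.sub(re.escape('.'), '_', path))   # → archive_tar_gz

# Suffix check as a method call versus as a pattern.
print(path.endswith('.gz'), bool(re.search(r'\.gz$', path)))  # → True True
```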

i also think saying "the whole pattern will fail" is misleading. i believe it's not "the whole pattern" that fails, just the current left-to-right matching attempt, which would otherwise have continued rightwards and may still continue through alternations, optionals, or some other such construction. but i'm not sure, since i've never used a negative lookahead before and generally consider such constructions a last resort, much less readable than conditional logic applied around the pattern string.
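a sketch of both points, assuming i'm reading the lookahead semantics right. the first part shows a negative lookahead failing on one alternative while the overall match still succeeds through the other. the second compares what i believe is the HOWTO's `.*[.](?!bat$)[^.]*$` example against plain conditional logic around `str.rpartition`. `has_non_bat_extension` is a hypothetical helper, and the two only agree for simple single-line filenames (for instance, `$` can also match just before a trailing newline).

```python
import re

# 1) A failing lookahead only fails the current path; the engine backtracks.
#    'foo(?!bar)' fails on 'foobarbaz' because 'bar' follows 'foo',
#    but the alternation then tries its second branch, which matches.
m = re.match(r'foo(?!bar)|foobarbaz', 'foobarbaz')
print(m.group())  # → foobarbaz

# 2) "extension is not bat", as a lookahead pattern...
NOT_BAT = re.compile(r'.*[.](?!bat$)[^.]*$')

# ...and as conditional logic around a string method.
def has_non_bat_extension(filename: str) -> bool:
    """True if the text after the last '.' exists and is not 'bat'."""
    _stem, dot, ext = filename.rpartition('.')
    return dot == '.' and ext != 'bat'

for name in ['foo.bar', 'autoexec.bat', 'a.batch', 'x.bat.txt', 'noext']:
    print(name, bool(NOT_BAT.match(name)), has_non_bat_extension(name))
```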