having a lot of strong responses in my head to reading the gentle introduction to regular expressions in the python docs https://docs.python.org/3/howto/regex.html#regex-howto
not like getting upset
it's just like. ok. i remember SNOBOL. i remember fortran referencing line numbers. providing string inputs to the regex compiler feels like that
"tiny, highly specialized programming language"
HUGE props for calling it a programming language. great start
as well as "embedded inside python" which is another important point
"You can also use REs to modify a string or to split it apart in various ways."
regex actually cannot do this. like the thing being called an RE or "regular expression" here cannot express modifications to a string nor even separation. this is not pedantry
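a minimal sketch of the distinction, using only the stdlib re module (the sample text and variable names are made up for illustration): the pattern only says what to match, and the splitting and replacing are done by the api methods wrapped around it.

```python
import re

# the pattern itself only describes what to match: a comma with optional spaces
pattern = re.compile(r"\s*,\s*")

text = "a , b,c ,d"

# splitting is performed by the API, using the positions of the matches
parts = pattern.split(text)

# the replacement text is a separate argument, not part of the pattern
joined = pattern.sub("; ", text)

print(parts)
print(joined)
```

the same compiled pattern drives both calls, and nothing in the pattern changes between "split" and "substitute".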
i do like the immediate discussion of the matching engine introducing implicit semantics. great great way to introduce students to complex topics while maintaining a consistent focus at first
"The regular expression language is relatively small and restricted, so not all possible string processing tasks can be done using regular expressions."
in fact regex itself can only perform matching. matching html with regex is not very far from performing the complex stateful substitutions that are commonplace with classical regex APIs
referring to the alternation operator | as a zero-width assertion is very novel to me. it's especially mentioned in the context of its excessively low precedence
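for anyone who hasn't run into the precedence thing, a small sketch with the stdlib re module (the Crow/Servo words are the howto's own example, the variable names are mine): | binds looser than plain concatenation, so you need a group to alternate over a single character.

```python
import re

# | has lower precedence than concatenation: this means "Crow" or "Servo"
loose = re.compile(r"Crow|Servo")

# a (non-capturing) group is needed to alternate over just one character
tight = re.compile(r"Cro(?:w|S)ervo")

loose_crow = loose.fullmatch("Crow") is not None
loose_mixed = loose.fullmatch("CroServo") is not None
tight_mixed = tight.fullmatch("CroServo") is not None

print(loose_crow, loose_mixed, tight_mixed)
```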
so much of this document is specifically describing the particular regex language accepted by the python stdlib re module and not the concepts of a pattern language, which i think is a travesty
like non-capturing and named groups are introduced in terms of perl and which metacharacters they had available https://docs.python.org/3/howto/regex.html#non-capturing-and-named-groups i do not think inside jokes about metacharacters are helpful for people new to pattern matching
"Python supports several of Perl’s extensions and adds an extension syntax to Perl’s extension syntax. If the first character after the question mark is a P, you know that it’s an extension that’s specific to Python."
this is useful info that i was not aware of and is also illustrative of how innovation works in the regex ecosystem
one thing i definitely like about python matches is that it always provides a value for every group to each match even if the value is None. this is one benefit of the explicit match objects i researched introducing to elisp last year
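a quick sketch of what that looks like in practice, stdlib re only (the version-number pattern is just something i made up for illustration): an optional group that doesn't participate in the match still shows up, as None.

```python
import re

# the second group is optional, so it may not participate in the match
version = re.compile(r"(\d+)(?:\.(\d+))?")

m = version.fullmatch("42")

# every group gets a slot in the result, even the one that did not match
groups = m.groups()

# a default can be substituted for the groups that did not participate
with_default = m.groups(default="0")

print(groups)
print(with_default)
```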
loled at this example though
InternalDate = re.compile(r'INTERNALDATE "'
    r'(?P<day>[ 123][0-9])-(?P<mon>[A-Z][a-z][a-z])-'
    r'(?P<year>[0-9][0-9][0-9][0-9])'
    r' (?P<hour>[0-9][0-9]):(?P<min>[0-9][0-9]):(?P<sec>[0-9][0-9])'
    r' (?P<zonen>[-+])(?P<zoneh>[0-9][0-9])(?P<zonem>[0-9][0-9])'
    r'"')
"It’s obviously much easier to retrieve m.group('zonem'), instead of having to remember to retrieve group 9."
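to be fair to the point being made, here is a sketch of the same idea laid out with re.VERBOSE so it's a bit less noisy. assumptions: the group names (day, mon, year, hour, min, sec, zonen, zoneh, zonem) follow the imaplib pattern the howto quotes, and the sample input string is made up. this is my rearrangement, not the howto's code.

```python
import re

# assumption: group names mirror the imaplib pattern quoted in the howto
INTERNAL_DATE = re.compile(r"""
    INTERNALDATE[ ]"
    (?P<day>[ 123][0-9])-(?P<mon>[A-Z][a-z]{2})-(?P<year>[0-9]{4})
    [ ](?P<hour>[0-9]{2}):(?P<min>[0-9]{2}):(?P<sec>[0-9]{2})
    [ ](?P<zonen>[-+])(?P<zoneh>[0-9]{2})(?P<zonem>[0-9]{2})
    "
""", re.VERBOSE)

# hypothetical input, only for illustration
m = INTERNAL_DATE.search('INTERNALDATE "17-Jul-1996 02:44:25 -0700"')

zone_minutes = m.group("zonem")  # lookup by name
same_by_index = m.group(9)       # the same group, by position

print(zone_minutes, same_by_index)
```

in verbose mode whitespace outside character classes is ignored, which is why the literal spaces are written as [ ].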
sir i don't know about you but that is line noise to me and my dyslexia agrees
"Strings have several methods for performing operations with fixed strings and they’re usually much faster, because the implementation is a single small C loop that’s been optimized for the purpose, instead of the large, more generalized regular expression engine."
emacs doesn't try very hard but it does try to identify literal patterns and delegates to faster implementations. the mental framework this teaches students is to use things because they're faster, not because they're more explicit and easier to maintain. maybe the author believes that's the only thing people will listen to but i think it's the wrong approach for an introduction to regular expressions
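a small sketch of the explicitness argument, stdlib only (the price string is invented for illustration): the str method says exactly what it does, while the same literal used as a pattern has to be escaped or it quietly means something else.

```python
import re

text = "price: 3.50 (was 4.50)"

# fixed string: the str method states the intent directly
plain = text.replace("3.50", "3.75")

# the same literal as a pattern must be escaped first
escaped = re.sub(re.escape("3.50"), "3.75", text)

# unescaped, "." matches any character, so this also matches "3x50"
unescaped_hits = re.findall(r"3.50", "3.50 3x50")

print(plain)
print(escaped)
print(unescaped_hits)
```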
the response from emacs-devel was resoundingly that a lisp regex implementation would be leagues more useful than another native code impl
booooo it uses html as an example but it was just an excuse to tell people to use an xml parser instead. at least link to an html parser project so they can see the horrors of html parsing firsthand instead of being told it's too dangerous
ok it ends telling the reader to check out a book from the library which is vaguely subversive and i appreciate
the sentence "A negative lookahead cuts through all this confusion" is difficult to take seriously though
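for context, a sketch of the lookahead that sentence is about, using the filename pattern from the howto (the list of filenames and the variable names are mine): match any filename whose extension is not exactly "bat".

```python
import re

# (?!bat$) asserts that the text right after the dot is not exactly "bat"
not_bat = re.compile(r".*[.](?!bat$)[^.]*$")

# hypothetical filenames, only for illustration
names = ["autoexec.bat", "sendmail.cf", "printers.conf", "foo.batch"]

kept = [name for name in names if not_bat.match(name)]

print(kept)
```

the lookahead consumes nothing, so [^.]*$ still gets to match the whole extension afterwards.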