The regular expression language is relatively small and restricted, so not all possible string processing tasks can be done using regular expressions.

in fact regex itself can only perform matching. matching html with regex is not very far from performing the complex stateful substitutions that are commonplace with classical regex APIs
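to make the matching/substitution split concrete in python's `re`: the engine only finds matches, and any state in a substitution lives in the callable you pass to `re.sub`. a minimal sketch, where the `TODO` marker and the `number_todos` helper are hypothetical and just for illustration:

```python
import re
from itertools import count


def number_todos(text: str) -> str:
    """Hypothetical helper: number each TODO marker in order of appearance."""
    counter = count(1)  # the state lives here, outside the regex engine
    # re.sub calls the function once per match, scanning left to right
    return re.sub(r"\bTODO\b", lambda m: f"TODO#{next(counter)}", text)


print(number_todos("TODO: parse args. TODO: write tests."))
# TODO#1: parse args. TODO#2: write tests.
```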

Python supports several of Perl’s extensions and adds an extension syntax to Perl’s extension syntax. If the first character after the question mark is a P, you know that it’s an extension that’s specific to Python.

this is useful info that i was not aware of and is also illustrative of how innovation works in the regex ecosystem
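for reference, the two python-specific `P` extensions the howto means are named groups `(?P<name>...)` and named backreferences `(?P=name)`. a minimal sketch using the doubled-word search from the same howto (the sample sentences are just illustrative):

```python
import re

# (?P<word>...) captures a named group; (?P=word) must match that same text again
doubled = re.compile(r"\b(?P<word>\w+)\s+(?P=word)\b")

m = doubled.search("Paris in the the spring")
print(m.group("word"))  # the
print(m.group(1))       # the  (named groups are still numbered too)
print(m.group(0))       # the the
```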

loled at this example though

import re

InternalDate = re.compile(r'INTERNALDATE "'
        r'(?P<day>[ 123][0-9])-(?P<mon>[A-Z][a-z][a-z])-'
        r'(?P<year>[0-9][0-9][0-9][0-9])'
        r' (?P<hour>[0-9][0-9]):(?P<min>[0-9][0-9]):(?P<sec>[0-9][0-9])'
        r' (?P<zonen>[-+])(?P<zoneh>[0-9][0-9])(?P<zonem>[0-9][0-9])'
        r'"')

It’s obviously much easier to retrieve m.group('zonem'), instead of having to remember to retrieve group 9.

sir i don't know about you but that is line noise to me and my dyslexia agrees
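one way to make that less like line noise, assuming you're on python's `re`: the `re.VERBOSE` flag ignores unescaped whitespace outside character classes and allows `#` comments inside the pattern. a sketch of the same `InternalDate` pattern laid out that way, with `{4}`/`{2}` repeats in place of spelled-out digits (the sample timestamp is made up). the group order is unchanged, so `m.group('zonem')` and `m.group(9)` still agree:

```python
import re

# Same groups in the same order as the quoted pattern; VERBOSE lets us lay it out.
# Literal spaces must be escaped ("\ ") because unescaped whitespace is ignored.
InternalDate = re.compile(
    r"""
    INTERNALDATE\ "
    (?P<day>[ 123][0-9])-          # day of month, space-padded
    (?P<mon>[A-Z][a-z][a-z])-      # month abbreviation, e.g. Jul
    (?P<year>[0-9]{4})\            # four-digit year, then a literal space
    (?P<hour>[0-9]{2}):
    (?P<min>[0-9]{2}):
    (?P<sec>[0-9]{2})\             # time, then a literal space
    (?P<zonen>[-+])                # zone sign
    (?P<zoneh>[0-9]{2})            # zone hours
    (?P<zonem>[0-9]{2})            # zone minutes
    "
    """,
    re.VERBOSE,
)

m = InternalDate.match('INTERNALDATE "17-Jul-1996 02:44:25 +0530"')
print(m.group("zonem"), m.group(9))  # 30 30
print(m.group("mon"))                # Jul
```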

https://docs.python.org/3/howto/regex.html#use-string-methods

Strings have several methods for performing operations with fixed strings and they’re usually much faster, because the implementation is a single small C loop that’s been optimized for the purpose, instead of the large, more generalized regular expression engine.

emacs doesn't try very hard, but it does try to identify literal patterns and delegate them to faster implementations. the mental framework this teaches students is to use things because they're faster, not because they're more explicit and easier to maintain. maybe the author believes that's the only argument people will listen to, but i think it's the wrong approach for an introduction to regular expressions
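a rough sketch of that emacs-style trick in python terms: if the pattern has no metacharacters, skip the regex engine and use `str.find`. `find_first` is a hypothetical helper, and using `re.escape(p) == p` as the "is it literal?" test is my own conservative assumption (it also treats spaces as non-literal, so those just fall through to the regex path):

```python
import re


def find_first(pattern: str, text: str) -> int:
    """Hypothetical helper: index of the first match of pattern in text, or -1."""
    if re.escape(pattern) == pattern:
        # Nothing needed escaping, so the pattern is a plain literal:
        # str.find does the same job without going through the regex engine.
        return text.find(pattern)
    m = re.search(pattern, text)
    return m.start() if m else -1


print(find_first("word", "swordfish"))             # 1   (literal path, str.find)
print(find_first(r"\bword\b", "swordfish word"))   # 10  (regex path)
```

the explicit version of the same choice is just calling `text.find("word")` yourself, which arguably says more about intent than any speed argument.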

i also think saying "the whole pattern will fail" is misleading, since i believe it's not "the whole pattern" that fails but just the current left-to-right match attempt, which would otherwise have continued rightwards and may still succeed by backtracking into alternations, optionals, quantifiers, or some other such construction. but i'm not sure, since i've never used a negative lookahead before and generally consider such constructions a last resort, much less readable than conditional logic wrapped around the pattern string
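a quick way to check this in python's `re`: when a negative lookahead fails, the engine backtracks and keeps trying other ways to match (other alternatives, shorter repetitions, and with `re.search`, later start positions) instead of giving up on the whole pattern. the first two patterns are illustrative; the third is the `.bat` pattern from the howto, where there simply is no other way to match:

```python
import re

# 1. Alternation: the first branch fails at its lookahead, the second branch still matches.
print(re.match(r"a(?!b)|ab", "ab").group())        # ab

# 2. Quantifier backtracking: \d+ first grabs "100", the lookahead sees "px" and fails,
#    so \d+ gives back a digit and the match succeeds as "10".
print(re.search(r"\d+(?!px)", "100px").group())    # 10

# 3. The howto's pattern: only one "." to split on, so the match really does fail.
skip_bat = re.compile(r".*[.](?!bat$)[^.]*$")
print(skip_bat.match("foo.bat"))                    # None
print(skip_bat.match("foo.batch").group())          # foo.batch
```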

but like come on

We’ll start by learning about the simplest possible regular expressions. Since regular expressions are used to operate on strings, we’ll begin with the most common task: matching characters.

For a detailed explanation of the computer science underlying regular expressions (deterministic and non-deterministic finite automata), you can refer to almost any textbook on writing compilers.

do you not at least have a recommendation for which compiler textbook to check out? is it because you don't believe in theory or because you think anyone who would be reading this document isn't smart enough to understand it? this is why we still submit our pattern programs to the regex mainframe and wait for it to produce our results on punch cards
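not a textbook recommendation, but for a taste of what the howto is gesturing at: a regular expression like `ab*c` corresponds to a small deterministic finite automaton, and matching is just walking a transition table one character at a time. a hand-built sketch (the state numbering and table are my own, not taken from any particular engine), checked against `re.fullmatch`:

```python
import re

# Hand-built DFA for the regex ab*c (whole-string match).
# States: 0 = start, 1 = seen "a" plus any b's, 2 = seen the final "c" (accepting).
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 1,
    (1, "c"): 2,
}
ACCEPTING = {2}


def dfa_fullmatch(text: str) -> bool:
    state = 0
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition: implicit dead state, reject
            return False
    return state in ACCEPTING


for s in ["ac", "abbbc", "abcb", "bc", ""]:
    # The DFA and Python's backtracking engine should agree on every input.
    assert dfa_fullmatch(s) == bool(re.fullmatch(r"ab*c", s))
    print(repr(s), dfa_fullmatch(s))
```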

turns out it's easier to simply bypass the script that fails to find dependencies you have on your system, but the build process for the jit has a very hard dependency on clang's ELF output section names and especially on llvm-readobj's json output mode, which neither readelf nor objdump appears to have an equivalent for. really gotta hand it to clang, they have this EEE stuff down to a science

particularly annoyed at the jit readme, which mentions how homebrew doesn't automatically pollute your PATH as if that were laziness, then says don't worry, we'll find it for you, and then hardcodes llvm@19. it was already weird that it's the one configure option that doesn't mention a clang dependency, and then it proceeds to litter the broken build script with snark about homebrew and llvm support for mach-o. house of mirrors documentation experience

i made rice but i'm almost definitely going to pass out soon. i have decided to use the regex high level composition API to see how nice it is to contribute to cpython (they may decide it's inappropriate for the stdlib which would be fine too) and if so then i may decide to look again at the image dumper because python with instant startup without any portability concerns would be a very interesting language

oh here's a blast from the past

PythonWorks will initially be available on Windows 95, 98, and NT. Versions for Solaris 2.6 and later, Digital Unix 4, and Linux will be released in early 2000. [Availability on other platforms depends on demand.]

solaris mentioned before linux and "other platforms" beyond linux are considered important enough to at least gesture towards

they're SO sassy about msvc lmao

#pragma optimize("agtw", on) /* doesn't seem to make much difference... */
#pragma warning(disable: 4710) /* who cares if functions are not inlined ;-) */
/* fastest possible local call under MSVC */
#define LOCAL(type) static __inline type __fastcall

the only change since the year 2000 was to avoid lumping in clang-cl with msvc earlier this year (from someone else who also contributed to the jit). didn't ms fire their whole python team recently? that's sad, i forgot about that