Discussion

@hipsterelectron@circumstances.run · 4 months ago

having a lot of strong responses in my head to reading the gentle introduction to regular expressions in the python docs https://docs.python.org/3/howto/regex.html#regex-howto

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

not like getting upset

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

it's just like. ok. i remember SNOBOL. i remember fortran referencing line numbers. providing string inputs to the regex compiler feels like that

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

tiny, highly specialized programming language

HUGE props for calling it a programming language. great start

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

as well as "embedded inside python" which is another important point

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

You can also use REs to modify a string or to split it apart in various ways.

regex actually cannot do this. like the thing being called an RE or "regular expression" here cannot express modifications to a string nor even separation. this is not pedantry

Jamey Sharp

@jamey@toot.cat replied · 4 months ago

@hipsterelectron I think you should read "use REs to" here as meaning "use REs in the process of" or something, which is demonstrably true

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

i do like the immediate discussion of the matching engine introducing implicit semantics. great great way to introduce students to complex topics while maintaining a consistent focus at first

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

The regular expression language is relatively small and restricted, so not all possible string processing tasks can be done using regular expressions.

in fact regex itself can only perform matching. matching html with regex is not very far from performing the complex stateful substitutions that are commonplace with classical regex APIs
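
a minimal sketch of that point, assuming only the stdlib re module: the pattern reports where the matches are, and the "modification" is ordinary python layered on top of those spans. `sub_by_hand` is a hypothetical helper name for illustration, not something re provides, and it only handles a literal replacement string.

```python
import re

def sub_by_hand(pattern, repl, text):
    """Rebuild a literal-replacement re.sub() using nothing but match spans."""
    out = []
    pos = 0
    for m in re.finditer(pattern, text):
        out.append(text[pos:m.start()])  # text between matches is copied through
        out.append(repl)                 # the substitution happens here, outside the pattern
        pos = m.end()
    out.append(text[pos:])               # trailing text after the last match
    return "".join(out)

result = sub_by_hand(r"[0-9]+", "N", "a1b22c333")
```

for this input the result agrees with what `re.sub(r"[0-9]+", "N", "a1b22c333")` returns; the pattern itself never says anything about replacing.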

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

https://docs.python.org/3/howto/regex.html#more-metacharacters

referring to the alternation operator | as a zero-width assertion is very novel to me. it's especially mentioned in the context of its excessively low precedence
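
a small illustration of the precedence point, assuming the stdlib re module and borrowing the `Crow|Servo` pattern the howto uses; the grouped variant is added here only for contrast.

```python
import re

# | binds more loosely than concatenation, so this means (Crow) or (Servo),
# not "Cro" + ("w" or "S") + "ervo".
loose = re.compile(r"Crow|Servo")

# A non-capturing group narrows the alternation to the single middle character.
tight = re.compile(r"Cro(?:w|S)ervo")

samples = ("Crow", "Servo", "CroServo")
loose_hits = [bool(loose.fullmatch(s)) for s in samples]
tight_hits = [bool(tight.fullmatch(s)) for s in samples]
```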

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

so much of this document is specifically describing the particular regex language accepted by the python stdlib re module and not the concepts of a pattern language which i think is a travesty

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

like non-capturing and named groups are introduced in terms of perl and which metacharacters they had available https://docs.python.org/3/howto/regex.html#non-capturing-and-named-groups i do not think inside jokes about metacharacters are helpful for people new to pattern matching

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

Python supports several of Perl’s extensions and adds an extension syntax to Perl’s extension syntax. If the first character after the question mark is a P, you know that it’s an extension that’s specific to Python.

this is useful info that i was not aware of and is also illustrative of how innovation works in the regex ecosystem

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

one thing i definitely like about python matches is that it always provides a value for every group to each match even if the value is None. this is one benefit of the explicit match objects i researched introducing to elisp last year
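
a short sketch of that behaviour, assuming the stdlib re module: a group that did not take part in the match still gets a slot, filled with None (or with a default you pass to `groups()`).

```python
import re

# Only the second alternative can match "b", so group 1 never participates.
m = re.match(r"(a)|(b)", "b")

groups = m.groups()                   # one entry per group in the pattern
first = m.group(1)                    # non-participating group is None
with_default = m.groups(default="")   # same shape, None replaced by ""
```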

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

loled at this example though

InternalDate = re.compile(r'INTERNALDATE "'
 r'(?P<day>[ 123][0-9])-(?P<mon>[A-Z][a-z][a-z])-'
 r'(?P<year>[0-9][0-9][0-9][0-9])'
 r' (?P<hour>[0-9][0-9]):(?P<min>[0-9][0-9]):(?P<sec>[0-9][0-9])'
 r' (?P<zonen>[-+])(?P<zoneh>[0-9][0-9])(?P<zonem>[0-9][0-9])'
 r'"')

It’s obviously much easier to retrieve m.group('zonem'), instead of having to remember to retrieve group 9.

sir i don't know about you but that is line noise to me and my dyslexia agrees
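
one way the same pattern could be laid out with less line noise, sketched with `re.VERBOSE`. this assumes the group names from the howto example (day, mon, year, hour, min, sec, zonen, zoneh, zonem); the `{4}`/`{2}` repetition counts and the sample string are choices made here, not part of the quoted code.

```python
import re

# In verbose mode, whitespace outside character classes is ignored,
# so literal spaces are written as [ ].
internal_date = re.compile(r'''
    INTERNALDATE [ ] "
    (?P<day>[ 123][0-9]) - (?P<mon>[A-Z][a-z][a-z]) - (?P<year>[0-9]{4})
    [ ]
    (?P<hour>[0-9]{2}) : (?P<min>[0-9]{2}) : (?P<sec>[0-9]{2})
    [ ]
    (?P<zonen>[-+]) (?P<zoneh>[0-9]{2}) (?P<zonem>[0-9]{2})
    "
''', re.VERBOSE)

m = internal_date.match('INTERNALDATE "17-Jul-1996 02:44:25 -0700"')
by_name = m.group("zonem")    # retrieval by name
by_number = m.group(9)        # the same group, by position
fields = m.groupdict()
```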

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

https://docs.python.org/3/howto/regex.html#use-string-methods

Strings have several methods for performing operations with fixed strings and they’re usually much faster, because the implementation is a single small C loop that’s been optimized for the purpose, instead of the large, more generalized regular expression engine.

emacs doesn't try very hard but it does try to identify literal patterns and delegates to faster implementations. the mental framework this teaches students is to use things because they're faster, not because they're more explicit and easier to maintain. maybe the author believes that's the only thing people will listen to but i think it's the wrong approach for an introduction to regular expressions
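
a small sketch of the "more explicit" argument, assuming stdlib re and a made-up input string: with a literal search string, `str.replace` says exactly what it means, while the regex version silently changes meaning unless the literal is escaped.

```python
import re

text = "a.b axb"

literal = text.replace("a.b", "X")             # fixed-string replacement
unescaped = re.sub("a.b", "X", text)           # "." matches any character here
escaped = re.sub(re.escape("a.b"), "X", text)  # needs re.escape to mean the same thing
```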

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

the response from emacs-devel was resoundingly that a lisp regex implementation would be leagues more useful than another native code impl

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

booooo it uses html as an example but it was just an excuse to tell people to use an xml parser instead. at least link to an html parser project so they can see the horrors of html parsing firsthand instead of being told it's too dangerous

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

ok it ends telling the reader to check out a book from the library which is vaguely subversive and i appreciate

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

the sentence "A negative lookahead cuts through all this confusion" is difficult to take seriously though

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

i also think saying "the whole pattern will fail" is misleading since i believe it's not "the whole pattern" that fails but rather just the attempted left-to-right matching process that would otherwise have continued rightwards but may yet continue as a result of alternations, optionals, or some other such construction. but i'm not sure since i've never used a negative lookahead before and generally consider such constructions to be a last resort and much less readable than conditional logic applied surrounding the pattern string
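
a quick check of that reading, assuming the stdlib re module and toy patterns made up for this purpose: a failed negative lookahead only rules out the branch it sits in, and the overall search fails only once nothing else is left to try.

```python
import re

# At position 0 the lookahead fails ("b" follows "a"), which rules out the
# first alternative only; the engine then tries the second alternative.
m = re.search(r"a(?!b)|ab", "ab")
matched = m.group(0)

# With no other alternative and no later start position that works,
# the search as a whole returns None.
none_left = re.search(r"a(?!b)", "ab")
```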

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

in general i think this document focuses on metacharacters and the specifics of python regex syntax to an incredible degree for something that begins with a definition of a regular expression. for example, h2 "More Pattern Power" immediately transitions to h3 "More Metacharacters"

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

i'm going to stop now because i can tell i'm just going to get even more critical and i am now confident that making the python regex AST/IR is a good idea
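
to make "regex AST/IR" a bit more concrete, here is a purely hypothetical sketch of what a tiny pattern tree could look like. the class names (`Lit`, `Seq`, `Alt`, `Star`) and the `render()` method are invented for illustration and are not the design referred to above; the tree is simply rendered to a pattern string for the stdlib re module.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Lit:
    text: str
    def render(self) -> str:
        return re.escape(self.text)  # literals never leak metacharacters

@dataclass(frozen=True)
class Seq:
    parts: tuple
    def render(self) -> str:
        return "".join(p.render() for p in self.parts)

@dataclass(frozen=True)
class Alt:
    options: tuple
    def render(self) -> str:
        # wrapping in a group keeps the low precedence of | contained
        return "(?:" + "|".join(o.render() for o in self.options) + ")"

@dataclass(frozen=True)
class Star:
    inner: object
    def render(self) -> str:
        return "(?:" + self.inner.render() + ")*"

# "a." literally, then any run of b/c, then "!"
tree = Seq((Lit("a."), Star(Alt((Lit("b"), Lit("c")))), Lit("!")))
compiled = re.compile(tree.render())
```

the point of the sketch is only that escaping and precedence become properties of the tree rather than things the person writing the pattern has to remember.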

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

but like come on

We’ll start by learning about the simplest possible regular expressions. Since regular expressions are used to operate on strings, we’ll begin with the most common task: matching characters.
For a detailed explanation of the computer science underlying regular expressions (deterministic and non-deterministic finite automata), you can refer to almost any textbook on writing compilers.

do you not at least have a recommendation for which compiler textbook to check out? is it because you don't believe in theory or because you think anyone who would be reading this document isn't smart enough to understand it? this is why we still submit our pattern programs to the regex mainframe and wait for it to produce our results on punch cards

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

i'll have to write a better version

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

i like the stdlib docs https://docs.python.org/3/library/re.html#text-munging

they give a definition for text "munging". that's cute i like that shit

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

i'm gonna go read the re implementation now. now that python has a jit maybe it could do the same thing we wanna do for emacs

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

python actually uses autoconf no wonder it's so portable. perl's configuration script which (1) runs lengthy tests by default (2) prints out cutesy messages to remind you about its artistic license (3) takes almost as long as gettext to run is so much more annoying

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

loled at --without-doc-strings "to reduce the memory footprint". at least in the 2002 commit that added that option they clarified it referred to executable size

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

seeing that the developer who added that has a german name i might be a little more empathetic to that though. does python have docstring translations? i was about to look into compressing them but now it just seems bizarre not to

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

python docstrings aren't terribly helpful even in english though. this is why i used R every chance i could get in college

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

the jit can't be enabled while disabling the gil omg drama in the cpython optimization fandom

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

oh BOOOOO jit support is clang-dependent why on earth would you not mention that when other options like the tail call interpreter go out of their way to imply only clang is supported

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

hated the smarmy and grotesquely misleading language in the why-llvm footnote in Tools/jit/README.md so i looked in the git blame and i was wrong this time it wasn't a google employee it was microsoft

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

The JIT compiler does not require end users to install any third-party dependencies, but part of it must be built using LLVM[why-llvm]. You are not required to build the rest of CPython using LLVM, or even the same version of LLVM (in fact, this is uncommon).

this reads like a corporate strategy document

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

i have a recent clang on my machine btw but the build script refuses to find it even though the readme specifically mentions that the build script will find deps

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

the "unversioned executable" code path is mysteriously broken. i think adding untested code without a clear indication of it being untested is kind of a bad thing to do

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

i'm going to fix it obviously

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

i think this script is potentially worse than useless

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

i wouldn't be upset if there hadn't been the footnote about "why llvm" with zero citation and the helpful build script that breaks if you're using a more recent clang than the one in cpython's github actions

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

i'm actually gonna try using gcc first

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

ok this microsoft dev leaves very snarky and obnoxious code comments i don't even do that in private branches

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

# --elf-output-style=JSON is only *slightly* broken on Mach-O...

no description of what the code is actually doing or what "broken" means. this is the only contributor to the build scripts. not someone i want to work with

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

turns out it's easier to simply bypass the script that fails to find dependencies you have on your system but the build process for the jit has a very hard dependency on clang ELF output section names and especially llvm-readobj's json output mode which neither readelf nor objdump appear to have an analogue for. really gotta hand it to clang they have this EEE stuff down to a science

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

particularly annoyed at the jit readme document mentioning how homebrew doesn't automatically pollute your PATH as if that were laziness and then says don't worry we'll find it for you and then hardcodes llvm@19. it was already weird that it's the one configure option that does not mention a clang dependency and then it proceeds to litter the broken build script with snark about homebrew and llvm support for mach-o. house of mirrors documentation experience

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

really really bad vibes. i do not have the energy to spend more time on this right now until they demonstrate performance that matters. being a jit i assume they're optimizing for persistent servers

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

this does make me wanna get back to my image dumping branch from a few years ago though

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

i assume the c object model has improved in the years since

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

oh also i can disable the gil now since i don't want this jit anyway

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

the snarky comments are so ridiculously unprofessional especially when the version checking script is literally worse than useless. that's pretty brazen

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

the reason for this high-level regex composition API is because people misuse this mystique around compiler engineering for personal gain or just sheer vindictiveness all the time. you get to guess which one it is at any given time

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

i build helpful machines not harmful machinations

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

pyenv has no documentation for what to do if you have just built python from source by hand which is ludicrously bad vibes so i just removed it too. spack has great docs and supports more versions and also supports building stuff yourself outside of spack

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

i'm pretty sure the above was just because cpython does not create a symlink named "python" by default in the install prefix which is rather kind of it but spack provides a strict superset of pyenv's functionality anyway and has much better docs so nothing was lost

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

i made rice but i'm almost definitely going to pass out soon. i have decided to use the regex high level composition API to see how nice it is to contribute to cpython (they may decide it's inappropriate for the stdlib which would be fine too) and if so then i may decide to look again at the image dumper because python with instant startup without any portability concerns would be a very interesting language

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

i still don't really like writing python it's mostly fine but that makes it so much easier for me to focus and i care much more about ecosystem stuff like the world-class packaging protocols than the surface level syntax

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

Secret Labs AB has developed a commercial Integrated Development Environment for Python; PythonWorks, which will be demonstrated at the eighth International Python Conference in Washington DC, USA.

https://legacy.python.org/workshops/2000-01/proceedings/posters/karlsson/karlsson.htm

very cute

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

oh here's a blast from the past

PythonWorks will initially be available on Windows 95, 98, and NT. Versions for Solaris 2.6 and later, Digital Unix 4, and Linux will be released in early 2000. [Availability on other platforms depends on demand.]

solaris mentioned before linux and "other platforms" beyond linux are considered important enough to at least gesture towards

d@nny disc@ mc²

@hipsterelectron@circumstances.run replied · 4 months ago

they're SO sassy about msvc lmao

#pragma optimize("agtw", on) /* doesn't seem to make much difference... */
#pragma warning(disable: 4710) /* who cares if functions are not inlined ;-) */
/* fastest possible local call under MSVC */
#define LOCAL(type) static __inline type __fastcall

only change since the year 2000 was to avoid lumping in clang-cl with msvc earlier this year (from someone else who also contributed to the jit). didn't ms fire their whole python team recently? that's sad i forgot about that
