Discussion
psf
@psf@oldbytes.space · 2 weeks ago

My overnight activity on New Year's Eve was to rewrite the #uuencode utility that I lost in a battery-exhaustion incident. The old version was in #Forth, the new version in #LuaLang. Including interpreter size, the Lua version is 100x larger and 100x slower. I was not intending to provide a case study upholding Jeff Fox's writings about Forth efficiency, but there you go.

[Image: Book 8088 running a Lua script that is uudecoding a file.]
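
For context, the core of uuencode is tiny: every 3 input bytes become 4 printable characters (each a 6-bit value plus 32). This is not the program from the post, just a minimal Lua sketch of that step, assuming the historical convention of using a backtick for the zero value:

```lua
-- Minimal sketch of the uuencode core: 3 bytes in, 4 characters out.
-- enc() maps a 6-bit value to a printable character (backtick for zero).
local function enc(n) return string.char(n == 0 and 96 or n + 32) end

local function uu_line(chunk)                    -- chunk: up to 45 bytes
  local out = { enc(#chunk) }                    -- each line starts with its own length
  for i = 1, #chunk, 3 do
    local a, b, c = chunk:byte(i, i + 2)
    b, c = b or 0, c or 0                        -- pad a short final group with zeros
    out[#out + 1] = enc(math.floor(a / 4))                  -- a's top 6 bits
    out[#out + 1] = enc((a % 4) * 16 + math.floor(b / 16))  -- a's low 2 + b's top 4
    out[#out + 1] = enc((b % 16) * 4 + math.floor(c / 64))  -- b's low 4 + c's top 2
    out[#out + 1] = enc(c % 64)                             -- c's low 6 bits
  end
  return table.concat(out) .. "\n"
end
```
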
psf
@psf@oldbytes.space replied · 2 weeks ago

By 100x speed difference, I mean the uu encoding/decoding rate is about 30 bytes per second. I'm not accustomed to a correct program being this catastrophically slow ;)

Not throwing shade at #LuaLang for the 100x speed difference: it's astonishing that a modern interpreter can be built for a 4.77 MHz 8088 at all and run at usable, if lukewarm, speeds. The 100x size difference comes down to the interpreter bundling Lua's full standard library, most of which any given program never touches.

If I had to guess, I'd expect most of the time to be spent in string operations and syscalls. Lua reads file contents into (immutable) strings, so every transformation allocates new strings before the result can be written out. Moreover, the program calls f:write() 3-4 bytes at a time: if output is unbuffered, each of those calls translates directly into its own write syscall, and hundreds of tiny syscalls would also be very slow.
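
A hedged sketch of that buffering idea (again, not the actual program): collect the encoded pieces in a table and flush them with one f:write(), instead of one write call per 3-4 byte group. The flush threshold here is an arbitrary choice, not something from the post.

```lua
-- Accumulate output pieces and write them in larger batches.
local buf, n = {}, 0

local function emit(f, piece)
  n = n + 1
  buf[n] = piece
  if n >= 256 then                   -- flush threshold is arbitrary
    f:write(table.concat(buf))       -- one write call for a few hundred groups
    buf, n = {}, 0
  end
end

local function flush(f)              -- call once at end of file
  if n > 0 then
    f:write(table.concat(buf))
    buf, n = {}, 0
  end
end
```

If the build includes the full io library, f:setvbuf("full", 4096) is another way to cut down the syscall count without restructuring the code.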

#retrocomputing

psf
@psf@oldbytes.space replied · 2 weeks ago

While I was working on this, the article Python Numbers Every Programmer Should Know appeared on the orange website. In #LuaLang, and on a 16-bit target, these overheads are smaller -- for example, a number weighs 10 bytes instead of 24 -- but overheads don't have many places to hide on a small, slow machine.

(Btw, numbers cost 7 bytes each in 8-bit Microsoft BASIC, so Lua isn't gratuitously inefficient here, even by the standards of 50 years ago.)
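
A rough way to check the per-number cost on whatever Lua build you're running is to measure heap growth while filling a table with numbers. This is a sketch of that measurement, not a figure from the post; on a typical desktop build expect something like 16 bytes per entry rather than the 10 quoted for the 16-bit target.

```lua
-- Estimate bytes per table slot by measuring heap growth.
collectgarbage("collect")
local before = collectgarbage("count") * 1024     -- heap size in bytes
local t = {}
for i = 1, 8192 do t[i] = i end                   -- fill the array part
collectgarbage("collect")
local after = collectgarbage("count") * 1024
print((after - before) / 8192)                    -- approximate bytes per entry
```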

One place that makes the overhead really obvious: a 64K segment holds a table of at most 4,096 entries. That's 40,960 bytes, and Lua grows a table by doubling its allocation each time it runs out of room; since 2 x 40,960 bytes exceeds a 64K segment, the table can never grow past 4,096 entries.

On a 640K machine, after deducting the ~250K (!) interpreter (which is also loaded entirely into RAM), you'll get maybe five full segments free if you're lucky. So that's maybe 20,000 datums total, split across five tables.
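
Back-of-the-envelope version of those limits, assuming ~10 bytes per table slot, power-of-two growth of the array part, and the post's estimate of five free 64K segments after DOS and the interpreter (all assumptions taken from the post, not measured here):

```lua
local SEGMENT   = 64 * 1024
local SLOT_SIZE = 10                          -- bytes per stored number (16-bit build)

local entries = 1
while 2 * entries * SLOT_SIZE <= SEGMENT do   -- would the doubled table still fit?
  entries = entries * 2
end
print(entries)                                --> 4096 (doubling again needs ~80K)

local free_segments = 5                       -- post's estimate for a 640K machine
print(free_segments * entries)                --> 20480 datums across five tables
```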

Meanwhile a tiny-model #Forth / assembly / C program could handle 20,000 datums in a single segment without breaking too much of a sweat!

The efficiency has costs in programmer time, of course: worrying about data types, limits, overflows, etc. Those are the kinds of things I was hoping to avoid by using Lua on this hardware -- and to its credit, it does a good job of insulating me from them. The cost is that programs have to be rewritten in some other language once they leave the rapid-prototyping phase and reasonable speed / data capacity starts to matter.

I'd estimate the threshold where traditional interpreters like Lua become okay for finished/polished software of any significant scope is somewhere around 2MB RAM / 16MHz -- so think a base-model 386. Maybe that's why the bulk of interpreters available for DOS are DJGPP builds, which require a 386 or better anyway.

#BASIC was of course used on much smaller hardware, but was famously unsuited to speed or to large programs / data.

I know there are success stories for #Lisp in kilobytes of memory, but I'm not quite sure how they do it, or to what extent the size of the interpreter and the overhead of the data representation (tags plus cons cells) eat into available memory and limit the scope of the program, as with other traditional interpreters.

This is beginning to explain why #Forth has such a niche on small systems. It has damn near zero size overhead on data structures; the only overheads are the interpreter core (a few K) and the string names stored in the dictionary (and those can be eliminated via various tricks). ~1x size and ~10x speed overhead is the bargain of the century for unlocking #repl-based development. However, you're still stuck with the agonizing pain of manual memory management and numeric range problems / overflows -- which is probably why the world didn't stop at Forth, but continued on to bigger interpreters.

#retrocomputing
