@SRAZKVT it's both better and worse than that. it specifically uses the STRINGLIB_CHAR macro but redefines it in source files that deal with bytes to be a char
@SRAZKVT in ucs4lib.h (i think that's utf-32 right?) it has #define STRINGLIB_CHAR Py_UCS4
i had missed this file that actually implemented the regex engine https://github.com/python/cpython/blob/main/Modules/_sre/sre_lib.h that looks much more familiar. the case statement doing what it was made for
this part in particular is fucking............it's really cool actually
/* generate 8-bit version */
#define SRE_CHAR Py_UCS1
#define SIZEOF_SRE_CHAR 1
#define SRE(F) sre_ucs1_##F
#include "sre_lib.h"
/* generate 16-bit unicode version */
#define SRE_CHAR Py_UCS2
#define SIZEOF_SRE_CHAR 2
#define SRE(F) sre_ucs2_##F
#include "sre_lib.h"
/* generate 32-bit unicode version */
#define SRE_CHAR Py_UCS4
#define SIZEOF_SRE_CHAR 4
#define SRE(F) sre_ucs4_##F
#include "sre_lib.h"
what does this mean?
/* This file is included three times, with different character settings */
that's right. they just did compile time polymorphism in 12 lines of standard C
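for anyone who wants to poke at the idea in a single file, here is a minimal sketch. it swaps in a different mechanism: a function-generating macro instead of re-including a header, because a re-included header can't be shown in one self-contained block. the names `DEFINE_COUNT_CHAR` and `count_char_*` are made up for illustration and are not from CPython.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "template": stamps out one counting function per character
 * width. CPython gets the same effect by re-including sre_lib.h with
 * SRE_CHAR and SRE(F) redefined; this macro is only a single-file stand-in. */
#define DEFINE_COUNT_CHAR(SUFFIX, CHAR_T)                                    \
    static size_t count_char_##SUFFIX(const CHAR_T *s, size_t len, CHAR_T c) \
    {                                                                        \
        size_t hits = 0;                                                     \
        assert(s != NULL || len == 0);                                       \
        for (size_t i = 0; i < len; i++)                                     \
            if (s[i] == c)                                                   \
                hits++;                                                      \
        return hits;                                                         \
    }

DEFINE_COUNT_CHAR(ucs1, uint8_t)   /* 8-bit version */
DEFINE_COUNT_CHAR(ucs2, uint16_t)  /* 16-bit version */
DEFINE_COUNT_CHAR(ucs4, uint32_t)  /* 32-bit version */
```

each expansion is a separate, fully typed function, so the compiler can specialize the loop for each character width without any runtime dispatch inside the loop.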
As it happens, we still use CVS in our operating system project (there are reasons for doing this, but migration to git would indeed make sense).
While working on our project, we occasionally have to do a full checkout of the whole codebase, which is several gigabytes. Over time, this operation has gotten very, very, very slow - I mean "2+ hours to perform a checkout" slow.
This was getting quite ridiculous. Even though it's CVS, it shouldn't crawl like this. A quick build of CVS with debug symbols and sampling the "cvs server" process with Linux perf showed something peculiar: The code was spending the majority of the time inside one function.
So what is this get_memnode() function? Turns out this is a support function from Gnulib that enables page-aligned memory allocations. (NOTE: I have no clue why CVS thinks doing page-aligned allocations is beneficial here - but here we are.)
The code in question has support for three different backend allocators:
1. mmap
2. posix_memalign
3. malloc
Sounds nice, except that both 1 and 3 use a linked list to track the allocations. The get_memnode() function is called when deallocating memory to look up the value that must be passed to the backend deallocation function. The node search code looks like this:
for (c = *p_next; c != NULL; p_next = &c->next, c = c->next)
if (c->aligned_ptr == aligned_ptr)
break;
The get_memnode() function is called from pagealign_free():
#if HAVE_MMAP
if (munmap (aligned_ptr, get_memnode (aligned_ptr)) < 0)
error (EXIT_FAILURE, errno, "Failed to unmap memory");
#elif HAVE_POSIX_MEMALIGN
free (aligned_ptr);
#else
free (get_memnode (aligned_ptr));
#endif
This is an O(n) operation. CVS must be allocating a huge number of small allocations, which will result in it spending most of the CPU time in get_memnode() trying to find the node to remove from the list.
Why should we care? This is "just CVS" after all. Well, Gnulib is used in a lot of projects, not just CVS. While pagealign_alloc() is likely not the most used functionality, it can still end up hurting performance in many places.
The obvious easy fix is to prefer the posix_memalign method over the other options (I quickly made this happen for my personal CVS build by adding a tactical #undef HAVE_MMAP). Even better, the list code should be replaced with something more sensible. In fact, there is no need to store the original pointer in a list; a better solution is to allocate enough extra memory and store the original pointer just before the calculated aligned pointer. The original pointer can then be fetched from a negative offset of the pointer passed to pagealign_free(), which makes the lookup O(1).
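A minimal sketch of that idea for the malloc backend follows. The function names are hypothetical and this is not the Gnulib API or a proposed patch; it only illustrates stashing the original pointer in front of the aligned block. It assumes the alignment is a power of two and at least sizeof(void *).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns memory aligned to `alignment`, with the
 * pointer that malloc() returned saved immediately before the block. */
static void *aligned_alloc_with_header(size_t size, size_t alignment)
{
    /* Assumption: power-of-two alignment, large enough to keep the saved
     * pointer itself properly aligned. */
    assert(alignment >= sizeof(void *) && (alignment & (alignment - 1)) == 0);

    /* Room for worst-case alignment slack plus the saved pointer. */
    size_t extra = alignment - 1 + sizeof(void *);
    if (size > SIZE_MAX - extra)
        return NULL;

    void *raw = malloc(size + extra);
    if (raw == NULL)
        return NULL;

    /* Skip past the header slot, then round up to the alignment. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);

    ((void **)aligned)[-1] = raw;  /* the "negative offset" slot */
    return (void *)aligned;
}

/* O(1): no list walk, just read the pointer stored before the block. */
static void aligned_free_with_header(void *aligned_ptr)
{
    if (aligned_ptr != NULL)
        free(((void **)aligned_ptr)[-1]);
}
```

The cost is up to one alignment unit plus one pointer of extra memory per allocation, which for page-sized blocks nearly doubles the footprint; that trade-off is one reason posix_memalign is the simpler fix where it is available.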
I tried to report this to the Gnulib project, but I have trouble reaching gnu.org services currently. I'll be sure to do that once things recover.
So how does CVS use pagealign_xalloc? Like this:
/* Allocate more buffer_data structures. */
/* Get a new buffer_data structure. */
static struct buffer_data *
get_buffer_data (void)
{
struct buffer_data *ret;
ret = xmalloc (sizeof (struct buffer_data));
ret->text = pagealign_xalloc (BUFFER_DATA_SIZE);
return ret;
}
Surely BUFFER_DATA_SIZE will be something sensible? Unfortunately it is not:
#define BUFFER_DATA_SIZE getpagesize ()
So it will create total_data_size / pagesize list nodes in the linear list. Maybe it's not that bad if the nodes are released in an optimal order?
The pagealign code stores new nodes always to the head of its list:
new_node->next = memnode_table;
memnode_table = new_node;
The datanodes in CVS code are however inserted into a list tail:
newdata = get_buffer_data ();
if (newdata == NULL)
{
(*buf->memory_error) (buf);
return;
}
if (buf->data == NULL)
buf->data = newdata;
else
buf->last->next = newdata;
newdata->next = NULL;
buf->last = newdata;
This creates a pathological situation where the nodes in the pagealign list are in the worst possible order, because buf_free_datas() walks the CVS list from the first node to the last, calling pagealign_free() on each:
static inline void
buf_free_datas (struct buffer_data *first, struct buffer_data *last)
{
struct buffer_data *b, *n, *p;
b = first;
do
{
p = b;
n = b->next;
pagealign_free (b->text);
free (b);
b = n;
} while (p != last);
}
In short: This is very bad. It will be slow as heck as soon as large amounts of data are processed by this code.
So imagine you have a 2GB buffer allocated by this code on a system that has a 4KB page size. This would result in 524288 nodes. Each node would be stored in two lists: in the pagealign list the newest node sits at the head, while in the CVS buffer list it sits at the tail.
When buf_free_datas() is called for this buffer, it will walk (totalnodes - index) pagealign nodes for each of the released nodes. The first iteration is (524288 - 1) "unnecessary" node walks, the second (524288 - 2), and so forth. In other terms this is the sum of all positive integers smaller than totalnodes, so in total totalnodes * (totalnodes - 1) / 2 extra operations.
This gives 137438691328 iterations.
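The count can be checked with a small simulation. This is a sketch under stated assumptions, not the CVS or Gnulib code: it pushes n nodes onto the head of a singly linked list, removes them in allocation order (oldest first), counts how many nodes are stepped over before each match, and compares that with the closed form. All names here are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct node {
    struct node *next;
};

/* Closed form: (n - 1) + (n - 2) + ... + 0 = n * (n - 1) / 2. */
static uint64_t extra_walks_closed_form(uint64_t n)
{
    return n * (n - 1) / 2;
}

/* Insert n nodes at the list head, then unlink them oldest-first,
 * counting every node that is skipped over while searching. */
static uint64_t extra_walks_simulated(size_t n)
{
    struct node *nodes = calloc(n, sizeof *nodes);
    struct node *head = NULL;
    uint64_t walks = 0;

    assert(nodes != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {       /* newest node becomes the head */
        nodes[i].next = head;
        head = &nodes[i];
    }
    for (size_t i = 0; i < n; i++) {       /* free in allocation order */
        struct node **pp = &head;
        while (*pp != &nodes[i]) {
            pp = &(*pp)->next;
            walks++;
        }
        *pp = (*pp)->next;                 /* unlink the found node */
    }
    free(nodes);
    return walks;
}
```

For a 2 GiB buffer with 4 KiB pages, n = 2^31 / 2^12 = 524288, and the closed form evaluates to 524288 * 524287 / 2 = 137438691328, matching the figure above.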
#define would extract the identifier & parse the following optional argument list & body removing (escaped) newlines to load into a "macro" table.
Non-preprocessor lines would be scanned for these macros' identifiers to perform a find & replace, recursing to handle substitutions in parameters.
2/3?
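a toy version of that "macro table plus find & replace" loop can be sketched in a few dozen lines. this is only an illustration under heavy assumptions: it handles object-like macros only (no argument lists, no escaped newlines, no string-literal or number awareness), and it uses a crude depth limit where a real preprocessor has proper rules against re-expanding a macro inside itself. every name below is invented for the sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_MACROS 16

struct macro {
    char name[32];
    char body[128];
};

static struct macro macro_table[MAX_MACROS];
static size_t macro_count;

/* Load "#define NAME body" into the table; returns 1 if the line was one. */
static int define_macro(const char *line)
{
    struct macro m = { "", "" };

    if (macro_count == MAX_MACROS)
        return 0;
    if (sscanf(line, " #define %31[A-Za-z0-9_] %127[^\n]", m.name, m.body) < 1)
        return 0;
    macro_table[macro_count++] = m;
    return 1;
}

static const struct macro *find_macro(const char *name, size_t len)
{
    for (size_t i = 0; i < macro_count; i++)
        if (strlen(macro_table[i].name) == len &&
            memcmp(macro_table[i].name, name, len) == 0)
            return &macro_table[i];
    return NULL;
}

/* Copy `in` to `out`, replacing identifiers found in the table and
 * recursing into each replacement body. Returns the bytes written. */
static size_t expand(const char *in, char *out, size_t cap, int depth)
{
    size_t n = 0;

    assert(cap > 0);
    while (*in != '\0' && n + 1 < cap) {
        if (isalpha((unsigned char)*in) || *in == '_') {
            const char *start = in;
            while (isalnum((unsigned char)*in) || *in == '_')
                in++;
            size_t len = (size_t)(in - start);
            const struct macro *m = depth < 8 ? find_macro(start, len) : NULL;
            if (m != NULL) {
                n += expand(m->body, out + n, cap - n, depth + 1);
            } else {
                if (len > cap - 1 - n)
                    len = cap - 1 - n;     /* truncate rather than overflow */
                memcpy(out + n, start, len);
                n += len;
            }
        } else {
            out[n++] = *in++;
        }
    }
    out[n] = '\0';
    return n;
}
```

feeding it `#define N 10` and `#define SIZE (N * 2)` and then expanding `int a[SIZE];` yields `int a[(10 * 2)];`, which shows the recursive step: the body of SIZE is itself scanned for identifiers.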
Have you seen the new code? It's on malloc. It's literally on volatile.
It's on #define with side effects. It's literally on long long.
You can probably find it on __attribute__((packed)). Dude it's on restrict.
It's a static inline original. It's on extern. You can #include .
You can go to extern and fopen("extern","r"). Compile onto extern right now.
Go to extern. Dive into extern. You can extern it. It's on extern. extern has it for you. extern has it for you.
oh here's a blast from the past
PythonWorks will initially be available on Windows 95, 98, and NT. Versions for Solaris 2.6 and later, Digital Unix 4, and Linux will be released in early 2000. [Availability on other platforms depends on demand.]
solaris mentioned before linux and "other platforms" beyond linux are considered important enough to at least gesture towards
they're SO sassy about msvc lmao
#pragma optimize("agtw", on) /* doesn't seem to make much difference... */
#pragma warning(disable: 4710) /* who cares if functions are not inlined 😉 */
/* fastest possible local call under MSVC */
#define LOCAL(type) static __inline type __fastcall
only change since the year 2000 was to avoid lumping in clang-cl with msvc earlier this year (from someone else who also contributed to the jit). didn't ms fire their whole python team recently? that's sad i forgot about that
int (*arrayp)[len] = malloc(sizeof(*arrayp));
/* init *arrayp */
qsort_typed(*arrayp, cmp, data);
or, what could be more useful, say you have a struct type:
struct int_array { size_t len; int *data; };
#define to_vla(arr) ((int (*)[(arr).len]) (arr).data)
struct int_array my_ints = ...;
qsort_typed(*to_vla(my_ints), compare, data);
and it’s the reason i really wish we'd add a way to specify VLA syntax in structures, so we could end up with something similar to:
struct int_array { size_t len; int (*data)[len]; };
struct int_array my_ints = ...;
qsort_typed(*my_ints.data, compare, data);
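since qsort_typed isn't defined anywhere in this thread, here is one hypothetical way such a macro could be built, just to show why the pointer-to-VLA type matters: `sizeof` on the dereferenced pointer is evaluated at run time and recovers the element count. the helper `sort_with_ctx` (a plain insertion sort) and the comparator are made up for the sketch, and it assumes a compiler with VLA support.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*cmp_ctx_fn)(const void *a, const void *b, void *ctx);

/* Hypothetical backend: insertion sort over raw elements, passing a
 * caller-supplied context pointer to the comparator. */
static void sort_with_ctx(void *base, size_t n, size_t size,
                          cmp_ctx_fn cmp, void *ctx)
{
    unsigned char *p = base;
    unsigned char tmp[64];

    assert(size > 0 && size <= sizeof tmp);  /* sketch: small elements only */
    for (size_t i = 1; i < n; i++) {
        size_t j = i;
        memcpy(tmp, p + i * size, size);
        while (j > 0 && cmp(p + (j - 1) * size, tmp, ctx) > 0) {
            memcpy(p + j * size, p + (j - 1) * size, size);
            j--;
        }
        memcpy(p + j * size, tmp, size);
    }
}

/* The array type carries its own length, so the caller never passes one. */
#define qsort_typed(arr, cmp, ctx)                         \
    sort_with_ctx((arr), sizeof(arr) / sizeof((arr)[0]),   \
                  sizeof((arr)[0]), (cmp), (ctx))

/* Example comparator: ctx points at +1 (ascending) or -1 (descending). */
static int cmp_int_dir(const void *a, const void *b, void *ctx)
{
    int x = *(const int *)a, y = *(const int *)b;
    int dir = *(const int *)ctx;
    return dir * ((x > y) - (x < y));
}

struct int_array { size_t len; int *data; };
#define to_vla(arr) ((int (*)[(arr).len]) (arr).data)
```

with this, `qsort_typed(*to_vla(my_ints), cmp_int_dir, &direction)` sorts `my_ints.len` elements without the length ever appearing at the call site.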
that being said, it's curious that it wasn't the first commit a7640bfa10c55 that did that. the first commit didn't make so many pretenses about being some sort of generic protocol: texas instruments called it a gas gauge, and in Kconfig it was labelled by the manufacturer and model name!
but there was a followup d3ab61ecbab2b that changed pretty much everything while taking great pains to obscure it. my favorite part:
-#define BATTERY_CHARGING 0x40
+#define BATTERY_DISCHARGING 0x40
that's a breaking ABI change!! no explanation!
had a good laugh at texas instruments proudly advertising "Supports SHA-1 Authentication" as late as 2013 though https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/196/slus757c.pdf
i did think adding measures like power (?) cycle count would seem to be a reasonable thing to introduce into the language we use to talk about batteries. but there's another bit from sbs-battery.c that's concise enough to reproduce in full:
static void sbs_unit_adjustment(struct i2c_client *client,
enum power_supply_property psp, union power_supply_propval *val)
{
#define BASE_UNIT_CONVERSION 1000
#define BATTERY_MODE_CAP_MULT_WATT (10 * BASE_UNIT_CONVERSION)
#define TIME_UNIT_CONVERSION 60
#define TEMP_KELVIN_TO_CELSIUS 2731
switch (psp) {
case POWER_SUPPLY_PROP_ENERGY_NOW:
case POWER_SUPPLY_PROP_ENERGY_FULL:
case POWER_SUPPLY_PROP_ENERGY_FULL_DESIGN:
/* sbs provides energy in units of 10mWh.
* Convert to µWh
*/
val->intval *= BATTERY_MODE_CAP_MULT_WATT;
break;
> I have to pass 256Kb of text as an argument to the "aws sqs"
what, uhhh, what
> MAX_ARG_STRLEN is defined as 32 times the page size in linux/include/uapi/linux/binfmts.h:
> The default page size is 4 KB so you cannot pass arguments longer than 128 KB.
> I modified linux/include/uapi/linux/binfmts.h to #define MAX_ARG_STRLEN (PAGE_SIZE * 64), recompiled my kernel and now your code produces
casually patching the kernel to send a quarter megabyte as a *single* argument oh my god i'm laughing hard
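the arithmetic in those quotes is easy to check. a minimal sketch, where the function names are made up and the multiplier is passed in rather than read from any kernel header:

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Per-argument limit as described in the quote: page size times a
 * multiplier (32 in the stock header, 64 in the patched one). */
static size_t max_arg_strlen(size_t page_size, size_t pages)
{
    assert(page_size > 0);
    return page_size * pages;
}

/* Same calculation using this machine's page size and the stock
 * multiplier of 32; returns 0 if the page size cannot be queried. */
static size_t max_arg_strlen_here(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    return page_size > 0 ? max_arg_strlen((size_t)page_size, 32) : 0;
}
```

with 4 KB pages, 32 pages gives 131072 bytes (128 KB) and 64 pages gives 262144 bytes (256 KB), which is exactly the jump the patch was after.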
ahhhhh yes. it uses the magic current variable
which comes from the scheduler! which makes sense since the scheduler controls everything that's happening on the system
#define get_current() (current_thread_info()->task)
#define current get_current()