#Tag · bonfire.cafe

Autovectorization seems like a cool way to write cross-platform SIMD code. But does anyone know of solutions to the insight issue? If I were to write a function which relies on autovectorization, wouldn't I literally have to 1) compile with every compiler + compiler settings + CPU arch + platform I wanna support, 2) disassemble all resulting binaries, 3) read and analyze the assembly code to verify that it's vectorized how I expect, and 4) repeat for every change?
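One partial way to shrink steps 2 and 3, sketched in Rust under a few assumptions: keep the hot loop in a small, non-inlined function so it is easy to locate in compiler output, and ask the compiler for its assembly or its optimization remarks instead of disassembling the final binary. The function name `axpy` is hypothetical. `rustc -O --emit asm` and `-C remark=loop-vectorize` are existing rustc options, but both still have to be run once per target and compiler version, so this does not remove step 1.

```rust
/// Hypothetical kernel: a[i] += b[i] * k for every index both slices share.
///
/// Ways to inspect what the compiler did with this loop (run per target):
///   rustc -O --emit asm lib.rs            # writes the assembly for the crate
///   rustc -O -C remark=loop-vectorize ... # asks LLVM for remarks from that pass
#[inline(never)] // keep a distinct symbol so the loop is easy to find in the output
pub fn axpy(a: &mut [f32], b: &[f32], k: f32) {
    // Iterating with zip avoids per-element bounds checks in the loop body,
    // which tends to make the loop easier for the vectorizer to handle.
    // zip stops at the shorter of the two slices.
    for (x, y) in a.iter_mut().zip(b) {
        *x += *y * k;
    }
}
```

A functional test such as `axpy(&mut a, &b, 2.0)` only confirms the result, not that SIMD instructions were emitted; that part still needs the assembly or the remarks.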

#programming #simd #compilers #plt

mort

@mort@floss.social · 3 weeks ago

Jan :rust: :ferris: boosted

Jérôme Humbert

@djee@mastodon.gamedev.place · last month

Spent 1h today trying to implement an equivalent of vpermilps (_mm_permutevar_ps) in SSE, only to find that my "solution" used a per-lane shift (vpsrlvd)… which is only available in AVX2 🙄 SIMD on Intel is really the Swiss cheese of APIs; so difficult to do anything without an extensive knowledge of all the quirks and holes in the API. In the end the correct solution was to use pshufb, which is probably obvious if you’re familiar enough with SIMD but requires jumping through hoops. #simd #sse
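For readers wondering what the pshufb route can look like, here is a minimal sketch in Rust using `std::arch`. It is one possible construction, not necessarily the one the post's author arrived at. It emulates a single 128-bit vpermilps (out[i] = v[ctrl[i] & 3]) by turning each 32-bit lane index into four byte indices and passing them to `_mm_shuffle_epi8`. pshufb itself needs SSSE3, so the sketch checks for it at runtime and otherwise falls back to scalar code. All function names are hypothetical.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Scalar reference: out[i] = v[ctrl[i] & 3], the semantics of vpermilps
/// with a variable control, restricted to one 128-bit vector.
pub fn permutevar_scalar(v: [f32; 4], ctrl: [i32; 4]) -> [f32; 4] {
    ctrl.map(|c| v[(c & 3) as usize])
}

/// SSSE3 version built on pshufb (`_mm_shuffle_epi8`).
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "ssse3")]
unsafe fn permutevar_ssse3(v: [f32; 4], ctrl: [i32; 4]) -> [f32; 4] {
    let data = _mm_castps_si128(_mm_loadu_ps(v.as_ptr()));
    let mut idx = _mm_loadu_si128(ctrl.as_ptr() as *const __m128i);
    // Keep the low two bits of each control lane, then scale to a byte offset.
    idx = _mm_and_si128(idx, _mm_set1_epi32(3));
    idx = _mm_slli_epi32::<2>(idx); // lane index * 4
    // Copy the low byte of each 32-bit lane into all four bytes of that lane.
    idx = _mm_shuffle_epi8(
        idx,
        _mm_setr_epi8(0, 0, 0, 0, 4, 4, 4, 4, 8, 8, 8, 8, 12, 12, 12, 12),
    );
    // Add 0,1,2,3 so each lane addresses the four bytes of its source float.
    idx = _mm_add_epi8(
        idx,
        _mm_setr_epi8(0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3),
    );
    let shuffled = _mm_castsi128_ps(_mm_shuffle_epi8(data, idx));
    let mut out = [0.0f32; 4];
    _mm_storeu_ps(out.as_mut_ptr(), shuffled);
    out
}

/// Dispatcher: uses the SSSE3 path when the CPU reports it, else scalar.
pub fn permutevar(v: [f32; 4], ctrl: [i32; 4]) -> [f32; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("ssse3") {
            // SAFETY: SSSE3 support was checked on the line above.
            return unsafe { permutevar_ssse3(v, ctrl) };
        }
    }
    permutevar_scalar(v, ctrl)
}
```

The two extra byte operations before the final shuffle are the "hoops": the control vector holds lane indices, while pshufb wants byte indices.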

Hacker News

@h4ckernews@mastodon.social · last month

High-performance C++ hash table using grouped SIMD metadata scanning

https://github.com/Cranot/grouped-simd-hashtable

#HackerNews #HighPerformance #C++ #HashTable #SIMD #MetadataScanning #TechnologyOptimization #GitHub

GitHub

GitHub - Cranot/grouped-simd-hashtable: High-performance C++ hash table using grouped SIMD metadata scanning. Beats SOTA at scale.

Hacker News

@h4ckernews@mastodon.social · 2 months ago

SIMD City: Auto-Vectorisation

https://xania.org/202512/20-simd-city

#HackerNews #SIMD #City #Auto-Vectorisation #Vectorisation #Tech #News #Programming #Insights

SIMD City: Auto-vectorisation — Matt Godbolt’s blog

Doing more with less: vectorising can speed your code up 8x or more!

Boozook 🦀 :playdate:

@boozook@mastodon.gamedev.place · 2 months ago

Hey all! 👋🏻
I’m looking for a shader-like pipeline/#rendering system/library/framework for 1-bit graphics with 2x #framebuffer (double-buffered: current & previous) and #blitting via #SIMD and #SWAR. CPU-only, mostly targeting ARM32/64/Thumb1.
I understand that something like this is rare and probably doesn’t exist, so I just need some source-based guidance/hints on oldschool/demoscene tricks and algorithms which I don’t know yet (I know a lot already, I’m 40), and of course I can port.

Hacker News

@h4ckernews@mastodon.social · 3 months ago

The state of SIMD in Rust in 2025

https://shnatsel.medium.com/the-state-of-simd-in-rust-in-2025-32c263e5f53d

#HackerNews #SIMD #Rust #2025 #RustProgramming #Technology #Trends #FutureDevelopment

Medium

If you’re already familiar with SIMD, the table is all you need. And if you’re not, you will understand the table by the end of the…

Jan :rust: :ferris: and 1 other boosted

Jan :rust: :ferris:

@janriemer@floss.social · 3 months ago

A story about never ever giving up...❤️‍🔥

After several weeks, questioning my life choices, I've finally figured out why my #Whisper #SpeechToText system had been so slow on #Windows:

It was because, apparently, the #Rust-FFI-wrapped #CPlusPlus code (Whisper.cpp) wasn't compiled with AVX and AVX2 enabled (#SIMD!). I've tried it on two Windows machines (both AVX-capable). On one of those machines, under #Linux, it did detect AVX/AVX2 successfully and ran fast.

1/?
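The gap described in this post, an AVX-capable CPU running a build that was not compiled to use AVX, can be made visible from the Rust side. The sketch below is only an illustration under stated assumptions: it compares what the Rust crate itself was compiled with (`cfg!(target_feature = "avx2")`) against what the CPU reports at runtime (`is_x86_feature_detected!`). It says nothing about the flags a C++ dependency such as Whisper.cpp was built with, which are decided by that dependency's own build. The function names are hypothetical.

```rust
/// Returns (compiled_with_avx2, cpu_reports_avx2) for this Rust crate.
pub fn avx2_status() -> (bool, bool) {
    // True only if the crate was built with AVX2 enabled,
    // for example via -C target-feature=+avx2 or -C target-cpu=native.
    let compiled = cfg!(target_feature = "avx2");

    // Runtime query of the CPU; only meaningful on x86 targets.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    let detected = is_x86_feature_detected!("avx2");
    #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
    let detected = false;

    (compiled, detected)
}

/// Turns the two flags into a human-readable diagnosis.
pub fn describe(compiled: bool, detected: bool) -> &'static str {
    match (compiled, detected) {
        (true, true) => "AVX2 is compiled in and the CPU supports it",
        (false, true) => "CPU supports AVX2, but this crate was not compiled to use it",
        (true, false) => "crate was compiled for AVX2, but the CPU does not report it",
        (false, false) => "AVX2 is neither compiled in nor reported by the CPU",
    }
}
```

Printing `describe` applied to the result of `avx2_status()` at startup is a cheap way to notice the second case, the one that looks like "everything works, just slowly".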

Jan :rust: :ferris:

@janriemer@floss.social · 5 months ago

Hmm... 🤔

My suspicion as to why it's "not working" is:

Even though I run `cargo run --release`, I noticed, while investigating the compile-failure nightmare above, that it puts artifacts into a `Debug` folder.

So it might be that the program (Whisper.cpp to be precise) runs as a debug build and is just _terribly_ slow. 🐌

Oh boy, the struggle continues... 🤸

This might be related:
https://codeberg.org/tazz4843/whisper-rs/issues/226

Codeberg.org

x10 slower in release build

Running Whisper for transcribing in a release build seems to take approximately x10 as long as a debug build. I tested this with the `audio_transcription.rs` example using the `jfk.wav` file and the `ggml-medium.en-q5_0.bin` model. In Debug ``` PS C:\Repositories\wh...

Larry (Mr.Optimization)

@fast_code_r_us@floss.social · 8 months ago

I decided to share my Arm NEON optimizations for the FFmpeg Cinepak encoder. On Apple Silicon / RPI / NEON 32/64-bit, it gets a 250-300% speedup for encoding:

https://github.com/bitbank2/FFmpeg-in-Xcode

#FOSS
#Optimization
#SIMD
#NEON