For auditory output we could play low-quality audio samples to spell the text out, though wherever possible our bootloader should use SFX or prerecorded audio instead.
Since I'm dreaming, let's imagine we voicecast Michaela Swee for this!
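To make that audio fallback order concrete, here's a rough sketch in Python (a real bootloader would more likely be C or assembly). The clip names, the `PRERECORDED` table and `plan_audio` are all made up for illustration; the only idea taken from above is "prefer a prerecorded clip, otherwise spell it out".

```python
# Hypothetical clip names; a real bootloader would index into sample data.
PRERECORDED = {
    "no bootable device": "phrase_no_bootable_device",
    "boot failed": "phrase_boot_failed",
}

def plan_audio(message):
    """Return the list of clip names to play for `message`."""
    key = message.strip().lower()
    # Prefer a whole prerecorded phrase when we have one.
    if key in PRERECORDED:
        return [PRERECORDED[key]]
    # Otherwise fall back to spelling the text out, one sample per character.
    clips = []
    for ch in key:
        if ch.isalnum():
            clips.append("char_" + ch)
        elif ch.isspace():
            clips.append("pause")
        # Other punctuation is skipped in this sketch.
    return clips
```

So `plan_audio("Boot failed")` picks the single prerecorded clip, while an unknown message degrades to per-character samples.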
As for visual output... We'd need a font mapping one or more characters to relatively-positioned images. Making this a monochrome raster font with a single size & style would simplify things.
I see no reason not to include ligatures & proportional text.
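Here's a minimal sketch of what such a font could look like, again in Python for readability. Everything in it is an assumption of mine: the `Glyph` fields, the tiny 5-row bitmaps, the `"?"` fallback glyph, and greedy longest-match as the ligature rule. Keys longer than one character are the ligatures, and a per-glyph `advance` is what makes the text proportional.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Glyph:
    rows: tuple    # one string per pixel row, '#' = set pixel (monochrome)
    advance: int   # how far the pen moves afterwards (proportional width)
    dx: int = 0    # image offset relative to the pen position
    dy: int = 0

# Hypothetical glyph data, just enough to show the idea.
FONT = {
    "i":  Glyph(("#", ".", "#", "#", "#"), advance=2),
    "f":  Glyph((".##", "#..", "###", "#..", "#.."), advance=4),
    "fi": Glyph((".###", "#...", "###.", "#.#.", "#.#."), advance=5),
    "?":  Glyph(("##.", "..#", ".#.", "...", ".#."), advance=4),
}
MAX_KEY = max(len(k) for k in FONT)

def layout(text):
    """Return ([(key, x, y), ...], total_width) via greedy longest match."""
    placements, pen, i = [], 0, 0
    while i < len(text):
        # Try the longest candidate first so ligatures win over single chars.
        for n in range(min(MAX_KEY, len(text) - i), 0, -1):
            key = text[i:i + n]
            if key in FONT:
                break
        else:
            key, n = "?", 1  # unknown character: fallback glyph
        g = FONT[key]
        placements.append((key, pen + g.dx, g.dy))
        pen += g.advance
        i += n
    return placements, pen

def render(text, height=5):
    """Blit the laid-out glyphs into a list of row strings."""
    placements, width = layout(text)
    canvas = [["."] * width for _ in range(height)]
    for key, x, y in placements:
        for r, row in enumerate(FONT[key].rows):
            for c, px in enumerate(row):
                if px == "#" and 0 <= y + r < height and 0 <= x + c < width:
                    canvas[y + r][x + c] = "#"
    return ["".join(row) for row in canvas]
```

With this table, `layout("fi")` yields a single `"fi"` placement, while `layout("if")` yields two glyphs with the `f` starting at x = 2 because `i` is narrow. A real bootloader would pack the rows into bits rather than strings, but the lookup and pen logic would stay about this small.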
2/3!