The main uses for audio input (at least in our hypothetical string-centric OS) would be saving snippets of it, with or without video, to be played back later, or streaming it over the internet via (S)RTP to be played elsewhere.
Then again, we might want to incorporate voice commands (traversing links by label), for which we'd need the aid of an NPU. I find Mythic's designs sufficiently weird for my tastes...
Setting that use case aside, it'd mostly be our camera app & SRTP viewer that are interested.
1/?