With streaming enabled, you can see that the Foundation Models API is slow (I would say alarmingly slow) on M1 hardware. I'm not even sure what kind of optimization I could do to speed this up, as the bottleneck seems to be purely on the token generation side 🤔
It's going to take more than a little care and attention before you start sprinkling this all over your apps, that's for sure. Good to know.