i wonder if there's a (non-destructive) way to get midi out of this thing 🤔
I wonder how (im)practical it would be to add an FFT tile to mollytime that takes a stream of sound samples and generates midi events :3
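(for concreteness, here's roughly what the simplest possible version of that tile would do — a hedged numpy sketch assuming a monophonic signal and a fixed sample rate, with made-up function names; real use would need windhopping, onset detection, and it falls apart on polyphony)

```python
import numpy as np

def samples_to_midi_note(samples, sample_rate=48000):
    """Naive monophonic pitch -> MIDI note via FFT peak picking.
    A sketch only: no interpolation between bins, no onset/offset
    logic, and chords/strums will confuse the argmax."""
    windowed = samples * np.hanning(len(samples))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    peak_hz = freqs[np.argmax(spectrum)]
    if peak_hz <= 0:
        return None
    # MIDI note number convention: 69 = A4 = 440 Hz, 12 notes per octave
    return int(round(69 + 12 * np.log2(peak_hz / 440.0)))

# 440 Hz sine over a 4096-sample window -> MIDI note 69 (A4)
t = np.arange(4096) / 48000
print(samples_to_midi_note(np.sin(2 * np.pi * 440 * t)))  # -> 69
```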
@aeva see this is a great application for fuzzy logic. with machine learning, you can make a thing that turns polyphonic strums into midi. skip the fft, freq discretization sucks.
@lritter excellent. how
@aeva train on audio from midi - generating samples is easy. then invert the function.
@aeva @lritter if you *did* want some kind of low-latency ML extractor, you'd probably want something like a stack of causal convolutions -> some kind of recurrent layer -> more convolution -> a note output layer that is 1 for note on, 0 for note off. I'd still do a pre-STFT because it's a nice representation, and the model would only learn almost the same thing anyway, so, well, :akko_shrug: I forget what people use for pitch extraction in speech nowadays, though I did play around with that a few months ago at home...
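(to make the shape of that pipeline concrete: a toy, *untrained* numpy sketch of causal conv -> recurrent layer -> per-note on/off gate. random weights, no real STFT, all names invented — it only demonstrates the dataflow and the causality constraint, not anything you'd ship)

```python
import numpy as np

rng = np.random.default_rng(0)

def causal_conv(x, kernel):
    """1-D causal convolution: the output at time t only sees x[:t+1],
    which is what keeps the extractor low-latency."""
    k = len(kernel)
    padded = np.concatenate([np.zeros(k - 1), x])
    return np.array([padded[t:t + k] @ kernel[::-1] for t in range(len(x))])

def run_model(frames, n_notes=128):
    """frames: (time, freq_bins), e.g. STFT magnitude frames.
    Returns a (time, 128) array of note gates: 1 = note on, 0 = off."""
    T, F = frames.shape
    w_in = rng.normal(size=(F, 16))        # per-frame projection
    kernel = rng.normal(size=5)            # causal conv over time
    w_h = rng.normal(size=(16, 16)) * 0.1  # simple recurrence
    w_out = rng.normal(size=(16, n_notes)) # per-note logits
    feat = frames @ w_in                   # (T, 16)
    feat = np.stack([causal_conv(feat[:, c], kernel) for c in range(16)], axis=1)
    h = np.zeros(16)
    gates = np.zeros((T, n_notes))
    for t in range(T):
        h = np.tanh(feat[t] + h @ w_h)     # recurrent layer
        gates[t] = 1 / (1 + np.exp(-(h @ w_out))) > 0.5  # threshold -> on/off
    return gates

gates = run_model(np.abs(rng.normal(size=(20, 64))))
print(gates.shape)  # (20, 128)
```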
I know for reasons that a model of this https://arxiv.org/pdf/2306.03177 architecture and size is able to run on an old mid-spec android phone in ~4 ± 4 milliseconds for 10 milliseconds of input, though making that happen takes work. Since you're not literally outputting audio you can live with some more jitter, so you could probably get away with just using onnx.
Still more of a "fun one week research project that might work or not" type thing, in any case