i wonder if there's a (non-destructive) way to get midi out of this thing 🤔
I wonder how (im)practical it would be to add an FFT tile to mollytime that takes a stream of sound samples and generates midi events :3
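(for concreteness, here's roughly what the simplest possible version of that tile would do — a hedged numpy sketch assuming a monophonic signal and a fixed sample rate, with made-up function names; real use would need windhopping, onset detection, and it falls apart on polyphony)

```python
import numpy as np

def samples_to_midi_note(samples, sample_rate=48000):
    """Naive monophonic pitch -> MIDI note via FFT peak picking.
    A sketch only: no interpolation between bins, no onset/offset
    logic, and chords/strums will confuse the argmax."""
    windowed = samples * np.hanning(len(samples))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    peak_hz = freqs[np.argmax(spectrum)]
    if peak_hz <= 0:
        return None
    # MIDI note number convention: 69 = A4 = 440 Hz, 12 notes per octave
    return int(round(69 + 12 * np.log2(peak_hz / 440.0)))

# 440 Hz sine over a 4096-sample window -> MIDI note 69 (A4)
t = np.arange(4096) / 48000
print(samples_to_midi_note(np.sin(2 * np.pi * 440 * t)))  # -> 69
```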
@aeva see this is a great application for fuzzy logic. with machine learning, you can make a thing that turns polyphonic strums into midi. skip the fft, freq discretization sucks.
@lritter excellent. how
@aeva train on audio from midi - generating samples is easy. then invert the function.
@aeva @lritter if you *did* want some kind of low-latency ML extractor, you'd probably want something like a stack of causal convolutions -> some kind of recurrent layer -> more convolution -> a note output layer that is 1 for note on, 0 for note off. I'd still do a pre-STFT because it's a nice representation, and the model would only learn almost the same thing anyway, so, well, :akko_shrug: I forget what people use for pitch extraction in speech nowadays, though I did play around with that a few months ago at home...
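(to make the shape of that pipeline concrete: a toy, *untrained* numpy sketch of causal conv -> recurrent layer -> per-note on/off gate. random weights, no real STFT, all names invented — it only demonstrates the dataflow and the causality constraint, not anything you'd ship)

```python
import numpy as np

rng = np.random.default_rng(0)

def causal_conv(x, kernel):
    """1-D causal convolution: the output at time t only sees x[:t+1],
    which is what keeps the extractor low-latency."""
    k = len(kernel)
    padded = np.concatenate([np.zeros(k - 1), x])
    return np.array([padded[t:t + k] @ kernel[::-1] for t in range(len(x))])

def run_model(frames, n_notes=128):
    """frames: (time, freq_bins), e.g. STFT magnitude frames.
    Returns a (time, 128) array of note gates: 1 = note on, 0 = off."""
    T, F = frames.shape
    w_in = rng.normal(size=(F, 16))        # per-frame projection
    kernel = rng.normal(size=5)            # causal conv over time
    w_h = rng.normal(size=(16, 16)) * 0.1  # simple recurrence
    w_out = rng.normal(size=(16, n_notes)) # per-note logits
    feat = frames @ w_in                   # (T, 16)
    feat = np.stack([causal_conv(feat[:, c], kernel) for c in range(16)], axis=1)
    h = np.zeros(16)
    gates = np.zeros((T, n_notes))
    for t in range(T):
        h = np.tanh(feat[t] + h @ w_h)     # recurrent layer
        gates[t] = 1 / (1 + np.exp(-(h @ w_out))) > 0.5  # threshold -> on/off
    return gates

gates = run_model(np.abs(rng.normal(size=(20, 64))))
print(gates.shape)  # (20, 128)
```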
I know for reasons that a model of this https://arxiv.org/pdf/2306.03177 architecture and size is able to run on an old mid-spec android phone in ~4 ± 4 milliseconds for 10 milliseconds of input, though making that happen takes work. Since you're not literally outputting audio you can live with some more jitter, so you could probably get away with just using onnx.
Still more of a "fun one week research project that might work or not" type thing, in any case