For anyone who digs up this post later:
You could use an FFT to get the frequencies in the input signal and quantize each one to the nearest note. Then you just learn...
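A rough sketch of the FFT-then-quantize idea, in case it helps. Everything here is an assumption for illustration, not something from the original post: the helper names (`fft`, `dominant_frequency`, `frequency_to_note`), A4 = 440 Hz equal-temperament tuning, a single dominant pitch per frame, and a frame length that is a power of two. The hand-rolled radix-2 FFT is only there to keep the example dependency-free; in practice you would probably reach for a library FFT instead.

```python
import cmath
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT. len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dominant_frequency(samples, sample_rate):
    """Return the frequency in Hz of the strongest non-DC FFT bin."""
    n = len(samples)
    # Hann window to reduce spectral leakage at the frame edges.
    windowed = [
        s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(samples)
    ]
    # Keep only positive frequencies and skip bin 0 (DC).
    magnitudes = [abs(c) for c in fft(windowed)[1 : n // 2]]
    peak_bin = magnitudes.index(max(magnitudes)) + 1
    return peak_bin * sample_rate / n


def frequency_to_note(freq_hz, a4_hz=440.0):
    """Quantize a frequency to the nearest MIDI note number and note name."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return midi, f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


# Hypothetical test input: a pure 440 Hz sine, 4096 samples at 8192 Hz.
sample_rate = 8192
tone = [math.sin(2 * math.pi * 440.0 * i / sample_rate) for i in range(4096)]

freq = dominant_frequency(tone, sample_rate)
midi, name = frequency_to_note(freq)
print(freq, midi, name)
```

Note that the frequency resolution is `sample_rate / n` per bin, so short frames will blur adjacent low notes together. Real recordings also have harmonics that can be louder than the fundamental, so picking the single largest bin is only a starting point.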