This is quite a difficult project, because the spectrum of a real-life instrument's voice is usually full of noise and the main tone is floating. So it will not be just "frequency analysis"; it will be a real image recognition task, pretty hard to solve. I'm familiar with existing musical applications doing that: the quality of recognition is quite poor in all the products I know (my very modest musical hearing is orders of magnitude better :-)). One of the biggest problems is the fast change of the spectrum over comparatively short periods of time.
There are a number of good works on separate components like FFT (which is not the hardest part), such as this:
http://www.extremeoptimization.com/solutions/FastFourierTransformsFft.aspx
I also know a short CodeProject article:
How to implement the FFT algorithm.
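To illustrate why a single FFT is not enough when the main tone floats, here is a minimal sketch in Python (my choice of language; none of it comes from the linked articles). It cuts the signal into short frames, runs a plain radix-2 FFT on each, and reports the strongest frequency per frame. All names (`fft`, `peak_frequency`, `track_peaks`, `gliding_tone`) are hypothetical, and the 8000 Hz sample rate and 1024-sample frame are just assumptions for the demo. A real instrument has harmonics and noise, so picking the strongest bin like this would be far too naive for actual recognition.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; the length must be a power of two."""
    n = len(samples)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + half] = even[k] - twiddle
    return out


def peak_frequency(frame, sample_rate):
    """Frequency in Hz of the strongest bin below Nyquist (DC excluded)."""
    n = len(frame)
    # A Hann window reduces leakage from the frame edges.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = fft(windowed)
    magnitudes = [abs(c) for c in spectrum[1:n // 2]]
    best_bin = 1 + magnitudes.index(max(magnitudes))
    return best_bin * sample_rate / n


def track_peaks(signal, sample_rate, frame_size=1024):
    """Peak frequency of each consecutive, non-overlapping frame."""
    return [peak_frequency(signal[i:i + frame_size], sample_rate)
            for i in range(0, len(signal) - frame_size + 1, frame_size)]


def gliding_tone(sample_rate=8000, seconds=1.0, f_start=220.0, f_end=440.0):
    """Synthetic test signal: a sine whose frequency rises linearly."""
    n = int(sample_rate * seconds)
    out, phase = [], 0.0
    for i in range(n):
        freq = f_start + (f_end - f_start) * i / n
        phase += 2 * math.pi * freq / sample_rate
        out.append(math.sin(phase))
    return out


if __name__ == "__main__":
    for index, hz in enumerate(track_peaks(gliding_tone(), 8000)):
        print(f"frame {index}: about {hz:.1f} Hz")
```

With these assumed settings each frame covers 128 ms and one bin is about 7.8 Hz wide, so even this clean synthetic tone drifts across several bins within a single frame. Shorter frames follow the change better but make the bins wider, which is exactly the trade-off that makes real recordings hard.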
So the idea is generally good but, no offence, I'm quite skeptical about your prospects of making it a big success.
—SA