Please see my comments to the question. This is my answer.
I also remember that I tried some open-source products a few years ago. Even though the acoustic input was very clean and based on the Western equal temperament system (an electronic keyboard), none of them demonstrated satisfactory results. They could be considered just experimental efforts. I can imagine that by now some products are of near-commercial quality (and I can imagine the reaction of a good musician; such people can usually catch pretty complex pieces of music on the fly, surpassing any conceivable tools the way we can surpass any tools in image recognition :-)). I can believe that considerable progress is possible for a team of the best developers, well familiar with acoustics, mathematics and music, working over a number of years. This is really very, very difficult. If you were at such a level, you probably would not ask such a question.
[EDIT]
Answering after the clarification from the OP.
OK, this is called speech recognition:
http://en.wikipedia.org/wiki/Speech_recognition
With .NET, this technology is easily accessible. You need to reference the assembly "System.Speech.dll" (from the GAC, as it is bundled with the .NET Framework; in Visual Studio, use the ".NET" tab of the "Add Reference" window).
Please see:
http://msdn.microsoft.com/en-us/library/system.speech.recognition.aspx
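To give an idea of what using this namespace looks like, here is a minimal sketch of a console application that listens for a few fixed commands. It assumes a reference to the System.Speech assembly, a working default microphone and an installed Windows speech recognizer; the command words "open", "save" and "close" and the class name are made up for illustration only.

```csharp
using System;
using System.Speech.Recognition;

class CommandDemo
{
    static void Main()
    {
        using (var recognizer = new SpeechRecognitionEngine())
        {
            // Hypothetical command set; replace with the commands of your application.
            var commands = new Choices("open", "save", "close");
            recognizer.LoadGrammar(new Grammar(new GrammarBuilder(commands)));

            // Report every phrase the engine matches against the grammar.
            recognizer.SpeechRecognized += (sender, e) =>
                Console.WriteLine("{0} (confidence {1:F2})", e.Result.Text, e.Result.Confidence);

            recognizer.SetInputToDefaultAudioDevice();

            // Keep recognizing in the background until the user presses Enter.
            recognizer.RecognizeAsync(RecognizeMode.Multiple);
            Console.WriteLine("Say a command; press Enter to quit.");
            Console.ReadLine();
        }
    }
}
```

In a real application you would probably also check e.Result.Confidence against some threshold before acting on a command; the right threshold is something to find by experiment.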
However, don't get too excited. This technology works reasonably well if you simply develop a speech commander for your application, with a reasonable number of distinct commands. If you try to perform free dictation, you can use the available dictation grammar:
http://msdn.microsoft.com/en-us/library/system.speech.recognition.dictationgrammar.aspx
You can do it, but the results… I would call them frustrating. Anyway, dictation technology of reasonable quality is reportedly available commercially. Maybe it will become commonplace soon enough…
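If you want to judge the dictation quality for yourself, a sketch like the following should be enough. It makes the same assumptions as any System.Speech program (assembly reference, default microphone, installed recognizer); the class name and the 10-second timeout are arbitrary choices for illustration.

```csharp
using System;
using System.Speech.Recognition;

class DictationDemo
{
    static void Main()
    {
        using (var recognizer = new SpeechRecognitionEngine())
        {
            // Free-text dictation instead of a fixed command grammar.
            recognizer.LoadGrammar(new DictationGrammar());
            recognizer.SetInputToDefaultAudioDevice();

            Console.WriteLine("Say something...");

            // Synchronous recognition of one phrase; gives up if no speech
            // starts within the initial silence timeout (10 seconds here).
            RecognitionResult result = recognizer.Recognize(TimeSpan.FromSeconds(10));

            Console.WriteLine(result != null ? result.Text : "(nothing recognized)");
        }
    }
}
```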
—SA