Hi all,
I am working on a project in which I need to transcribe recorded phone calls to text; there are about 200 audio files per day. I tried the Microsoft Speech SDK for desktop (System.Speech.Recognition) with a DictationGrammar:
C#
speechRecognizer.LoadGrammar(new DictationGrammar());

but the accuracy is far too poor.
I thought it would be better to use the server SDK (Microsoft.Speech.Recognition), but that requires building a grammar. How can I build a free-speech grammar with more than 10,000 words?
Please help!
Thanks

What I have tried:

Microsoft Speech Recognition SDK for desktop.
Updated 5-Apr-16 8:54am

1 solution

You cannot fix the engine. Speech recognition does work, but its quality is a spectacular failure on Microsoft's part.
It can work with a minimal grammar that contains no phrases that sound even remotely similar, and for a user with fairly good pronunciation. Add more phrases, and the engine will mix most of them up.
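For reference, the large word-list grammar the question asks about does not have to be written by hand: it can be generated as an SRGS 1.0 XML document from a plain list of words. This is a minimal sketch and not a fix for the accuracy problem described above. The class name `WordLoopGrammar`, the rule id `words`, and the default `en-US` culture are hypothetical choices, and the resulting grammar simply accepts one or more words from the list in any order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Sketch: turn a plain word list into an SRGS 1.0 XML grammar whose single
// public rule accepts one or more words from the list, in any order.
// Hypothetical names: WordLoopGrammar, rule id "words"; culture defaults to en-US.
public static class WordLoopGrammar
{
    static readonly XNamespace Srgs = "http://www.w3.org/2001/06/grammar";

    public static XDocument Build(IEnumerable<string> words, string culture = "en-US")
    {
        // Trim, drop blank entries, and de-duplicate case-insensitively.
        var items = words
            .Select(w => w.Trim())
            .Where(w => w.Length > 0)
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .Select(w => new XElement(Srgs + "item", w));

        return new XDocument(
            new XElement(Srgs + "grammar",
                new XAttribute("version", "1.0"),
                new XAttribute(XNamespace.Xml + "lang", culture),
                new XAttribute("root", "words"),
                new XElement(Srgs + "rule",
                    new XAttribute("id", "words"),
                    new XAttribute("scope", "public"),
                    // repeat="1-" means "one or more" of the alternatives below.
                    new XElement(Srgs + "item",
                        new XAttribute("repeat", "1-"),
                        new XElement(Srgs + "one-of", items)))));
    }
}
```

A file produced this way, for example with `WordLoopGrammar.Build(File.ReadLines("words.txt")).Save("words.grxml")` where `words.txt` is a hypothetical one-word-per-line list, should be loadable through the `Grammar(string path)` constructor; the `xml:lang` value has to match the recognizer's culture. A word loop like this carries no information about which word sequences are likely, which is probably one reason a grammar of this size performs as poorly as this answer describes.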

Dictation? Just forget it.

One reason I call this a "spectacular failure" is that on Android I can dictate a lot of text in different languages (there are dictation keyboards, available free of charge) with recognition quality good enough to make dictation a practical competitor to typing on the small Android keyboards. Were it not for the comparison with this software, I would probably call the engine supplied by Microsoft "an achievement", because one really could use it for some minimalistic voice interfaces.

—SA
 
Comments
Matt T Heffron 5-Apr-16 15:21pm    
The recent announcements at Microsoft Build 2016 (apparently) include much higher-quality speech recognition through APIs to services hosted on Azure. (I can guess where their speech recognition development effort went.)
Microsoft Cognitive Services: Speech API
https://www.microsoft.com/cognitive-services/en-us/speech-api
Sergey Alexandrovich Kryukov 5-Apr-16 16:07pm    
Thank you very much, Matt. Let's see... But so far, this is what it is. :-(
What this site offers is something like software as a service, not a permanent software installation, am I right? So I cannot see when and how we could simply have the engine on our OS...
—SA
Matt T Heffron 5-Apr-16 16:16pm    
Yes, this would be Software as a Service (SaaS).
So is Siri dictation.
Are you sure that the Android dictation input isn't SaaS?
On my Android tablet, I can't even use voice input for asking a question without a network connection back to Google. Generalized dictation would probably be even harder.
The variety of speaker models (dialects, accents, ...), vocabularies, and acoustic environments to adapt to (noisy, quiet, echoey, ...) makes this general case too hard for (most) stand-alone systems.
I can easily believe that the Speech-SDK has been "frozen" at the current capabilities.
For the original question, SaaS with a high-quality system might be the perfect solution! (Currently 5,000 transactions per month are FREE; above that the price is $4 per 1,000 transactions, each transaction covering up to 15 seconds of audio.)
If it were my project I'd certainly look into it!
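The pricing figures in this comment can be turned into a quick estimate for the question's workload of about 200 recordings per day. This is a minimal sketch: the class and method names are hypothetical, the average call length is a parameter because the question does not give one, and the billing rules encoded here (longer recordings counted as several 15-second transactions, pro rata charges above the free tier, 30-day months) are assumptions rather than the published terms.

```csharp
using System;

// Rough monthly cost estimate under the quoted pricing
// (5,000 free transactions per month, then $4 per 1,000, up to 15 s each).
// Assumptions, not confirmed by the pricing page: longer recordings are
// billed as several 15-second transactions, billing above the free tier is
// pro rata, and a month has 30 days unless stated otherwise.
public static class SpeechApiCostEstimate
{
    const int FreeTransactionsPerMonth = 5000;
    const decimal DollarsPerThousand = 4.00m;
    const int SecondsPerTransaction = 15;

    // Number of 15-second transactions needed for one recording (rounded up).
    public static int TransactionsPerRecording(int recordingSeconds) =>
        (recordingSeconds + SecondsPerTransaction - 1) / SecondsPerTransaction;

    // Total transactions per month for a steady daily volume of recordings.
    public static int MonthlyTransactions(int filesPerDay, int avgRecordingSeconds, int daysPerMonth = 30) =>
        filesPerDay * daysPerMonth * TransactionsPerRecording(avgRecordingSeconds);

    // Dollars owed for the transactions that exceed the free tier.
    public static decimal MonthlyCost(int transactions)
    {
        int billable = Math.Max(0, transactions - FreeTransactionsPerMonth);
        return billable * DollarsPerThousand / 1000m;
    }
}
```

For example, with 200 files per day and a hypothetical average call length of 3 minutes (180 s), `MonthlyTransactions(200, 180)` gives 200 × 30 × 12 = 72,000 transactions, and `MonthlyCost(72000)` charges the 67,000 above the free tier, about $268 per month. Even if every recording fit into a single transaction, 6,000 transactions a month would exceed the free tier by 1,000, about $4.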
Sergey Alexandrovich Kryukov 5-Apr-16 16:35pm    
What do you mean, sure? How can it be SaaS? It is installed and just works. I'm not always connected to the Internet.
This is a keyboard.
—SA
Matt T Heffron 5-Apr-16 16:58pm    
OK, I have no experience with dictation keyboard apps.
(Even so, I'd guess/expect that it is pretty limited compared to something that could transcribe recorded phone calls.)

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900