Speech recognition for audio files

Question

Hi all,
I am working on a project in which I need to transcribe phone recordings to text; there are about 200 audio files per day. I tried the Microsoft Speech SDK for desktop (System.Speech.Recognition) with a DictationGrammar:

C#

speechRecognizer.LoadGrammar(new DictationGrammar());

but the accuracy of the result is very poor.
I thought it would be better to use the server library (Microsoft.Speech.Recognition), but I have to build a grammar for that. How do I build a free-speech grammar that covers more than 10,000 words?
Please help!
Thanks

What I have tried:

Microsoft Speech recognition SDK for Desktop.
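
For context, the desktop approach described in the question could look roughly like the sketch below. It feeds one WAV file to the engine and collects whatever the dictation grammar recognizes. The file name call.wav and the en-US culture are assumptions for illustration, not details from the question. The project needs a reference to the System.Speech assembly and runs on Windows only.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;
using System.Text;

class WavTranscriber
{
    static void Main(string[] args)
    {
        // Hypothetical input file; pass a real path as the first argument.
        string path = args.Length > 0 ? args[0] : "call.wav";

        // Assumption: an en-US recognizer is installed on the machine.
        using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            engine.LoadGrammar(new DictationGrammar());
            engine.SetInputToWaveFile(path);

            var text = new StringBuilder();
            RecognitionResult result;

            // Recognize() returns one phrase per call and null when nothing
            // more is recognized (end of file, or a timeout on silence).
            while ((result = engine.Recognize()) != null)
            {
                text.Append(result.Text).Append(' ');
            }

            Console.WriteLine(text.ToString().Trim());
        }
    }
}
```

Because Recognize() also returns null on a silence timeout, a long pause in a recording may end this loop before the end of the file; the asynchronous RecognizeAsync(RecognizeMode.Multiple) with its events is the usual alternative for long recordings.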

Posted 5-Apr-16 7:41am

DanLegacy

Updated 5-Apr-16 8:54am

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Sergey Alexandrovich Kryukov · Answer 1 · 2016-04-05T08:54:00

Solution 1

You cannot fix the engine. Speech recognition does work, but its quality is a spectacular failure on Microsoft's part.
It can work with a minimal-size grammar containing no phrases that sound even remotely similar, for a user with fairly good pronunciation. Add more phrases, and the engine will mix up most of them.
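
A minimal sketch of such a small grammar, using the Choices, GrammarBuilder and Grammar classes from System.Speech.Recognition, might look like this. The three phrases, the file name command.wav and the 0.6 confidence cut-off are all assumptions chosen for illustration.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

class SmallGrammarDemo
{
    static void Main()
    {
        var culture = new CultureInfo("en-US"); // assumption: en-US recognizer installed

        // Hypothetical phrases, deliberately chosen to sound unlike each other.
        var phrases = new Choices("open account", "cancel order", "speak to an agent");
        var builder = new GrammarBuilder(phrases) { Culture = culture };

        using (var engine = new SpeechRecognitionEngine(culture))
        {
            engine.LoadGrammar(new Grammar(builder));
            engine.SetInputToWaveFile("command.wav"); // hypothetical input file

            RecognitionResult result = engine.Recognize();

            // Confidence is a relative score between 0 and 1; the 0.6
            // threshold here is an arbitrary illustrative value.
            if (result != null && result.Confidence >= 0.6f)
            {
                Console.WriteLine("Recognized: " + result.Text);
            }
            else
            {
                Console.WriteLine("No confident match.");
            }
        }
    }
}
```

A grammar like this only ever returns one of the listed phrases, which is why it does not scale to free-form transcription of phone calls.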

Dictation? Just forget it.

One reason I called this shame a "spectacular failure" is that on Android I can do a lot of dictation in different languages (there are dictation keyboards, freely available and free of charge), and the quality of recognition makes it a practical competitor to the smallish Android keyboards. If it were not for the comparison with this software, I would probably call the engine supplied by Microsoft "an achievement", because one really could use it for some minimalistic voice interfaces.

—SA

Comments

Matt T Heffron 5-Apr-16 15:21pm

The recent announcements at the Microsoft Build 2016 include (apparently) much higher quality Speech Recognition capability using APIs against services on Azure. (I can guess where their speech recognition development effort went.)
Microsoft Cognitive Services: Speech API
https://www.microsoft.com/cognitive-services/en-us/speech-api

Sergey Alexandrovich Kryukov 5-Apr-16 16:07pm

Thank you very much, Matt. Let's see... But so far, this is what it is. :-(
What is suggested on this site is something like software as a service, not a permanent software installation. Am I right? So I cannot see when and how we could simply have the engine on our OS...
—SA

Matt T Heffron 5-Apr-16 16:16pm

Yes, this would be Software as a Service (SaaS).
So is Siri dictation.
Are you sure that the Android dictation input isn't SaaS?
On my Android tablet, I can't even use voice input for asking a question without a network connection back to Google. Generalized dictation would ~~probably~~ be even harder.
The different speaker models (dialects, accents, ...), vocabulary, environmental acoustics adaptation (noisy, quiet, echoey...) make this general case too hard for (most) stand-alone systems.
I can easily believe that the Speech-SDK has been "frozen" at the current capabilities.
For the original question, SaaS with a high quality system might be the perfect solution! (Currently 5000 transactions per month are FREE! $4/1000 transactions (<= 15 seconds) above that.)
If it were my project I'd certainly look into it!
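
To put the quoted pricing next to the 200 files per day from the question, here is a rough estimate sketch. It assumes that a recording longer than 15 seconds has to be split into one transaction per 15-second chunk, that calls average 180 seconds, and that a month has 30 days; none of these assumptions come from the thread or from the service's documentation. Under them, each file costs 12 transactions, a month comes to 72,000 transactions, 67,000 of those are billable, and the estimate is $268 per month.

```csharp
using System;

static class SpeechCostEstimate
{
    // Number of transactions needed for one recording, assuming it must be
    // cut into chunks of at most maxSeconds each (an assumption).
    public static long TransactionsPerFile(double seconds, double maxSeconds = 15.0)
    {
        return Math.Max(1L, (long)Math.Ceiling(seconds / maxSeconds));
    }

    // Monthly cost using the figures quoted in the comment above:
    // 5000 free transactions, then $4 per 1000 (billed pro rata here,
    // which is also an assumption).
    public static decimal MonthlyCost(long transactions,
                                      long freeQuota = 5000,
                                      decimal pricePerThousand = 4m)
    {
        long billable = Math.Max(0L, transactions - freeQuota);
        return billable * pricePerThousand / 1000m;
    }

    static void Main()
    {
        const int filesPerDay = 200;          // from the question
        const double avgCallSeconds = 180.0;  // assumption
        const int daysPerMonth = 30;          // assumption

        long perFile = TransactionsPerFile(avgCallSeconds);
        long monthly = perFile * filesPerDay * daysPerMonth;

        Console.WriteLine("Transactions per month: " + monthly);
        Console.WriteLine("Estimated monthly cost: $" + MonthlyCost(monthly));
    }
}
```

The free tier alone would cover only about two days of this volume, so the per-transaction price is the number that matters for this project.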

Sergey Alexandrovich Kryukov 5-Apr-16 16:35pm

What do you mean, sure? How can it be SaaS? It is installed and just works. I'm not always connected to the Internet.
This is a keyboard.
—SA

Matt T Heffron 5-Apr-16 16:58pm

OK, I have no experience with dictation keyboard apps.
(Even so, I'd guess/expect that it is pretty limited compared to something that could transcribe recorded phone calls.)

Sergey Alexandrovich Kryukov 5-Apr-16 22:37pm

I never saw anything which does automatic translation. Unless you call something at the level of Google Translate a "translation"... :-)
—SA

Matt T Heffron 6-Apr-16 1:44am

I didn't say anything about "translation"...
I did refer to "transactions" and "transcribe" ;-)

BTW. What Android "dictation keyboard" are you using/recommend?

Sergey Alexandrovich Kryukov 6-Apr-16 3:19am

I need time to figure it out. It's my daughter's; I rarely take it; and I chose what to install quite a while ago. Will soon test my software and will take a look, but maybe not so soon. This thing works quite well.
—SA

Tripurari Poojan 6-Apr-16 7:55am

If there are any solutions or demos, please share them with me.

DanLegacy 5-Apr-16 20:02pm

Thank you very much for your help Sergey and Matt,
I tried many times with Microsoft Speech Recognition but I couldn't get a better result. I think I have to switch to Project Oxford as you state above, but how can I use it, since I am a newbie in C#? I couldn't download anything either.

Sergey Alexandrovich Kryukov 5-Apr-16 22:39pm

You are welcome. I wish I could help more, but this is what it is.
Unfortunately, I'm unaware of Project Oxford...
—SA

Matt T Heffron 6-Apr-16 1:41am

Project Oxford is the "old name" for Microsoft Cognitive Services.
The "Get started with Speech Recognition and/or intent in C# for .Net Windows" info is at:
https://www.microsoft.com/cognitive-services/en-us/speech-api/documentation/getstarted/get-started-csharp-desktop
This requires registration. Follow the instructions on this link.
(I have not tried this! But some day...)

It's probably not going to be "straightforward" for "a newbie in C#". Sorry.

Sergey Alexandrovich Kryukov 6-Apr-16 3:20am

Certainly not for a newbie...
—SA

DanLegacy 6-Apr-16 12:35pm

I really appreciate your help. I'll try it and let you know.
Dan