C# Speech to Text

7 May 2012 | CPOL | 3 min read
This article describes how to set up and use the SpeechRecognitionEngine class, which has shipped with the .NET Framework since version 3.0.

[Screenshot: speech to text application]

Introduction

The purpose of this article is to give you a small insight into the capabilities of the System.Speech assembly, in particular the usage of the SpeechRecognitionEngine class. The class is described in the MSDN documentation.

Background

I had read several articles about Text to Speech, but when I wanted to go the opposite way, I found a lack of easily understandable articles on the topic. So I decided to write a very basic one myself and share my experiences with you.

The Solution

Let's start. First, add a reference to the System.Speech assembly, which is located in the GAC, to your application.

[Screenshot: System.Speech in the GAC]

This is the only reference you need. The assembly contains the following namespaces and their classes. Of these, System.Speech.Recognition contains the Windows Desktop Speech technology types for implementing speech recognition.

  • System.Speech.AudioFormat
  • System.Speech.Recognition
  • System.Speech.Recognition.SrgsGrammar
  • System.Speech.Synthesis
  • System.Speech.Synthesis.TtsEngine

Before you can use the SpeechRecognitionEngine, you have to set a few properties and call some methods. Code sometimes says more than words:

C#
// the recognition engine
SpeechRecognitionEngine speechRecognitionEngine = null;

// create the engine with a custom method (described later)
speechRecognitionEngine = createSpeechEngine("de-DE");

// hook to the needed events
speechRecognitionEngine.AudioLevelUpdated += 
  new EventHandler<AudioLevelUpdatedEventArgs>(engine_AudioLevelUpdated);
speechRecognitionEngine.SpeechRecognized += 
  new EventHandler<SpeechRecognizedEventArgs>(engine_SpeechRecognized);

// load a custom grammar, also described later
loadGrammarAndCommands();

// use the system's default microphone, you can also dynamically
// select audio input from devices, files, or streams.
speechRecognitionEngine.SetInputToDefaultAudioDevice();

// start listening in RecognizeMode.Multiple, that specifies
// that recognition does not terminate after completion.
speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);

Now for the details of the function createSpeechEngine(string preferredCulture). The class offers the following constructors:

  • SpeechRecognitionEngine(): Initializes a new instance using the default speech recognizer for the system.
  • SpeechRecognitionEngine(CultureInfo): Initializes a new instance using the default speech recognizer for a specified locale.
  • SpeechRecognitionEngine(RecognizerInfo): Initializes a new instance using the information in a RecognizerInfo object to specify the recognizer to use.
  • SpeechRecognitionEngine(String): Initializes a new instance of the class with a string parameter that specifies the name of the recognizer to use.
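To make the overloads a little more concrete, here is a minimal console sketch that lists the installed recognizers and then tries the CultureInfo overload. The class name EngineConstructionSketch and the "en-US" locale are only assumptions chosen for illustration; the sketch needs Windows and a reference to System.Speech.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

static class EngineConstructionSketch
{
    static void Main()
    {
        // Show which recognizers are installed on this machine.
        foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
        {
            Console.WriteLine("{0} ({1})", info.Name, info.Culture.Name);
        }

        try
        {
            // The CultureInfo overload throws an ArgumentException
            // if no installed recognizer supports the requested locale.
            using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
            {
                Console.WriteLine("Using: " + engine.RecognizerInfo.Name);
            }
        }
        catch (ArgumentException)
        {
            Console.WriteLine("No recognizer installed for en-US.");
        }
    }
}
```

Because the locale overload can throw, checking InstalledRecognizers() first is one way to fall back gracefully, which is the approach the custom function below takes.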

I wrote a custom function for instantiating the class because I wanted to be able to choose the language the engine uses. If the desired language is not installed, the default language (the Windows desktop language) is used instead, which avoids an exception when a package that is not installed is requested. Hint: You can install further language packs to choose a different CultureInfo for the SpeechRecognitionEngine, but as far as I know, this is only supported on Windows 7 Ultimate/Enterprise.

C#
private SpeechRecognitionEngine createSpeechEngine(string preferredCulture)
{
    // use a local variable so the result does not depend on the field's previous value
    SpeechRecognitionEngine engine = null;

    foreach (RecognizerInfo config in SpeechRecognitionEngine.InstalledRecognizers())
    {
        if (config.Culture.ToString() == preferredCulture)
        {
            engine = new SpeechRecognitionEngine(config);
            break;
        }
    }

    // if the desired culture is not installed, then load default
    if (engine == null)
    {
        engine = new SpeechRecognitionEngine();

        // report the culture of the recognizer that is actually in use
        MessageBox.Show("The desired culture is not installed " + 
            "on this machine, the speech-engine will continue using "
            + engine.RecognizerInfo.Culture.ToString() + 
            " as the default culture.", "Culture " + preferredCulture + " not found!");
    }

    return engine;
}

The next step is to set up the Grammar that the SpeechRecognitionEngine loads. In our case, we read a custom text file that contains key-value pairs of texts, each wrapped in the custom class SpeechToText.Word, because I wanted to extend the usability of the program and give you a little showcase of what is possible with SAPI. This is interesting because it lets us associate texts or even commands with a recognized word. Here is the wrapper class SpeechToText.Word.

C#
namespace SpeechToText
{
   public class Word
   {           
       public Word() { }
       public string Text { get; set; }          // the word to be recognized by the engine
       public string AttachedText { get; set; }  // the text associated with the recognized word
       public bool IsShellCommand { get; set; }  // flag determining whether this word is a command or not
   }
}

Here is the method that sets up the Choices used by the Grammar. In the foreach loop, we create the Word instances and store them in a List<Word> for later lookup. We then add the parsed words to the Choices object, build the Grammar with a GrammarBuilder, and load it synchronously into the SpeechRecognitionEngine. You could also simply add strings to the Choices object by hand or load a predefined XML file. Now our engine is ready to recognize the predefined words.

C#
private void loadGrammarAndCommands()
{
    // no try/catch here: a catch block that only does "throw ex" would reset
    // the stack trace, so exceptions simply propagate to the caller
    Choices texts = new Choices();
    string[] lines = File.ReadAllLines(Path.Combine(Environment.CurrentDirectory, "example.txt"));
    foreach (string line in lines)
    {
        // skip comment lines and empty lines
        if (line.StartsWith("--") || line == String.Empty) continue;

        // split the line into text | attached text | shell-command flag
        var parts = line.Split(new char[] { '|' });

        // add word to the list for later lookup or execution
        words.Add(new Word() { Text = parts[0], AttachedText = parts[1], 
                  IsShellCommand = (parts[2] == "true") });

        // add the text to the known choices of the speech-engine
        texts.Add(parts[0]);
    }
    Grammar wordsList = new Grammar(new GrammarBuilder(texts));
    speechRecognitionEngine.LoadGrammar(wordsList);
}
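The file format is only implied by the loader above: one entry per line, three fields separated by '|', with "--" marking a comment. As a rough sketch of how the parsing could be separated from the speech engine and made tolerant of malformed lines, consider the following. The WordFileParser class and its skip-on-error behaviour are my own assumptions and not part of the original download.

```csharp
using System;
using System.Collections.Generic;

public class Word
{
    public string Text { get; set; }
    public string AttachedText { get; set; }
    public bool IsShellCommand { get; set; }
}

// Hypothetical helper, not part of the article's sample project.
public static class WordFileParser
{
    // Parses lines of the form "text|attached text|true".
    // Blank lines, comment lines ("--") and lines with fewer
    // than three fields are skipped.
    public static List<Word> Parse(IEnumerable<string> lines)
    {
        var result = new List<Word>();
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.Length == 0 || line.StartsWith("--", StringComparison.Ordinal))
                continue;

            string[] parts = line.Split('|');
            if (parts.Length < 3)
                continue; // malformed line: ignore it instead of throwing

            result.Add(new Word
            {
                Text = parts[0].Trim(),
                AttachedText = parts[1].Trim(),
                IsShellCommand = parts[2].Trim().Equals("true", StringComparison.OrdinalIgnoreCase)
            });
        }
        return result;
    }
}
```

With such a helper, a made-up file containing the lines "hello|Hello, how are you?|false" and "notepad|notepad.exe|true" would yield two Word entries, and the Text of each entry could then be added to the Choices object as before.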

To start the SpeechRecognitionEngine, we call RecognizeAsync(RecognizeMode.Multiple). In this mode, the recognizer keeps performing asynchronous recognition operations until the RecognizeAsyncCancel() or RecognizeAsyncStop() method is called. To retrieve the result of an asynchronous recognition operation, attach an event handler to the recognizer's SpeechRecognized event. The recognizer raises this event whenever it successfully completes a synchronous or asynchronous recognition operation.

C#
// attach eventhandler
speechRecognitionEngine.SpeechRecognized += 
  new EventHandler<SpeechRecognizedEventArgs>(engine_SpeechRecognized);

// start recognition
speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);

// Recognized-event 
void engine_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    txtSpoken.Text += "\r" + getKnownTextOrExecute(e.Result.Text);
    scvText.ScrollToEnd();
}
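The snippet above starts recognition but never stops it. The following console sketch shows one possible lifecycle: it builds a tiny grammar by hand, ignores results below a confidence threshold, reports rejected input, and stops on Enter. The threshold value, the two sample phrases, and the class name are assumptions for illustration only. It needs Windows, a microphone, and a reference to System.Speech.

```csharp
using System;
using System.Speech.Recognition;

static class RecognitionLifecycleSketch
{
    // Hypothetical threshold; tune it for your microphone and vocabulary.
    const float MinConfidence = 0.7f;

    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine())
        {
            // A grammar built from strings added to Choices by hand.
            var choices = new Choices("hello", "goodbye");
            engine.LoadGrammar(new Grammar(new GrammarBuilder(choices)));
            engine.SetInputToDefaultAudioDevice();

            engine.SpeechRecognized += (sender, e) =>
            {
                if (e.Result.Confidence >= MinConfidence)
                    Console.WriteLine("Recognized: " + e.Result.Text);
                else
                    Console.WriteLine("Ignored low-confidence result: " + e.Result.Text);
            };

            // Raised when the input does not match any loaded grammar well enough.
            engine.SpeechRecognitionRejected += (sender, e) =>
                Console.WriteLine("Input did not match the grammar.");

            engine.RecognizeAsync(RecognizeMode.Multiple);
            Console.WriteLine("Listening. Press Enter to stop.");
            Console.ReadLine();

            // RecognizeAsyncCancel stops immediately; RecognizeAsyncStop
            // would let the current recognition operation finish first.
            engine.RecognizeAsyncCancel();
        }
    }
}
```

Wrapping the engine in a using block releases the audio device when the program ends.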

And here comes the gimmick of this application: when the engine recognizes one of our predefined words, we decide whether to return the associated text or to execute a shell command. This is done in the following function:

C#
private string getKnownTextOrExecute(string command)
{
    try
    {   // use a bit of LINQ for our lookup list ...
        var cmd = words.Where(c => c.Text == command).First();

        if (cmd.IsShellCommand)
        {
            Process proc = new Process();
            proc.EnableRaisingEvents = false;
            proc.StartInfo.FileName = cmd.AttachedText;
            proc.Start();
            return "you just started : " + cmd.AttachedText;
        }
        else
        {
            return cmd.AttachedText;
        }
    }
    catch (Exception)
    {
        return command;
    }
}
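The function above relies on First() throwing when nothing matches and on the catch block echoing the input. One alternative is to use FirstOrDefault() and test for null, so that an unknown word is not handled as an exception. The WordLookup class below is a hypothetical helper, and the case-insensitive comparison is my own assumption. It only resolves the text; starting a process for shell commands is left to the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Word
{
    public string Text { get; set; }
    public string AttachedText { get; set; }
    public bool IsShellCommand { get; set; }
}

// Hypothetical helper, not part of the article's sample project.
public static class WordLookup
{
    // Returns the entry whose Text matches the recognized phrase,
    // or null when the phrase is not in the list.
    public static Word Find(IEnumerable<Word> words, string recognizedText)
    {
        return words.FirstOrDefault(w =>
            string.Equals(w.Text, recognizedText, StringComparison.OrdinalIgnoreCase));
    }

    // Returns the attached text of a known phrase; unknown phrases
    // are echoed back unchanged. No process is started here.
    public static string TextFor(IEnumerable<Word> words, string recognizedText)
    {
        Word match = Find(words, recognizedText);
        return match == null ? recognizedText : match.AttachedText;
    }
}
```

A caller could check Find(...) for null and for IsShellCommand before deciding whether to start a process, keeping the try/catch only around Process.Start itself.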

That's it! There are plenty of other possible uses for SAPI, maybe a Visual Studio plug-in for coding? Let me know what ideas you have! I hope you enjoyed my first article.

History

Version 1.0.0.0 release.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior)
Austria

Comments and Discussions

 
Re: ı got two messages
Sperneder Patrick, 13-Feb-13 19:50
Hm, the exception means exactly what it says.
The culture that you have selected is not installed on this machine (also check for typos!).
The second part is weird, though. I don't think this exception is thrown by the engine or by my example code.

Let me see what you've got and maybe I can tell you more.
Regards
Patrick