Posted 27 May 2020

SpeechRecognition and SpeechSynthesis Windows 10 API for plain Win32

A one-function library to easily integrate Speech to Text and Text to Speech in your Win32 applications
In Windows, we have the new SpeechRecognizer UWP API which, with a bit of code, can be used in plain Win32 applications. This article covers how to use the library: the two functions it exports and how to call them.

Introduction

In past years, many of us sound engineers tried to create and improve speech recognition algorithms: lots of training, neural networks, cepstrum, Fourier, wavelets, that sort of life-consuming research. The Windows Speech API tried to implement such algorithms, with limited success.

Now that the internet has grown so much in capacity and speed that it can store and compare enormous amounts of data, those algorithms have faded in favor of network-based voice recognition. Instead of being analyzed locally, your voice is transmitted to a server that holds a great many samples and can deduce your words with great accuracy. Google already uses this approach in Android.

In Windows, we have the new SpeechRecognizer UWP API which, with a bit of code, can be used in plain Win32 applications. Here is a one-function library that will handle the details for you. I have been using this in my big audio and video sequencer, Turbo Play.

Using the Library

The code exports just two functions:

C++
HRESULT __stdcall SpeechX3(const wchar_t* t, std::vector<uint8_t>* tx, bool XML);
HRESULT __stdcall SpeechX1(void* ptr, SpeechX2 x2, 
                           const wchar_t* langx = L"en-us", int Mode = 0);

For text to speech, call SpeechX3 with the text and a vector that receives the WAVE file data. Set XML to true if the text is XML markup that configures the synthesis.
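As a rough sketch of what to do with the bytes SpeechX3 writes into the vector, the helpers below check for a RIFF/WAVE header and save the buffer to disk. LooksLikeWave and SaveBytes are hypothetical names, not part of the library, and the header check assumes the output is a standard RIFF container, which the text above implies but does not spell out.

```cpp
#include <cstdint>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: true if the buffer starts with a RIFF/WAVE header.
// Assumption: SpeechX3 emits a standard RIFF container.
bool LooksLikeWave(const std::vector<uint8_t>& data)
{
    // Bytes 0-3 are "RIFF", bytes 4-7 the chunk size, bytes 8-11 "WAVE".
    return data.size() >= 12 &&
           std::memcmp(data.data(), "RIFF", 4) == 0 &&
           std::memcmp(data.data() + 8, "WAVE", 4) == 0;
}

// Hypothetical helper: writes the buffer to a file as-is.
// Returns false if the file cannot be opened or the write fails.
bool SaveBytes(const std::vector<uint8_t>& data, const std::string& path)
{
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(data.data()),
              static_cast<std::streamsize>(data.size()));
    return out.good();
}
```

With the library linked, a call could then look like: declare std::vector<uint8_t> wav, call SpeechX3(L"Hello", &wav, false), and if the returned HRESULT indicates success and LooksLikeWave(wav) holds, pass wav to SaveBytes with a file name of your choice.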

For speech to text, use SpeechX1.

With Mode = 2, pass a pointer to a std::vector<std::tuple<std::wstring, std::wstring>> as ptr to get all supported languages.

C++
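// Receives (display name, language code) pairs.
std::vector<std::tuple<std::wstring, std::wstring>> sx;
// Mode 2 only enumerates languages, so no callback or language code is passed.
SpeechX1(&sx, nullptr, nullptr, 2);
for (const auto& e : sx)
{
  std::wcout << std::get<0>(e) << L" - " << std::get<1>(e) << std::endl;
}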

The first tuple item is the display name of the language, the second is the code that you would pass again to the function later to initiate the speech recognition.
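Picking a code out of that list can be sketched as follows. FindLanguageCode and LanguageList are hypothetical names introduced for illustration, and the prefix match assumes the codes are tags of the same shape as the default L"en-us" in the SpeechX1 signature.

```cpp
#include <cstddef>
#include <cwctype>
#include <string>
#include <tuple>
#include <vector>

// Same shape as the vector filled by SpeechX1 in Mode 2:
// (display name, language code).
using LanguageList = std::vector<std::tuple<std::wstring, std::wstring>>;

// Case-insensitive "starts with" for language tags.
static bool StartsWithNoCase(const std::wstring& text, const std::wstring& prefix)
{
    if (prefix.size() > text.size()) return false;
    for (std::size_t i = 0; i < prefix.size(); ++i)
        if (std::towlower(text[i]) != std::towlower(prefix[i])) return false;
    return true;
}

// Hypothetical helper: returns the code of the first language whose code
// starts with `prefix` (for example L"en"), or an empty string if none does.
std::wstring FindLanguageCode(const LanguageList& langs, const std::wstring& prefix)
{
    for (const auto& entry : langs)
    {
        const std::wstring& code = std::get<1>(entry);
        if (StartsWithNoCase(code, prefix)) return code;
    }
    return std::wstring();
}
```

The returned string, if not empty, is what you would hand to SpeechX1 as langx (via c_str()) when starting recognition.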

Once you have picked a language, call SpeechX1 again with Mode = 0. Here ptr is a custom pointer that is passed through to your callback, the second parameter is the callback itself, and the third is the chosen language code. The callback has this signature:

C++
HRESULT __stdcall MyCallback(void* ptr, const wchar_t* reco, int conf);

It is called on three occasions:

  1. Periodically, with reco == nullptr, to confirm the status.
  2. With conf == -1, when a hypothesis is pending. reco is the partial text recognized so far.
  3. With conf >= 0, when the recognition is completed. reco is the final text, and conf ranges from 0 to 3 (the lower, the better) to indicate the accuracy of the recognition.

Return S_OK to continue. If you return an error, SpeechX1 returns and the speech recognition session is ended.
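The three cases and the return-value rule could be handled along these lines. This is only a sketch: RecoState and RecoCallback are hypothetical names, the stop flag is one possible way to end a session, and the non-Windows stand-ins exist purely so the snippet compiles elsewhere.

```cpp
#include <cstdint>
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>   // HRESULT, S_OK, E_ABORT
#else
// Stand-ins so the sketch also compiles off Windows (illustration only).
typedef int32_t HRESULT;
#define S_OK    ((HRESULT)0)
#define E_ABORT ((HRESULT)0x80004004)
#ifndef __stdcall
#define __stdcall
#endif
#endif

// Hypothetical per-session state, handed to the library as `ptr`.
struct RecoState
{
    std::wstring partial;               // latest pending hypothesis
    std::vector<std::wstring> finals;   // completed phrases
    int lastConfidence = -1;            // confidence of the latest final result
    bool stopRequested = false;         // set by the application to end the session
};

HRESULT __stdcall RecoCallback(void* ptr, const wchar_t* reco, int conf)
{
    RecoState* state = static_cast<RecoState*>(ptr);

    // Returning any error ends the session, whichever case we are in.
    if (state->stopRequested)
        return E_ABORT;

    // Case 1: periodic status check, no text attached.
    if (reco == nullptr)
        return S_OK;

    // Case 2: pending hypothesis, keep only the latest partial text.
    if (conf == -1)
    {
        state->partial = reco;
        return S_OK;
    }

    // Case 3: final result, conf is 0..3 (lower is better).
    state->finals.push_back(reco);
    state->lastConfidence = conf;
    state->partial.clear();
    return S_OK;
}
```

Assuming SpeechX2 is a pointer to a function with this signature, the session would be started with something like SpeechX1(&state, RecoCallback, L"en-us", 0), and setting state.stopRequested from elsewhere in the application makes the next callback return an error, which ends it.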

With Mode = 1, the library tests the selected voice recognition engine without returning results; instead, you will hear a playback of the recognized voice from your speakers.

The library is provided as both a DLL and a static library, and a command-line testing tool is included.
Have fun with it!

History

  • 27th May, 2020: First release

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Michael Chourdakis
Software Developer
Greece
I'm working in C++, PHP, Java, Windows, iOS, Android and Web (HTML/JavaScript/CSS).

I have a PhD in Digital Signal Processing and Artificial Intelligence, and I specialize in Pro Audio and AI applications.

My home page: https://www.turbo-play.com
