Speaking ASP.NET Website

Spencer Kittleson

6 Jun 2016

Download source - 10.2 MB

Introduction

This tip shows how to set up an ASP.NET MVC website that generates a text-to-speech MP3 and streams it to a browser client using HTML5 audio controls.

Background

Finding a text-to-speech implementation for ASP.NET that met my project's requirements was rather difficult. From forum posts and documentation, I was able to assemble a simple solution that generates text-to-speech MP3 audio for a browser client. The voice comes from the .NET speech synthesizer (System.Speech.Synthesis.SpeechSynthesizer). It produces a WAV audio stream, which is then passed through NAudio.Lame to be converted to an MP3 stream. Why MP3 rather than the standard WAV format? MP3s are smaller in file size and tend to play better in most modern browsers.

Using the Code

I recommend downloading and running the project code attached to this tip to see a working example.

Prerequisites

IIS 7.5 or newer (also tested on IIS 10 Express)
Application pool running in Integrated pipeline mode
Application pool identity of the website set to Local System
ASP.NET MVC 3 or newer
A reference to System.Speech
NuGet packages:
- NAudio
- NAudio.Lame

Add the following using directives to the home controller.

using NAudio.Lame;
using NAudio.Wave;
using System;
using System.Globalization;
using System.IO;
using System.Speech.AudioFormat;
using System.Speech.Synthesis;
using System.Threading;
using System.Web;
using System.Web.Mvc;

Add the following method, TextToMp3, to the home controller. Its only input is the text to convert. The text is converted to a WAV stream by the Microsoft speech synthesizer, and the WAV stream is then converted to an MP3 stream by NAudio.Lame. The result is returned to the browser client as a byte array in a FileResult, where it can be played with HTML5 audio controls.

public FileResult TextToMp3(string text)
{
    //Primary memory stream for storing the MP3 audio
    var mp3Stream = new MemoryStream();
    //Speech format
    var speechAudioFormatConfig = new SpeechAudioFormatInfo(
        samplesPerSecond: 8000,
        bitsPerSample: AudioBitsPerSample.Sixteen,
        channel: AudioChannel.Stereo);
    //NAudio's wave format used for the MP3 conversion.
    //Mirrors the speech format configuration.
    var waveFormat = new WaveFormat(
        speechAudioFormatConfig.SamplesPerSecond,
        speechAudioFormatConfig.BitsPerSample,
        speechAudioFormatConfig.ChannelCount);
    try
    {
        //Build a voice prompt so the voice talks slower
        //and with reduced emphasis on words
        var prompt = new PromptBuilder
        {
            Culture = CultureInfo.CreateSpecificCulture("en-US")
        };
        prompt.StartVoice(prompt.Culture);
        prompt.StartSentence();
        prompt.StartStyle(new PromptStyle
        {
            Emphasis = PromptEmphasis.Reduced,
            Rate = PromptRate.Slow
        });
        prompt.AppendText(text);
        prompt.EndStyle();
        prompt.EndSentence();
        prompt.EndVoice();

        //WAV stream output of the converted text to speech
        using (var synthWavMs = new MemoryStream())
        {
            //Run the synthesizer on a thread-pool thread, which is safe
            //for an ASP.NET application pool.
            var resetEvent = new ManualResetEvent(false);
            ThreadPool.QueueUserWorkItem(arg =>
            {
                try
                {
                    //Initialize a voice with standard settings
                    using (var siteSpeechSynth = new SpeechSynthesizer())
                    {
                        //Send synthesizer output to the memory stream in the chosen format
                        siteSpeechSynth.SetOutputToAudioStream(synthWavMs, speechAudioFormatConfig);
                        //Speak the prompt into the stream
                        siteSpeechSynth.Speak(prompt);
                    }
                }
                catch (Exception ex)
                {
                    //Helps diagnose issues with the conversion process.
                    //It can be removed after testing.
                    Response.AddHeader("EXCEPTION", ex.GetBaseException().ToString());
                }
                finally
                {
                    resetEvent.Set(); //end of thread
                }
            });
            //Wait until the worker thread has finished
            resetEvent.WaitOne();
            //Estimated bitrate in bits per second. NAudio.Lame documents this
            //parameter in kbps, so a value such as 64 may be more appropriate.
            var bitRate = speechAudioFormatConfig.AverageBytesPerSecond * 8;
            //Rewind to the start of the WAV data
            synthWavMs.Position = 0;
            //The bin folder must contain the LAME DLL files, and that folder
            //must be on the search path at application start (see Global.asax)
            using (var mp3FileWriter = new LameMP3FileWriter(
                outStream: mp3Stream, format: waveFormat, bitRate: bitRate))
            {
                synthWavMs.CopyTo(mp3FileWriter);
            }
        }
    }
    catch (Exception ex)
    {
        Response.AddHeader("EXCEPTION", ex.GetBaseException().ToString());
    }
    finally
    {
        //Set no cache on this file
        Response.Cache.SetExpires(DateTime.UtcNow.AddMinutes(-1));
        Response.Cache.SetCacheability(HttpCacheability.NoCache);
        Response.Cache.SetNoStore();
        //Required for Chrome and Safari
        Response.AppendHeader("Accept-Ranges", "bytes");
        //Write the byte length of the MP3 to the client
        Response.AddHeader("Content-Length",
            mp3Stream.Length.ToString(CultureInfo.InvariantCulture));
    }
    //Return the MP3 stream as a byte array for the file result
    return File(mp3Stream.ToArray(), "audio/mp3");
}
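
The "estimated bitrate" in TextToMp3 is the byte rate of the uncompressed audio multiplied by 8. As a quick worked check of that arithmetic, here is a small sketch in JavaScript (so it can be pasted into a browser console). It assumes AverageBytesPerSecond follows the usual PCM formula of sample rate × bytes per sample × channels; the function name pcmBytesPerSecond is made up for illustration.

```javascript
// Bytes per second of uncompressed PCM audio (illustrative helper,
// not part of the original project).
function pcmBytesPerSecond(samplesPerSecond, bitsPerSample, channels) {
    return samplesPerSecond * (bitsPerSample / 8) * channels;
}

// Format used in TextToMp3: 8000 Hz, 16-bit, stereo.
var bytesPerSecond = pcmBytesPerSecond(8000, 16, 2);
var bitsPerSecond = bytesPerSecond * 8;
var kilobitsPerSecond = bitsPerSecond / 1000;

console.log(bytesPerSecond, bitsPerSecond, kilobitsPerSecond);
```

With these settings the uncompressed stream works out to 32,000 bytes per second, so ten seconds of speech is roughly 320,000 bytes before MP3 compression.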

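The native LAME DLL files that ship with NAudio.Lame must be discoverable when the application starts, so the bin folder is added to the PATH search path on application start. Add the code below to the Global.asax.cs file; it relies on the System.IO and System.Linq namespaces.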

public static void CheckAddBinPath()
{
    // find path to 'bin' folder
    var binPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "bin");
    // get current search path from environment
    var path = Environment.GetEnvironmentVariable("PATH") ?? "";

    // add 'bin' folder to search path if not already present
    // (Contains with a comparer requires "using System.Linq;")
    if (!path.Split(Path.PathSeparator).Contains(binPath, StringComparer.OrdinalIgnoreCase))
    {
        path = string.Join(Path.PathSeparator.ToString(), path, binPath);
        Environment.SetEnvironmentVariable("PATH", path);
    }
}

In the same file, the Application_Start method should look like the code below, with the call to CheckAddBinPath added at the end of the method.

protected void Application_Start()
{
    AreaRegistration.RegisterAllAreas();
    FilterConfig.RegisterGlobalFilters(GlobalFilters.Filters);
    RouteConfig.RegisterRoutes(RouteTable.Routes);
    BundleConfig.RegisterBundles(BundleTable.Bundles);

    //check for bin files to be loaded
    CheckAddBinPath();
}

Example of Use

On the home view, add the following HTML and JavaScript (jQuery) code.

HTML

<label for="inputText">Type it!</label><br />
<textarea id="inputText" class="form-control" rows="5" style="width:100%;"></textarea><br />
<button id="playAudio" type="button" class="btn btn-primary btn-lg btn-block">Say it!</button>
<div id="divAudio_Player" class="hidden">
    <audio id="audio_player">
        <source id="audio_player_wav" src="@Url.Action("PlayTextArea", "Home", new { text = "type something in first" })" type="audio/mp3" />
        <embed height="50" width="100" src="@Url.Action("PlayTextArea", "Home", new { text = "type something in first" })">
    </audio>
</div>

JavaScript

$(function () {
    $('#playAudio').click(function () {
        // Build the request URL; the timestamp keeps the browser from reusing an earlier response.
        // A textarea's current content is read with .val(), not .text().
        var newUrl = '@Url.Action("PlayTextArea", "Home")?text=' +
            encodeURIComponent($('#inputText').val()) + '&timestamp=' + new Date().getTime();

        var source = '<audio id="audio_player">';
        source += '<source id="audio_player_wav" src="' + newUrl + '" type="audio/mp3" />';
        source += '</audio>';

        // play it
        setTimeout(function () {
            $('#divAudio_Player').html(source);
            var aud = $('#audio_player').get(0);
            aud.play();
        }, 500);
    });
});
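
The click handler above assembles the request URL from the action URL, the encoded text, and a timestamp. Pulling that step into a small function makes the encoding easier to see and to check in isolation. This is only a sketch: buildSpeechUrl is a hypothetical helper that is not part of the original project, and the action URL and timestamp in the usage line are made-up values.

```javascript
// Hypothetical helper (not in the original project): builds the URL
// that the audio element requests from the PlayTextArea action.
function buildSpeechUrl(actionUrl, text, timestamp) {
    // encodeURIComponent escapes spaces, '&', '?' and other characters
    // that would otherwise break the query string.
    return actionUrl +
        '?text=' + encodeURIComponent(text) +
        '&timestamp=' + timestamp;
}

// Example with made-up values:
var exampleUrl = buildSpeechUrl('/Home/PlayTextArea', 'Hello & goodbye?', 1465171200000);
console.log(exampleUrl);
```

Without the encoding step, the "&" in the example text would end the text parameter early and the server would only receive "Hello ".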

Add the following ActionResult to the home controller that will be used in this example:

public ActionResult PlayTextArea(string text)
{
    if (String.IsNullOrEmpty(text)) {
        text = "Type something in first";
    }
    return TextToMp3(text);
}

Run the project, type something in, and click "Say it!".

Points of Interest

Making applications speak has always interested me. Speech output can make an application more useful.

Known Issues

CPU usage is high while the WAV memory stream is converted to an MP3 stream.
The application pool identity would be the preferred IIS user, but the speech synthesizer needs an account that has a user profile, hence Local System.
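
One possible way to soften the CPU cost on the client side is to avoid asking the server for a new conversion when the text has not changed since the last click. The sketch below is an assumption about how that could be done, not something the original project does. createSpeaker and loadAudio are hypothetical names; loadAudio is expected to return an object with a play method and a currentTime property, such as an HTML5 audio element.

```javascript
// Hypothetical helper: remembers the last text that was converted so that
// pressing the button again with unchanged text replays the audio that is
// already loaded instead of requesting a new conversion.
function createSpeaker(loadAudio) {
    var lastText = null;
    var lastAudio = null;

    return function speak(text) {
        if (lastAudio === null || text !== lastText) {
            // First call or changed text: request a fresh conversion.
            lastAudio = loadAudio(text);
            lastText = text;
        } else {
            // Same text as before: rewind the existing audio.
            lastAudio.currentTime = 0;
        }
        lastAudio.play();
        return lastAudio;
    };
}
```

In the page from this tip, loadAudio could be a function that builds the PlayTextArea URL for the given text and returns new Audio(url). Whether a replay avoids a second download depends on how the browser buffers the media, so treat this as a starting point to measure rather than a guaranteed saving.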

References

Ricardo Peres - http://weblogs.asp.net/ricardoperes/speech-synthesis-with-asp-net-and-html5
Stack Overflow - http://stackoverflow.com/questions/20088743/mvc4-app-unable-to-load-dll-libmp3lame-32-dll
Scott Mitchell - http://dotnetslackers.com/articles/aspnet/Range-Specific-Requests-in-ASP-NET.aspx

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Written By

Spencer Kittleson

Software Developer (Senior)

United States

Spencer is a software engineer, database designer, and general tinkerer with applications. Sometimes hardware too! He can’t decide whether to use a Linux or Windows environment, so he uses both consistently.

Been programming long enough for his peers to call him a professional "software engineer". Primarily a full stack .NET developer and frequently wanders over to NodeJs. Most days are spent developing WebApis, JavaScript, and building SPA sites.