Making a Voice Recorder on Windows Phone

Joel Ivory Johnson

Mar 31, 2011

CPOL

Demonstration of what needs to be done to make a voice recorder on Windows Phone 7 including converting the raw bytes from the recording into a WAVE file.

Download source code - 676 KB

Introduction

There's a popular set of questions that comes up in the forums about working with the microphone, making usable recordings from it, and a few related things. I'm sure the questions will come up again, and I thought it would be valuable to have an example I could refer to when they do. I've made this voice memo application, put it through Marketplace certification, and made the source code available to all who wish to use it. Feel free to use this code in just about any manner you want. If you use it in your own app, I'd appreciate a message telling me that you found the code useful, but if you don't send one, I won't hold it against you. This code is free of obligations, though I strongly discourage you from submitting it back to the Windows Phone Marketplace in unmodified form.

Screenshot

I have deliberately not spent any effort yet on making this program's interface pretty. This post is all about functionality, and since I'm giving the code away, I didn't want to pay a graphic artist for image assets only to give them away too. As mentioned above, if you use the code, it's up to you to apply your own graphics. Now that I've put this program together, I plan to make more changes to it next week (top priority: making the program look good!).

What Do I Need?

All you need to work with this code is a Windows PC and the Windows Phone Developer Tools from http://developer.windowsphone.com (a free download!). I'm using Visual Studio 2010 Ultimate, but the Express edition included in the Developer Tools will work just as well.

Deciding on a Feature Set

Before starting on the application, I sat down and listed and prioritized the features that I wanted the application to have. In no particular order, some of the things I thought about included the following:

Ability to export recordings
Save recordings in WAV format
Add notes to recording
Speed up or slow down recordings
Change voice
Combine, split, and edit recordings
Export as MP3
Categorize memos
Time/Date activated reminders

As you can see, there are a lot of different things one could add to a voice memo application, and it can quickly progress from something simple to something complex. Rather than make the application overly complex, I chose a minimum feature set. My primary goal was to actually deliver something, and to keep it simple enough that there are few places in which bugs could occur. The reduced feature set is as follows:

Save recordings in WAV format
Order recordings by date or by name
Record under lock screen
Add notes to recording

This is a simple beginning and something on which other features can be added later.

Using XNA classes from a Silverlight application

There are two types of applications that you can create on Windows Phone: those that use Silverlight for their UI, and those that use the XNA rendering classes for their UI. You must use one presentation layer or the other exclusively; there's no way to use XNA rendering classes from a Silverlight application or vice versa. Silverlight offers several controls that can be used for building the application's UI from a designer, such as buttons, text boxes, labels, and so on. Within XNA, you are responsible for building your own solution for presenting information. Since this application needs ordinary controls rather than custom rendering, I am using a Silverlight UI.

For recording audio, I must make use of the Microphone class from Microsoft.Xna.Framework.Audio. While you can't use XNA rendering classes in a Silverlight application, you can use many of the other XNA classes. Use of the audio-related XNA classes requires that FrameworkDispatcher.Update() be called periodically. Rather than cluttering your program logic with a timer that calls this function, you can make use of an example application service that Microsoft provides for this purpose. The class implements IApplicationService and takes care of calling the function for you. The entirety of the class follows.

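using System;
using System.Windows;            // IApplicationService
using System.Windows.Threading;  // DispatcherTimer
using Microsoft.Xna.Framework;   // FrameworkDispatcher

// Pumps the XNA FrameworkDispatcher on a timer so that the XNA audio
// classes work inside a Silverlight application.
public class XNAFrameworkDispatcherService : IApplicationService
{
    private DispatcherTimer frameworkDispatcherTimer;

    public XNAFrameworkDispatcherService()
    {
        this.frameworkDispatcherTimer = new DispatcherTimer();
        // 333,333 ticks of 100 ns each is about 33 ms, roughly 30 updates per second.
        this.frameworkDispatcherTimer.Interval = TimeSpan.FromTicks(333333);
        this.frameworkDispatcherTimer.Tick += frameworkDispatcherTimer_Tick;
        // Pump the dispatcher once up front, before the timer starts.
        FrameworkDispatcher.Update();
    }

    void frameworkDispatcherTimer_Tick(object sender, EventArgs e)
         { FrameworkDispatcher.Update(); }

    // Called by the application when it starts.
    void IApplicationService.StartService(ApplicationServiceContext context)
         { this.frameworkDispatcherTimer.Start(); }

    // Called by the application when it ends.
    void IApplicationService.StopService() { this.frameworkDispatcherTimer.Stop(); }
}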

Once the class is declared in your project, it needs to be added as an application lifetime object. There's more than one way to do this. But my preferred method is to add it to App.xaml.

<Application.ApplicationLifetimeObjects>
    <!--Required object that handles lifetime events for the application-->
    <shell:PhoneApplicationService 
        Launching="Application_Launching" Closing="Application_Closing" 
        Activated="Application_Activated" Deactivated="Application_Deactivated"/>
    <local:XNAFrameworkDispatcherService />        
</Application.ApplicationLifetimeObjects>

Having done this, I need not give FrameworkDispatcher.Update another thought; the timer is started automatically when the program starts and shut down automatically when the program ends.

Recording audio with the Microphone class

There are plenty of examples on the Internet of how to record audio on WP7. Unfortunately, many of them contain the same bug. Before I present the code that implements recording, I want to illustrate visually how recording works so that I can also demonstrate the bug.

The Microphone class records audio in chunks and passes each chunk back to your program while it continues to record a new chunk. To do this, the Microphone class has its own memory buffer that it will fill. Let's say you are recording the phrase "The quick brown fox jumped over the lazy dog." For now, let's also assume that the microphone's buffer happens to be able to record one word at a time (things generally don't end up falling so cleanly in real life, but I ask you to temporarily suspend your ability to apply that thought).

The microphone, its buffer, and your program

You begin speaking the phrase and the microphone's buffer gets filled with the sound of you saying the word "the."

The microphone has recorded the word 'the' but hasn't yet passed it to the program

Once the buffer is full, it gets passed to the program and the microphone begins filling a new buffer with the next word being recorded. The program receives the buffer and gets a chance to do something with it. Since this program is for saving and replaying recorded audio clips, the program will save the audio chunk and wait for the next chunk to be appended to the previous.

The program has received the first word as the second is being recorded.

As each chunk is recorded, it gets passed off to the program, and the program appends it to the chunks it has already received. The bug that many of the examples online have occurs when the user speaks the last word.

In many of the online examples, when the user has said the word "dog" and pressed the Stop button, the program immediately stops receiving information from the microphone. But the last word hasn't been passed from the microphone's buffer to the program yet! The end result is that the program has received everything except the last word. To avoid this problem, when the user stops the recorder, the program should not stop immediately; it should wait until it has received one more buffer. In the worst case, a few sounds after the end of the sentence also get recorded, but that's better than missing data. One could reduce the amount of extra data that gets captured by reducing the size of the buffer.
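To make the timing problem concrete, here is a small simulation of the scenario above. It is only a sketch and does not touch the real Microphone class: the StopTimingDemo class and its Record method are hypothetical names invented for illustration, and each array element stands in for one full microphone buffer.

```csharp
using System;
using System.Collections.Generic;

// Illustrative stand-in for the record loop; not part of the article's download.
public static class StopTimingDemo
{
    // Simulates a recording in which the user presses Stop while the final
    // word is still sitting in the microphone's buffer. Returns the words
    // that actually reached the program.
    public static List<string> Record(string[] spokenWords, bool waitForFinalBuffer)
    {
        var received = new List<string>();
        for (int i = 0; i < spokenWords.Length; i++)
        {
            // Word i is currently being captured into the microphone's buffer.
            bool userPressedStop = (i == spokenWords.Length - 1);

            if (userPressedStop && !waitForFinalBuffer)
            {
                // Buggy behavior: the microphone is stopped at once, so the
                // buffer holding this word is never handed to the program.
                break;
            }

            // The buffer fills, the program is notified, and it appends the chunk.
            received.Add(spokenWords[i]);
        }
        return received;
    }
}
```

Calling Record with the words of the example sentence and waitForFinalBuffer set to false returns every word except "dog"; with true, all of the words come back.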

Creating the code that does the above is fairly easy. To get an instance of the Microphone class, we can just grab it from Microphone.Default. When the microphone is recording, it notifies our program that a buffer is ready to be read by raising the BufferReady event. When this occurs, we can grab the buffer data by calling GetData(byte[] buffer). For this method, we must pass in a byte array that will receive the data. How big does this array need to be? The Microphone class has two other members that help us identify the needed size. Microphone.BufferDuration tells us how much audio the microphone's buffer holds (as a TimeSpan), and the method Microphone.GetSampleSizeInBytes(TimeSpan) tells us how many bytes are needed for a recording of a specific length. Bringing the two together, the size of the array we need can be found with Microphone.GetSampleSizeInBytes(Microphone.BufferDuration). Once you have an instance of the Microphone class, have subscribed to the BufferReady event, and have created the array for receiving your data, the recording process can be started by calling Microphone.Start().
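If you are curious what kind of number GetSampleSizeInBytes hands back, the arithmetic can be sketched by hand. The helper below is hypothetical (PcmMath is not an XNA class) and assumes uncompressed 16-bit PCM samples; the real method may also apply alignment rules that this sketch ignores, so on the phone you should keep using the Microphone members described above.

```csharp
using System;

// Hypothetical helper for illustration only. Assumes 16-bit PCM samples.
public static class PcmMath
{
    public const int BytesPerSample = 2; // 16 bits per sample

    // Bytes needed to hold 'duration' of audio at the given rate and channel count.
    public static int SampleSizeInBytes(TimeSpan duration, int sampleRate, int channelCount)
    {
        // Integer tick math avoids floating point rounding surprises.
        long samples = duration.Ticks * sampleRate / TimeSpan.TicksPerSecond;
        return (int)(samples * channelCount * BytesPerSample);
    }

    // The inverse: how much audio a given number of bytes represents.
    public static TimeSpan DurationOf(int byteCount, int sampleRate, int channelCount)
    {
        long samples = byteCount / (channelCount * BytesPerSample);
        return TimeSpan.FromTicks(samples * TimeSpan.TicksPerSecond / sampleRate);
    }
}
```

For example, at an assumed rate of 16,000 samples per second in mono, 100 ms of audio needs 3,200 bytes, and a 1,048,576-byte buffer holds about 32.8 seconds of audio. The sample rate here is only an assumption for the arithmetic; read the actual value from the SampleRate property of the microphone.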

In the event handler for BufferReady, there are a few things that need to be done. When the data is retrieved from the buffer, it needs to be accumulated somewhere. After the data has been accumulated, we need to check whether a request to stop recording has been made. If it has, we tell the Microphone instance to stop sending data by calling Microphone.Stop() and perform whatever actions are required to persist the recording. For accumulating the data, I use a memory stream and then write it to isolated storage when the recording is completed. One of my requirements was that audio data be saved in WAV format. This requirement is satisfied by writing a proper wave header before I write out all the bytes that were received. Rather than expounding on how to do that here, I refer you to a previous blog post that I have written on the subject. The code I have for doing all of the above follows:

public void StartRecording()
{
    if (_currentMicrophone == null)
    {
        _currentMicrophone = Microphone.Default;
        _currentMicrophone.BufferReady += 
           new EventHandler<EventArgs>(_currentMicrophone_BufferReady);
        _audioBuffer = new byte[_currentMicrophone.GetSampleSizeInBytes(
                            _currentMicrophone.BufferDuration)];
        _sampleRate = _currentMicrophone.SampleRate;
    }
    _stopRequested = false;
    _currentRecordingStream = new MemoryStream(1048576);
    _currentMicrophone.Start();
}

public void RequestStopRecording()
{
    _stopRequested = true;
}

void _currentMicrophone_BufferReady(object sender, EventArgs e)
{
    // GetData returns the number of bytes actually copied; write only that many.
    int bytesRead = _currentMicrophone.GetData(_audioBuffer);
    _currentRecordingStream.Write(_audioBuffer, 0, bytesRead);
    if (!_stopRequested) 
        return;
    _currentMicrophone.Stop();

    var isoStore = 
      System.IO.IsolatedStorage.IsolatedStorageFile.GetUserStoreForApplication();

    using (var targetFile = isoStore.CreateFile(FileName))
    {
        WaveHeaderWriter.WriteHeader(targetFile, 
              (int)_currentRecordingStream.Length, 1, _sampleRate);
        var dataBuffer = _currentRecordingStream.GetBuffer();
        targetFile.Write(dataBuffer,0,(int)_currentRecordingStream.Length);
        targetFile.Flush();
        targetFile.Close();
    }
}

Code for recording from the microphone and saving to a file.
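The WaveHeaderWriter class used above ships with the source download and is not reproduced in this article. For readers who only want to see what such a header contains, here is a minimal sketch of a writer for the canonical 44-byte PCM header. It is an assumption-laden stand-in, not the class from the download: the name WaveHeaderSketch is made up, the parameter order merely mirrors the call above, and it assumes 16-bit samples.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical stand-in; not the WaveHeaderWriter class from the download.
public static class WaveHeaderSketch
{
    public const int HeaderSize = 44;

    // Writes a canonical 44-byte RIFF/WAVE header for 16-bit PCM audio.
    // dataLength is the number of bytes of sample data that will follow.
    public static void WriteHeader(Stream target, int dataLength,
                                   int channelCount, int sampleRate)
    {
        const short bitsPerSample = 16;
        short blockAlign = (short)(channelCount * bitsPerSample / 8);
        int byteRate = sampleRate * blockAlign;

        // BinaryWriter always writes integers in little-endian order,
        // which is what the WAVE format expects.
        var writer = new BinaryWriter(target, Encoding.UTF8);
        writer.Write(Encoding.UTF8.GetBytes("RIFF"));
        writer.Write(36 + dataLength);           // bytes remaining after this field
        writer.Write(Encoding.UTF8.GetBytes("WAVE"));
        writer.Write(Encoding.UTF8.GetBytes("fmt "));
        writer.Write(16);                        // length of the fmt chunk for PCM
        writer.Write((short)1);                  // format tag 1 = uncompressed PCM
        writer.Write((short)channelCount);
        writer.Write(sampleRate);
        writer.Write(byteRate);
        writer.Write(blockAlign);
        writer.Write(bitsPerSample);
        writer.Write(Encoding.UTF8.GetBytes("data"));
        writer.Write(dataLength);
        writer.Flush();                          // not disposed: the caller owns the stream
    }
}
```

The field offsets this produces (channel count at byte 22, sample rate at byte 24, sample data starting at byte 44) are the ones the playback code later in the article reads back.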

Audio playback

For playing back audio, I will make use of the SoundEffect class. Like the Microphone class, SoundEffect is an XNA audio class and requires the FrameworkDispatcher.Update() method to be called periodically. There are two ways I could go about loading the WAVE file: I could either decode the header myself or let the SoundEffect class do it. I show manual decoding here for reference should someone need to make other modifications to the file.

When instantiating a SoundEffect through its constructor, three items of data are needed: the recorded audio data, the sample rate, and the number of audio channels in the recording. This application only records in mono, not stereo, so there will always be one audio channel. I could get away with passing AudioChannels.Mono for this field. But in the future, I may add the ability to import recordings (which could be in stereo), so I'm going to pull this value from the Wave header. Likewise, I could have grabbed the sample rate from the Microphone class instead of obtaining it from the Wave header, but in the interest of things I'm considering for the future, I obtain it from the header as well. The Wave data itself is everything after the header. Once a SoundEffect is initialized, to play it, I must get a SoundEffectInstance by calling CreateInstance and then call its Play method.

I don't think that I need to explain why I only want one recording to play at a time. So before playing a new audio clip, I check to see if there is an existing one loaded in memory and I stop it.

public void PlayRecording(RecordingDetails source)
{
    if(_currentSound!=null)
    {
        _currentSound.Stop();
        _currentSound = null;
    }
    var isoStore = System.IO.IsolatedStorage.IsolatedStorageFile.
                                 GetUserStoreForApplication();
    if(isoStore.FileExists(source.FilePath))
    {
        byte[] fileContents;
        using (var fileStream = isoStore.OpenFile(source.FilePath, FileMode.Open))
        {
            fileContents = new byte[(int) fileStream.Length];
            fileStream.Read(fileContents, 0, fileContents.Length);
            fileStream.Close();//not really needed, but it makes me feel better. 
        }
         
        int sampleRate =((fileContents[24] <<  0) | (fileContents[25] <<  8) | 
                         (fileContents[26] << 16) | (fileContents[27] << 24));

        AudioChannels channels = (fileContents[22] == 1) ? 
                        AudioChannels.Mono : AudioChannels.Stereo;

        var se = new SoundEffect(fileContents, 44, 
            fileContents.Length - 44, sampleRate, channels, 0,
                                    0);
        _currentSound = se.CreateInstance();
        _currentSound.Play();
    }
}
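The byte offsets used above (22 for the channel count, 24 for the sample rate, 44 for the start of the data) come from the canonical 44-byte WAVE header. If you need more of the header, or want to fail loudly on a file that doesn't match that layout, the decoding can be pulled into a helper along these lines. This is a sketch only: WaveInfo and WaveHeaderReader are hypothetical names, and like the code above it assumes the fmt chunk is followed immediately by the data chunk, which will not be true of every WAVE file you might import.

```csharp
using System;

// Values pulled out of a WAVE header. All names here are illustrative.
public class WaveInfo
{
    public int ChannelCount;
    public int SampleRate;
    public int BitsPerSample;
    public int DataOffset;
    public int DataLength;
}

public static class WaveHeaderReader
{
    private const int CanonicalHeaderSize = 44;

    // Little-endian reads done by hand so the result does not depend on the platform.
    private static int ReadInt16(byte[] data, int offset)
    {
        return data[offset] | (data[offset + 1] << 8);
    }

    private static int ReadInt32(byte[] data, int offset)
    {
        return data[offset] | (data[offset + 1] << 8) |
               (data[offset + 2] << 16) | (data[offset + 3] << 24);
    }

    private static bool HasTag(byte[] data, int offset, string tag)
    {
        for (int i = 0; i < tag.Length; i++)
        {
            if (data[offset + i] != (byte)tag[i]) return false;
        }
        return true;
    }

    public static WaveInfo Read(byte[] fileContents)
    {
        if (fileContents == null || fileContents.Length < CanonicalHeaderSize)
            throw new ArgumentException("Too short to contain a WAVE header.");
        if (!HasTag(fileContents, 0, "RIFF") || !HasTag(fileContents, 8, "WAVE") ||
            !HasTag(fileContents, 36, "data"))
            throw new ArgumentException("Not a canonical 44-byte WAVE header.");

        var info = new WaveInfo();
        info.ChannelCount = ReadInt16(fileContents, 22);
        info.SampleRate = ReadInt32(fileContents, 24);
        info.BitsPerSample = ReadInt16(fileContents, 34);
        info.DataOffset = CanonicalHeaderSize;

        // Never report more sample data than the file actually holds.
        int declared = ReadInt32(fileContents, 40);
        int available = fileContents.Length - CanonicalHeaderSize;
        info.DataLength = (declared < 0 || declared > available) ? available : declared;
        return info;
    }
}
```

The DataOffset, DataLength, and SampleRate values it returns correspond to the offset, count, and sample rate arguments passed to the SoundEffect constructor above.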

Loading the sound via SoundEffect.FromStream is simple and straightforward.

public void PlayRecording(RecordingDetails source)
{
    SoundEffect se;
    if(_currentSound!=null)
    {
        _currentSound.Stop();
        _currentSound = null;
    }
    var isoStore = System.IO.IsolatedStorage.
                     IsolatedStorageFile.GetUserStoreForApplication();
    if(isoStore.FileExists(source.FilePath))
    {
        using (var fileStream = isoStore.OpenFile(source.FilePath, FileMode.Open))
        {
            se = SoundEffect.FromStream(fileStream);
            fileStream.Close();//not really needed, but it makes me feel better. 
        }

        _currentSound = se.CreateInstance();
        _currentSound.Play();
    }
}

Keeping track of the recordings

In addition to keeping the recordings in isolated storage, I wanted to keep track of some other things, such as the date the recording was made, a title for the recording, and notes for the recording. It is possible to give the recording a title through the file name, or to infer the recorded date from the date on the file, but that solution just doesn't seem durable. There are constraints on the characters that can appear in a file name, and in the future, when I add the ability to import and export files, file dates could be lost. Instead, I've made a class that holds all the information I want to track for a recording. A simplified view of the class follows.

public class RecordingDetails
{
   public string    Title { get; set; }
   public string    Details { get; set; }
   public DateTime  TimeStamp { get; set; }
   public string    FilePath { get; set; }
   public string    SourceFileName { get; set; }
}

I gave a simplified view in the interest of keeping the class easy to read. This class needs to be serializable so that I can read it from and write it to isolated storage, so the class is decorated with the [DataContract] attribute and the properties to be saved are decorated with the [DataMember] attribute. I also plan to bind instances of this class to UI elements, so the class needs to implement the INotifyPropertyChanged interface. The full version of the class follows. It isn't as much typing as it looks; I use Visual Studio snippets to automate generation of part of the code.

[DataContract]
public class RecordingDetails: INotifyPropertyChanged 
{
                
    // Title - generated from ObservableField snippet - Joel Ivory Johnson

    private string _title;
    [DataMember]
    public string Title
    {
        get { return _title; }
        set
        {
            if (_title != value)
            {
                _title = value;
                OnPropertyChanged("Title");
            }
        }
    }
    //-----

                
    // Details - generated from ObservableField snippet - Joel Ivory Johnson

    private string _details;
    [DataMember]
    public string Details
    {
        get { return _details; }
        set
        {
            if (_details != value)
            {
                _details = value;
                OnPropertyChanged("Details");
            }
        }
    }
    //-----

                
    // FilePath - generated from ObservableField snippet - Joel Ivory Johnson

    private string _filePath;
    [DataMember]
    public string FilePath
    {
        get { return _filePath; }
        set
        {
            if (_filePath != value)
            {
                _filePath = value;
                OnPropertyChanged("FilePath");
            }
        }
    }
    //-----

                
    // TimeStamp - generated from ObservableField snippet - Joel Ivory Johnson

    private DateTime _timeStamp;
    [DataMember]
    public DateTime TimeStamp
    {
        get { return _timeStamp; }
        set
        {
            if (_timeStamp != value)
            {
                _timeStamp = value;
                OnPropertyChanged("TimeStamp");
            }
        }
    }
    //-----

                
    // SourceFileName - generated from ObservableField snippet - Joel Ivory Johnson

    private string _sourceFileName;
    [IgnoreDataMember]
    public string SourceFileName
    {
        get { return _sourceFileName; }
        set
        {
            if (_sourceFileName != value)
            {
                _sourceFileName = value;
                OnPropertyChanged("SourceFileName");
            }
        }
    }
    //-----

                
    // IsNew - generated from ObservableField snippet - Joel Ivory Johnson

    private bool _isNew = false;
    [IgnoreDataMember]
    public bool IsNew
    {
        get { return _isNew; }
        set
        {
            if (_isNew != value)
            {
                _isNew = value;
                OnPropertyChanged("IsNew");
            }
        }
    }
    //-----

                
    // IsDirty - generated from ObservableField snippet - Joel Ivory Johnson

    private bool _isDirty = false;
    [IgnoreDataMember]
    public bool IsDirty
    {
        get { return _isDirty; }
        set
        {
            if (_isDirty != value)
            {
                _isDirty = value;
                OnPropertyChanged("IsDirty");
            }
        }
    }
    //-----


    public void Copy(RecordingDetails source)
    {
        this.Details = source.Details;
        this.FilePath = source.FilePath;
        this.SourceFileName = source.SourceFileName;
        this.TimeStamp = source.TimeStamp;
        this.Title = source.Title;
    }

    public event PropertyChangedEventHandler PropertyChanged;
    protected void OnPropertyChanged(string propertyName)
    {
        if (PropertyChanged != null)
        {
            PropertyChanged(this, new PropertyChangedEventArgs(propertyName));
        }
    }

}

The [DataMember] attribute spread throughout the code is so that I can use data contract serialization to read and write this class. Since I'm using the DataContractSerializer, I don't have to concern myself much with the specifics of how this file will be encoded when it is saved and loaded. While using isolated storage isn't hard, I'm using a variant of a utility class from a previous blog entry to simplify serialization and deserialization to a few lines of code. When the user creates a new recording, a new instance of this class is also created. In addition to the title, notes, and time stamp, this class also contains a path to the recording that is described and contains a non-serialized member SourceFileName that contains the name of the original file from which this data had been loaded. Without that information, if the user decides to update the data, there is no way to know what file should be overwritten when the content is saved.

//Saving Data
var myDataSaver = new DataSaver<RecordingDetails>();
myDataSaver.SaveMyData(LastSelectedRecording, 
                       LastSelectedRecording.SourceFileName);

//Loading Data
var myDataSaver = new DataSaver<RecordingDetails>();
var item = myDataSaver.LoadMyData(LastSelectedRecording.SourceFileName);
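The DataSaver class itself comes from that earlier blog entry and isn't listed here. To show roughly what happens underneath, here is a sketch of a data contract round trip against an ordinary stream. Everything in it is an assumption made for illustration: MemoDetails is a cut-down stand-in for RecordingDetails, StreamDataSaver is a made-up name, and the real utility class works against isolated storage files rather than a stream you pass in.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

// Cut-down stand-in for RecordingDetails, for illustration only.
[DataContract]
public class MemoDetails
{
    [DataMember]
    public string Title { get; set; }

    [DataMember]
    public DateTime TimeStamp { get; set; }

    // No [DataMember] attribute, so this value is not written out.
    public string SourceFileName { get; set; }
}

// Hypothetical helper; the article's DataSaver<T> is a different class.
public static class StreamDataSaver<T>
{
    // Serializes 'item' to the stream as XML using its data contract.
    public static void Save(Stream target, T item)
    {
        var serializer = new DataContractSerializer(typeof(T));
        serializer.WriteObject(target, item);
    }

    // Reads an instance of T back from the stream's current position.
    public static T Load(Stream source)
    {
        var serializer = new DataContractSerializer(typeof(T));
        return (T)serializer.ReadObject(source);
    }
}
```

Saving a MemoDetails to a MemoryStream, rewinding the stream, and loading it back yields an object with the same Title and TimeStamp but a null SourceFileName. That is why the loading code has to know the file name separately and assign it after the object has been read.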

With that, you have all the information that's needed to perform recording, save the recordings, and load the recordings. When the program first starts, I have it load all of the RecordingDetails and add them to an ObservableCollection on my View Model. From there, they can be bound to a list displayed to the user.

public void LoadData()
{
    var isoStore = 
        System.IO.IsolatedStorage.IsolatedStorageFile.GetUserStoreForApplication();
    var recordingList = isoStore.GetFileNames("data/*.xml");
    var myDataSaver = new DataSaver<RecordingDetails>();
    Items.Clear();
    foreach (var desc in recordingList.Select(item =>
                    {
                        var result =myDataSaver.LoadMyData(String.Format("data/{0}", item));
                        result.SourceFileName = String.Format("data/{0}", item);
                        return result;
                   }))
    {
        Items.Add(desc);
    }
    this.IsDataLoaded = true;
}

Saving state and tombstoning

Your program can be interrupted at any time by something like an incoming call, or by the user breaking away to do a search or some other action. When this happens, your application gets tombstoned: the OS saves which page the user was on and gives your program a chance to save other data. When the program is reloaded, the developer must ensure that steps are taken to properly restore state. For the most part, I didn't need to worry about tombstoning because most of the program's state is promptly persisted to isolated storage, and there's not much state to be saved; recordings are committed immediately, along with changes to the program's settings. If you want to learn more about tombstoning, I strongly suggest looking to a resource other than this article.

So you can record and play back, now what?

There are plenty of memo recorders in the Marketplace, so what is the purpose of making another? Other sound-related applications can make use of the functionality implemented in this code. A voice memo recorder is not my end goal. In fact, I don't have a single end goal; a lot of applications can be derived from this code. The source code phylogeny that I currently expect to result from this application is below.

Source code phylogeny for potential derived apps.

Don't take the chart too literally. It's just to illustrate a concept. But as you can see from the above graphic, I could take this code, apply some changes, and produce something with a different purpose. If I added something to the program that applied a transformation to the recording, I would then have a voice changer. With a bit of Fourier analysis and some other code, I could produce something that could print out sheet music from a recording (note: I called it a transcriber above, but I am probably off on my terminology there).

Preparing for certification

Certification can take anywhere from a few hours to a few days. The minimal set of files you need when preparing an application for certification is the XAP containing your application (remember to do a release build!), at least one screenshot, and a few image icons in various sizes (200x200, 173x173, and 99x99 pixels). I won't cover the certification process here but will detail it in a later article. While you are waiting for certification, you may want to pass the time by preparing a promotional page on your website. There's a standard set of images for referring someone to the Marketplace. You can grab the images from here, and they come in various sizes, colors, and languages.

After your application passes certification, you'll be able to see the direct link to your app. In the case of this application, it is http://social.zune.net/redirect?type=phoneApp&id=268c6119-d755-e011-854c-00237de2db9e. Combined with the image, I've got a recognizable download link that I could put on a promotional page:

What's next?

I've put this code out as an example only. From here, I'll improve upon my own version of this application and I probably won't be updating the version in this article beyond minor bug fixes.

History

2011 March 31 - Initial publication.