My project is to read the sound from the currently playing video and convert it to text: that is, convert the audio of a song to text and display it on screen at run time. I need some help with this.

Ooh, good luck with that. I have quite a bit of experience with speech recognition on the spoken word, and I can tell you that even plain speech doesn't give you a great deal of accuracy. So even if you manage to extract the vocal component from the audio, you will probably find that the recognition step gives inaccurate results, because accents have a huge effect on how lyrics are understood. Consider the song "Israelites" by Desmond Dekker: he famously sang "oh oh, me Israelite". What many people heard, however, was "oh oh, me ears are alight".
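One way to see the accuracy problem for yourself is to print the recognizer's confidence score next to each phrase it returns. The following is a minimal sketch, not a definitive implementation: it assumes the Windows-only System.Speech assembly is referenced, and the names ConfidenceDemo, Describe, TranscribeWav and the threshold parameter are made up for illustration. It also assumes that a null result from Recognize() can be treated as the end of the input.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition; // Windows-only; add a reference to System.Speech

public static class ConfidenceDemo
{
    // Hypothetical helper: formats a phrase with its score and flags it
    // when the score falls below the chosen threshold.
    public static string Describe(string text, float confidence, float threshold)
    {
        string tag = confidence >= threshold ? "ok" : "low confidence";
        return string.Format(CultureInfo.InvariantCulture,
            "[{0:0.00}] {1} ({2})", confidence, text, tag);
    }

    // Transcribes a WAV file synchronously, one phrase at a time,
    // and prints each phrase together with its confidence score.
    public static void TranscribeWav(string wavPath, float threshold)
    {
        using (var engine = new SpeechRecognitionEngine())
        {
            engine.LoadGrammar(new DictationGrammar());
            engine.SetInputToWaveFile(wavPath);

            RecognitionResult result;
            // Assumption: a null result means nothing more could be recognized.
            while ((result = engine.Recognize()) != null)
            {
                Console.WriteLine(Describe(result.Text, result.Confidence, threshold));
            }
        }
    }
}
```

Calling ConfidenceDemo.TranscribeWav("example.wav", 0.5f) on a clean spoken recording and then on a song should show how far the scores drop once music and singing are involved. The threshold of 0.5 is an arbitrary starting point, not a recommended value.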
 
 
Comments
ch.haya 29-Jan-14 12:09pm    
OK, thanks, but kindly guide me on the code. I have code that converts a spoken English sentence to text with the SAPI API, but it does not work for a song. Kindly update the following code:

    using System;
    using System.Collections.Generic;
    using System.ComponentModel;
    using System.Data;
    using System.Drawing;
    using System.Linq;
    using System.Text;
    using System.Windows.Forms;
    using EllisMIS.Audio.Transcription.Microsoft;

    namespace MicrosoftSpeechToTextExample
    {
        public partial class Form1 : Form
        {
            Dictation _transcriber;

            public Form1()
            {
                InitializeComponent();
            }

            private void btnWavFile_Click(object sender, EventArgs e)
            {
                // Not sure if a .Dispose is needed at all, but threw it in there.
                if (_transcriber != null)
                {
                    _transcriber.Dispose();
                }
                _transcriber = new Dictation();
                SetEvents();
                _transcriber.Start("example.wav");
            }

            // Partial (in-progress) recognition results.
            void _transcriber_SpeechHypothesizingEvent(object sender, System.Speech.Recognition.SpeechHypothesizedEventArgs e)
            {
                Console.WriteLine("Speech Recognizing: " + e.Result.Text);
            }

            // Final recognition results.
            void transcriber_SpeechRecognizedEvent(object sender, System.Speech.Recognition.SpeechRecognizedEventArgs e)
            {
                Console.WriteLine("Speech Recognized: " + e.Result.Text);
            }

            public void SetEvents()
            {
                // Unsubscribe first so the handlers are never attached twice.
                _transcriber.SpeechRecognizedEvent -= new Dictation.SpeechRecognizedEventHandler(transcriber_SpeechRecognizedEvent);
                _transcriber.SpeechHypothesizingEvent -= new Dictation.SpeechHypothesizingEventHandler(_transcriber_SpeechHypothesizingEvent);
                _transcriber.SpeechRecognizedEvent += new Dictation.SpeechRecognizedEventHandler(transcriber_SpeechRecognizedEvent);
                _transcriber.SpeechHypothesizingEvent += new Dictation.SpeechHypothesizingEventHandler(_transcriber_SpeechHypothesizingEvent);
            }
        }
    }

    // --- Dictation wrapper class (separate file) ---

    using System;
    using System.IO;
    using System.Speech.Recognition;

    namespace EllisMIS.Audio.Transcription.Microsoft
    {
        public class Dictation : IDisposable
        {
            #region Local Variables
            private SpeechRecognitionEngine _speechRecognitionEngine = null;
            private DictationGrammar _dictationGrammar = null;
            private bool _disposed = false;
            #endregion

            #region Constructors
            public Dictation()
            {
                ConstructorSetup();
            }

            public Dictation(DictationGrammar targetGrammar)
            {
                _dictationGrammar = targetGrammar;
                ConstructorSetup();
            }
            #endregion

            /// <summary>
            /// Start the transcriber using your default microphone.
            /// </summary>
            public void Start()
            {
                _speechRecognitionEngine.SetInputToDefaultAudioDevice();
                StartSetup();
            }

            /// <summary>
            /// Transcribe a .wav file
            /// </summary>
            /// <param name="targetWavFile"></param>
            public void Start(string targetWavFile)
            {
                if (!File.Exists(targetWavFile))
                {
                    // Pass the actual path (not the parameter name) as the file name.
                    throw new FileNotFoundException("Specified WAV file does not exist.", targetWavFile);
                }
                _speechRecognitionEngine.SetInputToWaveFile(targetWavFile);
                StartSetup();
            }

            private void StartSetup()
            {
                if (_dictationGrammar == null)
                {
                    _dictationGrammar = new DictationGrammar();
                }
                _speechRecognitionEngine.LoadGrammar(_dictationGrammar);
                _speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
                _speechRecognitionEngine.SpeechRecognized -= new EventHandler<SpeechRecognizedEventArgs>(SpeechRecognized);
                _speechRecognitionEngine.SpeechHypothesized -= new EventHandler<SpeechHypothesizedEventArgs>(SpeechHypothesizing);
                _speechRecognitionEngine.Sp
S Houghtelin 31-Jan-14 7:40am    
I don't think this person gets that spoken words and words with modulated pace and frequency are two different things. They keep reposting this request. If they do succeed I'll be the first to doff my cap to them.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


