You seem to lack some knowledge of DSP (Digital Signal Processing), so I would suggest you start by reading some books or online material to gain background on what you are working with here.
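Normally we apply a Hamming, Blackman or some other window function to the input signal (for example, a short frame of a voice recording). A window is a specially designed weighting that tapers the frame towards zero at its edges, which reduces spectral leakage when you later take the FFT. It is applied by multiplying the signal with the window sample by sample. The convolution happens in the frequency domain: the spectrum of the windowed signal is the convolution of the signal's spectrum with the window's spectrum.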
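The link between windowing and convolution can be checked numerically. Below is a minimal NumPy sketch (Python rather than MATLAB, so it runs standalone) of the convolution theorem: multiplying two length-N signals sample by sample gives a DFT equal to 1/N times the circular convolution of their spectra. The 440 Hz tone, the 8000 Hz sample rate and N = 64 are arbitrary assumptions chosen for illustration.

```python
import numpy as np

N = 64
n = np.arange(N)
fs = 8000.0                               # assumed sample rate in Hz
f = np.sin(2 * np.pi * 440.0 * n / fs)    # assumed test frame: a 440 Hz tone
w = np.hamming(N)                         # Hamming window of the same length

# Time domain: windowing is an element-wise product
windowed = f * w

# Frequency domain: circular convolution of the two spectra, scaled by 1/N
# Z[k] = (1/N) * sum over m of F[m] * W[(k - m) mod N]  (n plays the role of m here)
F = np.fft.fft(f)
W = np.fft.fft(w)
conv_spectrum = np.array([np.sum(F * W[(k - n) % N]) for k in range(N)]) / N

print(np.allclose(np.fft.fft(windowed), conv_spectrum))  # True
```

So a convolution does appear, but it is in the frequency domain; in the time domain the window is simply multiplied in.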
It is used the following way (assuming that f(t) is the time-dependent signal stored as a column vector, and Ham(t) is a Hamming window of the same length, e.g. Ham = hamming(length(f))):
output = f .* Ham;   % element-wise product, not conv(f,Ham)
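For reference, here is a minimal NumPy sketch of the same step. The helper names `hamming_window` and `apply_window` are hypothetical, introduced only for this example; the formula is the standard symmetric Hamming window, which is also what `np.hamming` returns. It also shows why `conv` is the wrong operation: convolution changes the length of the frame, while windowing keeps it.

```python
import numpy as np

def hamming_window(N):
    """Hypothetical helper: symmetric Hamming window w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))

def apply_window(frame):
    """Hypothetical helper: multiply a 1-D frame sample by sample with a Hamming window."""
    frame = np.asarray(frame, dtype=float)
    return frame * hamming_window(len(frame))

frame = np.ones(8)                                    # assumed toy frame: a constant signal
out = apply_window(frame)

print(len(out))                                       # 8: windowing preserves the length
print(len(np.convolve(frame, hamming_window(8))))     # 15: convolution smears the frame
print(np.allclose(hamming_window(8), np.hamming(8)))  # True: matches NumPy's window
```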
Once you have your output signal, you can take its FFT to get the spectrum. It is common to take the log of the magnitude spectrum to make it easier to read. To plot it against frequency you also need to know the sample rate fs. The following is adapted from an online example (here y is the windowed signal):
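y = output;                          % windowed signal from the previous step; fs = sample rate in Hz
nf = 1024;                           % number of points in the DFT (FFT length)
Y = fft(y, nf);                      % zero-pads (or truncates) y to nf samples
f = fs/2*linspace(0, 1, nf/2+1);     % frequencies of bins 0..nf/2, i.e. 0 to fs/2
plot(f, log(1 + abs(Y(1:nf/2+1))));  % single-sided log-magnitude spectrum
title('Single-Sided Log-Magnitude Spectrum of y(t)')
xlabel('Frequency (Hz)')
ylabel('log(1 + |Y(f)|)')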
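An equivalent NumPy sketch of the spectrum computation is shown below, assuming a windowed 1 kHz test tone sampled at 8000 Hz (both values are illustrative assumptions). `np.fft.rfft` returns only bins 0..nf/2, matching `Y(1:nf/2+1)`, and `np.fft.rfftfreq` gives the same frequency axis as `fs/2*linspace(0,1,nf/2+1)`.

```python
import numpy as np

fs = 8000.0                      # assumed sample rate in Hz
nf = 1024                        # number of points in the FFT
t = np.arange(nf) / fs
y = np.sin(2 * np.pi * 1000.0 * t) * np.hamming(nf)   # assumed windowed 1 kHz test tone

Y = np.fft.rfft(y, n=nf)                 # bins 0..nf/2 (single-sided spectrum)
f = np.fft.rfftfreq(nf, d=1.0 / fs)      # 0 .. fs/2 in steps of fs/nf
log_mag = np.log(1 + np.abs(Y))          # log scale for readability

peak_hz = f[np.argmax(log_mag)]          # frequency of the strongest bin
print(peak_hz)
```

Because 1000 Hz falls exactly on bin 128 (fs/nf = 7.8125 Hz per bin), the peak lands at 1000 Hz; a tone between bins would spread over neighbouring bins instead.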
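It is important to understand what taking the FFT does: it transforms the signal from the time domain to the frequency domain. You can also go the other way, from the frequency domain back to the time domain, with ifft. As long as you keep the full complex spectrum (magnitude and phase), ifft(fft(x)) returns the original signal, up to floating-point rounding. Information is lost only when you discard part of the spectrum, for example by keeping just abs(Y) (which drops the phase), keeping only the single-sided half, or choosing an nf smaller than the signal length (which truncates the signal).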
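A minimal NumPy sketch of the round trip, using an assumed random test signal: the full complex spectrum reconstructs the signal, while the magnitude-only spectrum does not.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(256)               # assumed test signal

X = np.fft.fft(x)                          # full complex spectrum (magnitude and phase)
x_back = np.fft.ifft(X).real               # imaginary part is only rounding noise here
print(np.allclose(x_back, x))              # True: nothing was lost

x_mag_only = np.fft.ifft(np.abs(X)).real   # phase discarded
print(np.allclose(x_mag_only, x))          # False: the phase information is gone
```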