Introduction
This is a proprietary VoIP project to send and receive audio data by TCP. It's an extension of my first article Play or Capture Audio Sound. Send and Receive as Multicast (RTP). This application streams the audio data not by multicast but by TCP. So you can be sure there is no data lost and you can transfer them over subnets and routers away. The audio codec is U-Law. The sample rate is selectable from 5000 to 44100.
The server can run on your local PC. You can get your current IP4-Address with help of running cmd.exe and typing "ipconfig
". You should use a static IP-Address, so that possible clients do not have to change their settings after reconnecting some days later. The clients must connect to the IP4-Address and port configured on the running server. The server can be run in silent mode (no input, no output) just transferring audio data between all connected clients.
Choose a free port that is not used by another application (do not use 80 or other reserved ports). You can connect within LAN or Internet. For Internet chatting, you can configure a port forwarding on your router.
Note !!! This is a proprietary project. You can't use my servers or clients with any other standardized servers or clients. I don't use standards like RTCP or SDP.
Background
Because of network traffic and time clock differences, you have to use Jitter-Buffers to compensate data transfer. You can set the Jitter-Buffer for each server, so all clients will use the same amount. One Jitter-Buffer represents one data-packet, included in a TCP-Stream. The server starts playing, when the Jitter-Buffer reaches the half of maximum. You can watch this in the progressbar
which is shown for each client. The more Jitter-Buffers you set, the more delay will occur. You can run the TCPStreamer
as client or as server. One server can handle one or more clients.
The TCPStreamer as Client
Running as client, you can connect to a server instance. Choose your microphone and listen device. Click on the microphone or speaker buttons to mute. After the client is connected, the speaker combobox changes to a progress bar showing the value of incoming data. The used SamplesPerSecond
(Quality) depends on Server-Configuration. The Jitter-Buffer client sides is only important for the delay of incoming data.
The TCPStreamer as Server
Running as a server, you can wait for one or more clients. Choose your microphone and listen device if wanted, but you can run the server without hearing or speaking server sides, so that only the clients speak to each other. Each client can be muted exclusive (Speaker and Micro). The IP-Address must be the address of your computer. The port number should not be used by other applications. The Jitter-Buffer value server sides is important for the delay of all connected clients. Use the lowest value as possible. The server has to mix all data from all clients, so you should choose a performant workstation on which to run the server. The quality of the speak depends on the SamplesPerSecond
value.
Using the Code
There are the following assemblies:
- TCPStreamer.exe (main application)
- TCPClient.dll (TCP client wrapper helper)
- TCPServer.dll (TCP server wrapper helper)
- WinSound.dll (sound recording and playing)
I could send data direct from soundcard to network. But I decided to put them into a Jitter Buffer first, because some sound devices (especially on laptops) are not able to get the sound data in equal time periods. With a Jitter Buffer, I ensure sending data every 20 ms. But the disadvantage is a bigger delay. The amount for that not configurable Jitter Buffer is 8, for a lower delay, you can reduce that value in the source code (RecordingJitterBufferCount
). But watch your soundcard for some minutes, if it can handle this.
private void OnDataReceivedFromSoundcard_Server(Byte[] data)
{
int bytesPerInterval = WinSound.Utils.GetBytesPerInterval((uint)
m_Config.SamplesPerSecondServer,
m_Config.BitsPerSampleServer, m_Config.ChannelsServer);
int count = data.Length / bytesPerInterval;
int currentPos = 0;
for (int i = 0; i < count; i++)
{
Byte[] partBytes = new Byte[bytesPerInterval];
Array.Copy(data, currentPos, partBytes, 0, bytesPerInterval);
currentPos += bytesPerInterval;
WinSound.RTPPacket rtp = ToRTPPacket(partBytes,
m_Config.BitsPerSampleServer, m_Config.ChannelsServer);
m_JitterBufferServerRecording.AddData(rtp);
}
}
When creating a RTP packet, most information like CSRC Count or Version are the same. After every sent RTP packet, I have only to increase the SequenceNumber
and Timestamp
. Before this, I translate the linear data to a compressed U-Law format to avoid network traffic.
private WinSound.RTPPacket ToRTPPacket(Byte[] linearData, int bitsPerSample, int channels)
{
Byte[] mulaws = WinSound.Utils.LinearToMulaw(linearData, bitsPerSample, channels);
WinSound.RTPPacket rtp = new WinSound.RTPPacket();
rtp.Data = mulaws;
rtp.CSRCCount = m_CSRCCount;
rtp.Extension = m_Extension;
rtp.HeaderLength = WinSound.RTPPacket.MinHeaderLength;
rtp.Marker = m_Marker;
rtp.Padding = m_Padding;
rtp.PayloadType = m_PayloadType;
rtp.Version = m_Version;
rtp.SourceId = m_SourceId;
try
{
rtp.SequenceNumber = Convert.ToUInt16(m_SequenceNumber);
m_SequenceNumber++;
}
catch (Exception)
{
m_SequenceNumber = 0;
}
try
{
rtp.Timestamp = Convert.ToUInt32(m_TimeStamp);
m_TimeStamp += mulaws.Length;
}
catch (Exception)
{
m_TimeStamp = 0;
}
return rtp;
}
private void OnJitterBufferServerDataAvailable(Object sender, WinSound.RTPPacket rtp)
{
Byte[] rtpBytes = rtp.ToBytes();
List<NF.ServerThread> list = new List<NF.ServerThread>(m_Server.Clients);
foreach (NF.ServerThread client in list)
{
if (client.IsMute == false)
{
client.Send(m_PrototolClient.ToBytes(rtpBytes));
}
}
}
In order to send and receive data by TCP, I use a simple proprietary protocol. Before each data block, I write a 32 bit data length information. So later, when I receive the data stream, I know how to interpret the data.
public Byte[] ToBytes(Byte[] data)
{
Byte[] bytesLength = BitConverter.GetBytes(data.Length);
Byte[] allBytes = new Byte[bytesLength.Length + data.Length];
Array.Copy(bytesLength, allBytes, bytesLength.Length);
Array.Copy(data, 0, allBytes, bytesLength.Length, data.Length);
return allBytes;
}
The reverse path is to get the data by network in this case for every connected client. In the first step, I have to extract the packets from the whole stream with the help of my own protocol.
private void OnServerDataReceived(NF.ServerThread st, Byte[] data)
{
if (m_DictionaryServerDatas.ContainsKey(st))
{
ServerThreadData stData = m_DictionaryServerDatas[st];
if (stData.Protocol != null)
{
stData.Protocol.Receive_LH(st, data);
}
}
}
With the help of the length information, I know when a packet starts and ends.
public void Receive_LH(Object sender, Byte[] data)
{
m_DataBuffer.AddRange(data);
if (m_DataBuffer.Count > m_MaxBufferLength)
{
m_DataBuffer.Clear();
}
Byte[] bytes = m_DataBuffer.Take(4).ToArray();
int length = (int)BitConverter.ToInt32(bytes.ToArray(), 0);
if (length > m_MaxBufferLength)
{
m_DataBuffer.Clear();
}
while (m_DataBuffer.Count >= length + 4)
{
Byte[] message = m_DataBuffer.Skip(4).Take(length).ToArray();
if (DataComplete != null)
{
DataComplete(sender, message);
}
m_DataBuffer.RemoveRange(0, length + 4);
if (m_DataBuffer.Count > 4)
{
bytes = m_DataBuffer.Take(4).ToArray();
length = (int)BitConverter.ToInt32(bytes.ToArray(), 0);
}
}
}
Before playing the data to soundcard, I put them into a further Jitter Buffer. This is necessary because of the irregular network traffic, especially over internet. The more the amount of Jitter Buffer, the more the delay.
private void OnProtocolDataComplete(Object sender, Byte[] bytes)
{
WinSound.RTPPacket rtp = new WinSound.RTPPacket(bytes);
if (rtp.Data != null)
{
JitterBuffer.AddData(rtp);
}
}
Finally, the data are ready to be played to soundcard. Before that, I translate the U-Law data back to linear data, because a sounddevice can only play linear one.
private void OnJitterBufferDataAvailable(Object sender, WinSound.RTPPacket rtp)
{
if (IsMuteAll == false && IsMute == false)
{
Byte[] linearBytes = WinSound.Utils.MuLawToLinear(rtp.Data, BitsPerSample, Channels);
Player.PlayData(linearBytes, false);
}
}
I implemented my own Jitter Buffer as a queue of RTP packets. The data can be added and then is handled by a high frequently timer function (20 ms).
public void AddData(RTPPacket packet)
{
if (m_Overflow == false)
{
if (m_Buffer.Count <= m_MaxRTPPackets)
{
m_Buffer.Enqueue(packet);
}
else
{
m_Overflow = true;
}
}
}
The Jitter Buffer handles the data every 20 milliseconds. To get such an exact timer, you can't use the normal .NET Timers. So I used the timer functions from the Win32 kernel32 and Winmm library. Before starting a timer, I set the precision as best as the system can offer. This can differ from 1 to more milliseconds. Better than 1 millisecond is not possible with Windows.
[DllImport("Kernel32.dll", EntryPoint = "QueryPerformanceCounter")]
public static extern bool QueryPerformanceCounter(out long lpPerformanceCount);
[DllImport("Kernel32.dll", EntryPoint = "QueryPerformanceFrequency")]
public static extern bool QueryPerformanceFrequency(out long lpFrequency);
[DllImport("winmm.dll", SetLastError = true, EntryPoint = "timeSetEvent")]
public static extern UInt32 TimeSetEvent(UInt32 msDelay, UInt32 msResolution,
TimerEventHandler handler, ref UInt32 userCtx, UInt32 eventType);
[DllImport("winmm.dll", SetLastError = true, EntryPoint = "timeKillEvent")]
public static extern UInt32 TimeKillEvent(UInt32 timerId);
[DllImport("kernel32.dll", EntryPoint = "CreateTimerQueue")]
public static extern IntPtr CreateTimerQueue();
[DllImport("kernel32.dll", EntryPoint = "DeleteTimerQueue")]
public static extern bool DeleteTimerQueue(IntPtr TimerQueue);
[DllImport("kernel32.dll", EntryPoint = "CreateTimerQueueTimer")]
public static extern bool CreateTimerQueueTimer(out IntPtr phNewTimer, IntPtr TimerQueue,
DelegateTimerProc Callback, IntPtr Parameter, uint DueTime, uint Period, uint Flags);
[DllImport("kernel32.dll")]
public static extern bool DeleteTimerQueueTimer(IntPtr TimerQueue,
IntPtr Timer, IntPtr CompletionEvent);
[DllImport("winmm.dll", SetLastError = true, EntryPoint = "timeGetDevCaps")]
public static extern MMRESULT TimeGetDevCaps(ref TimeCaps timeCaps, UInt32 sizeTimeCaps);
[DllImport("winmm.dll", SetLastError = true, EntryPoint = "timeBeginPeriod")]
public static extern MMRESULT TimeBeginPeriod(UInt32 uPeriod);
[DllImport("winmm.dll", SetLastError = true, EntryPoint = "timeEndPeriod")]
public static extern MMRESULT TimeEndPeriod(UInt32 uPeriod);
The Jitter Buffer is designed to handle the data, when half of the maximum is reached. After an overflow or underflow, the buffer tries to get back to this value.
private void OnTimerTick()
{
if (DataAvailable != null)
{
if (m_Buffer.Count > 0)
{
if (m_Overflow)
{
if (m_Buffer.Count <= m_MaxRTPPackets / 2)
{
m_Overflow = false;
}
}
if (m_Underflow)
{
if (m_Buffer.Count < m_MaxRTPPackets / 2)
{
return;
}
else
{
m_Underflow = false;
}
}
m_LastRTPPacket = m_Buffer.Dequeue();
DataAvailable(m_Sender, m_LastRTPPacket);
}
else
{
m_Overflow = false;
if (m_LastRTPPacket != null && m_Underflow == false)
{
if (m_LastRTPPacket.Data != null)
{
m_Underflow = true;
}
}
}
}
}
This project does not use overheaded libraries or extensions, so it can be used to learn the basics of manipulating sound data and network operations. Feel free to extend and improve it for your needs.
History
- 31.05.2012 - Added
- 03.05.2013 - Added duplex connections. Removed File-Player
- 09.05.2013 - Changed tip to article
- 12.12.2013 - Added communication between all clients
- 18.12.2013 - Fixed some bugs
- 22.04.2014 - Solved possible stability problems