VoIP, Voice over Internet Protocol

VoIP, Voice over Internet Protocol, is an emerging technology that routes voice conversations over the Internet or any other IP-based network. Instead of traditional dedicated, circuit-switched voice transmission lines, it carries voice data over a general-purpose packet-switched network.
Problems associated with VoIP include echo, latency, jitter and packet loss.
Echoes are reflected waves that return along the transmission medium with enough amplitude and delay to be perceived as distinct from the original transmission. There are two forms of echo on voice networks: hybrid echo and acoustic echo. Hybrid echo is an electrical signal reflection that occurs at the 4-wire to 2-wire conversion point in a PSTN network. It is caused by an impedance mismatch, and it enters a VoIP network wherever the PSTN and a VoIP network are connected. Acoustic echo, on the other hand, is non-linear and is usually caused by poor acoustic isolation between the speaker and the microphone of a device.

Acoustic echo can gain entry into a VoIP network from any source. The major difference between the two types is that hybrid echo is a property of the line connection and remains mostly constant throughout a call, while the delay of acoustic echo varies depending on the environment of the echo source.

Figure 1: Echo in a VoIP Network (Source: www.ditechnetworks.com)

Echo is usually more noticeable in VoIP systems than in the public switched telephone network (PSTN) because of the inherent delay in VoIP systems. The human ear can detect an echo when its delay relative to the original wave is about 10 ms or more. To be perceptible, the echo must also be no more than about 25 to 30 dB weaker than the original wave. On short connections the PSTN generally needs no echo cancellation, because there the echo returns with very little delay, unlike in VoIP networks.
Before digital processing became possible, the effect of echo on voice calls was reduced by lowering the receiver volume. This forced callers to shout into the mouthpiece or ask the other party to repeat what was said. The hardware attenuated the voice signal heard by the listener, and because the echo passes through the attenuator twice, it was reduced more than the speech itself. The approach was unsatisfactory, however, and was abandoned once digital echo cancellation became possible.
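The arithmetic behind the attenuator approach can be sketched as follows. The 6 dB pad is a hypothetical value chosen only for illustration; the point is that speech crosses the attenuator once while the echo crosses it twice.

```python
def pad_losses_db(pad_db):
    """Return (speech loss, echo loss) in dB for an attenuating pad of pad_db.

    Speech crosses the pad once on its way to the listener; the echo crosses
    it once on the way out and again on the way back.
    """
    return pad_db, 2 * pad_db


def db_to_amplitude(loss_db):
    """Convert a loss in dB into the fraction of the original amplitude that remains."""
    return 10 ** (-loss_db / 20)


speech_loss, echo_loss = pad_losses_db(6.0)  # hypothetical 6 dB pad
print(f"speech: -{speech_loss} dB ({db_to_amplitude(speech_loss):.2f} of original amplitude)")
print(f"echo:   -{echo_loss} dB ({db_to_amplitude(echo_loss):.2f} of original amplitude)")
```

The echo ends up 6 dB further below the speech than before, but the listener also loses 6 dB of wanted speech, which is why callers had to raise their voices.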

According to David Mandelstam, digital echo cancellation works by subtracting from the received signal a correction based on the system's response to a short burst of sound, called the finite impulse response (FIR). The FIR is simply the echo one would hear from a short ping.
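A minimal sketch of the subtraction step, assuming the echo path's impulse response is already known. Practical cancellers must estimate it continuously, typically with an adaptive filter such as LMS or NLMS; that estimation is left out here. The echo path `h_true` and the signals are hypothetical.

```python
def fir_filter(signal, h):
    """Convolve signal with impulse response h, keeping len(signal) outputs."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * signal[n - k]
        out.append(acc)
    return out


def cancel_echo(received, far_end, h_est):
    """Subtract the predicted echo (far-end signal filtered by h_est) from received."""
    echo_estimate = fir_filter(far_end, h_est)
    return [r - e for r, e in zip(received, echo_estimate)]


# Hypothetical echo path: the far-end signal comes back 2 samples late at half amplitude.
h_true = [0.0, 0.0, 0.5]
far_end = [1.0, 0.0, -1.0, 0.5, 0.0, 0.0]      # what the remote talker sent us
near_speech = [0.0, 0.2, 0.0, 0.0, 0.1, 0.0]   # what the local talker actually said

# The microphone path carries local speech plus the echo of the far-end signal.
received = [s + e for s, e in zip(near_speech, fir_filter(far_end, h_true))]
residual = cancel_echo(received, far_end, h_true)
print([round(x, 6) for x in residual])
```

With a perfect estimate of the impulse response, the residual equals the local speech alone; an imperfect estimate leaves some echo behind.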
Latency, also known as delay, is technically defined as the time it takes a packet to travel from its source to its destination. The problem is most pronounced on satellite connections because of the large distances packets must travel, and it is also common on slow and congested connections.
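To put the satellite case in numbers, here is a sketch of the pure propagation delay over one geostationary hop. It assumes the textbook geostationary altitude of 35,786 km, ground stations directly below the satellite (the best case), and ignores processing, queuing and codec delay.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
GEO_ALTITUDE_KM = 35_786  # geostationary orbit altitude above the equator


def geo_hop_delay_ms(altitude_km=GEO_ALTITUDE_KM):
    """One-way propagation delay (ms) for ground -> satellite -> ground,
    assuming both ground stations sit directly below the satellite."""
    path_km = 2 * altitude_km
    return path_km / SPEED_OF_LIGHT_KM_S * 1000


one_way = geo_hop_delay_ms()
print(f"one-way: {one_way:.0f} ms, round trip: {2 * one_way:.0f} ms")
```

Propagation alone contributes roughly a quarter of a second each way, before any network equipment adds its own delay.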
Callers are reported to notice round-trip delays of 150 ms or more, and latencies of close to 250 ms are readily detected by the human ear. A first step in curbing latency is to examine the devices used to route packets on the network. Another solution is to reserve bandwidth for voice packets, so that routers can recognize them as real-time traffic. Voice packets can also be given higher priority than other traffic.
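The idea of giving voice packets priority can be illustrated with a toy strict-priority output queue. This is a sketch of the concept only, not the scheduler of any particular router; the class names and the two traffic classes are hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional

VOICE, DATA = 0, 1  # hypothetical traffic classes; lower value = higher priority


@dataclass(order=True)
class QueuedPacket:
    priority: int
    seq: int                              # arrival order, keeps FIFO within a class
    payload: str = field(compare=False)


class StrictPriorityQueue:
    """Toy output queue that always transmits queued voice before queued data."""

    def __init__(self):
        self._heap: List[QueuedPacket] = []
        self._arrivals = count()

    def enqueue(self, payload: str, priority: int) -> None:
        heapq.heappush(self._heap, QueuedPacket(priority, next(self._arrivals), payload))

    def dequeue(self) -> Optional[str]:
        if not self._heap:
            return None
        return heapq.heappop(self._heap).payload


q = StrictPriorityQueue()
for name, cls in [("email-1", DATA), ("voice-1", VOICE), ("web-1", DATA), ("voice-2", VOICE)]:
    q.enqueue(name, cls)
order = [q.dequeue() for _ in range(4)]
print(order)  # ['voice-1', 'voice-2', 'email-1', 'web-1']
```

Voice packets jump ahead of data that arrived earlier, so they spend less time waiting in the queue.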
Jitter is a measure of the variation over time of the latency across a network. Because a packet-switched network splits information into packets that may each take a different path, and wait in different queues, on the way to the destination, packets from the same call arrive with varying delays. Jitter is usually a problem on slow or congested links.
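One standard way to measure jitter is the running interarrival-jitter estimate defined for RTP in RFC 3550 (section 6.4.1): for each packet, compare its transit time with that of the previous packet, and smooth the absolute difference with a gain of 1/16. The sketch below uses milliseconds instead of RTP timestamp units, and the timings are hypothetical.

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """Running jitter estimate (ms) following the RFC 3550 formula
    J = J + (|D| - J) / 16, where D is the change in transit time."""
    jitter = 0.0
    for i in range(1, len(send_times_ms)):
        transit_prev = recv_times_ms[i - 1] - send_times_ms[i - 1]
        transit_cur = recv_times_ms[i] - send_times_ms[i]
        d = abs(transit_cur - transit_prev)
        jitter += (d - jitter) / 16
    return jitter


send = [0, 20, 40, 60, 80]              # one packet every 20 ms
recv_steady = [50, 70, 90, 110, 130]    # constant 50 ms transit: no jitter
recv_jittery = [50, 75, 88, 118, 130]   # transit varies between 48 and 58 ms
print(f"steady: {interarrival_jitter(send, recv_steady):.2f} ms, "
      f"jittery: {interarrival_jitter(send, recv_jittery):.2f} ms")
```

A constant delay, however large, produces zero jitter; only the variation in delay counts.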
Jitter between the source and destination should reportedly be kept below 100 ms; below this value, its effects can be compensated for. The primary remedy is a jitter buffer, which holds arriving packets briefly and releases them to the receiver at a steady rate, at the cost of a small added delay. The buffer size can be adjusted in IP phones, but each setting involves a trade-off: a larger buffer reduces packet loss (fewer packets arrive too late to be played) but adds delay, while a smaller buffer reduces delay but increases packet loss.
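The buffer-size trade-off can be sketched with a fixed jitter buffer: packet i is scheduled to play at the first packet's arrival time plus the buffer depth plus i frame intervals, and a packet that arrives after its slot is discarded. The 20 ms frame interval and the arrival times are hypothetical; real IP phones often adapt the buffer size dynamically.

```python
def late_packets(recv_times_ms, buffer_ms, frame_ms=20):
    """Fixed jitter buffer: packet i plays at recv_times_ms[0] + buffer_ms + i * frame_ms.
    Returns how many packets arrive after their playout time (and are lost)."""
    start = recv_times_ms[0] + buffer_ms
    late = 0
    for i, arrival in enumerate(recv_times_ms):
        if arrival > start + i * frame_ms:
            late += 1
    return late


# Hypothetical arrival times (ms) of packets sent every 20 ms starting at t = 0.
arrivals = [50, 75, 88, 128, 130, 165, 170]
for buf in (0, 10, 20):
    print(f"buffer {buf:2d} ms -> {late_packets(arrivals, buf)} late packet(s), +{buf} ms delay")
```

Each increase in buffer depth turns some late packets back into playable ones, but every packet is then delayed by that much more.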
Packet Loss: as the name implies, packet loss is the loss of packets during transmission. Voice is carried over UDP, which is connectionless, so a lost packet is not sent again. In addition, packets are effectively lost when they are discarded for not arriving at the receiver on time. The problem is worse when losses occur in bursts. For voice to be heard clearly, packet loss is generally recommended not to exceed 1%, although the tolerable rate depends on the codec being used. With highly compressed codecs the effect of loss is more severe, because each lost bit carries more of the speech information.
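Both the loss rate and its burstiness can be measured from the sequence numbers a receiver sees. In this sketch the capture is hypothetical, and sequence-number wraparound (which real RTP receivers must handle) is ignored.

```python
def loss_stats(received_seqs, first_seq, last_seq):
    """Return (loss rate in %, longest run of consecutive lost packets)."""
    got = set(received_seqs)
    expected = range(first_seq, last_seq + 1)
    lost_flags = [seq not in got for seq in expected]
    longest = run = 0
    for is_lost in lost_flags:
        run = run + 1 if is_lost else 0
        longest = max(longest, run)
    return 100.0 * sum(lost_flags) / len(expected), longest


# Hypothetical capture: 200 packets sent, 4 lost, three of them back to back.
received = [s for s in range(200) if s not in {17, 120, 121, 122}]
rate, burst = loss_stats(received, 0, 199)
print(f"loss {rate:.1f}% (longest burst {burst}); within 1% target: {rate <= 1.0}")
```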
One of the most effective techniques, especially on slow or congested networks, is not to send silence at all, which reduces the traffic that contributes to congestion and loss.
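A minimal sketch of silence suppression, assuming a simple energy threshold decides which frames are speech. The threshold and the frames are hypothetical; production codecs use more robust voice-activity detection and usually send comfort-noise parameters instead of nothing at all.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def frames_to_send(frames, threshold):
    """Indices of frames whose energy exceeds threshold; the rest are
    treated as silence and not transmitted."""
    return [i for i, frame in enumerate(frames) if frame_energy(frame) > threshold]


frames = [
    [0.01, -0.02, 0.01, 0.0],   # background noise
    [0.5, -0.4, 0.6, -0.5],     # speech
    [0.3, -0.6, 0.4, -0.2],     # speech
    [0.0, 0.01, -0.01, 0.0],    # background noise
]
sent = frames_to_send(frames, threshold=0.01)  # hypothetical threshold
print(f"sending {len(sent)} of {len(frames)} frames: {sent}")
```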

Originally, codec referred to COder/DECoder, but more recently the term has come to relate more to COmpression/DECompression. Codecs are algorithms, installed as software or embedded in hardware, that convert analog signals to digital form and back. Many of them exploit the human brain's ability to form an impression from incomplete information: just as with optical illusions, voice-compression algorithms take advantage of our tendency to hear what we believe we should hear rather than what we actually hear. In VoIP, codecs encode voice for transmission across IP networks. They generally compress the voice to save network bandwidth, and some also support silence suppression, in which silence is neither encoded nor transmitted.

The basic functions of a codec include:
• Encoding – decoding
• Compression – decompression
• Encryption - Decryption
Encoding - decoding
During a conversation over a normal PSTN phone, the voice signals are transported in the same form over the phone line. With VoIP, however, the voice is converted into digital signals, a conversion technically called encoding. When the digitized voice reaches its destination, it has to be decoded back to its original analog form so that the other party can hear and understand it.
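Encoding can be sketched as sampling followed by quantization. The sketch assumes telephone-band sampling at 8 kHz with 8 bits per sample, which gives the 64 kbit/s rate of G.711; for simplicity it quantizes uniformly, whereas G.711 itself uses logarithmic companding. The 440 Hz test tone is hypothetical.

```python
import math

SAMPLE_RATE_HZ = 8000   # telephone-band sampling rate
BITS_PER_SAMPLE = 8


def encode_pcm(duration_s, tone_hz=440.0):
    """Sample a test tone and quantize each sample uniformly to a signed 8-bit level."""
    n = round(duration_s * SAMPLE_RATE_HZ)
    levels = 2 ** (BITS_PER_SAMPLE - 1) - 1   # 127
    return [round(levels * math.sin(2 * math.pi * tone_hz * t / SAMPLE_RATE_HZ))
            for t in range(n)]


samples = encode_pcm(0.02)   # one 20 ms frame
bitrate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE
print(len(samples), "samples per 20 ms;", bitrate // 1000, "kbit/s")
```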
Compression – decompression
Bandwidth is a scarce commodity. If the data to be sent is made lighter, more of it can be sent in a given amount of time, which improves performance. To make the digitized voice less bulky, it is compressed. Compression is a complex process in which the same information is represented using fewer digital bits. During compression, the data is organized into a structure (packet) specific to the compression algorithm. The compressed data is sent over the network, and once it reaches its destination it is decompressed before being decoded. In most cases, however, the data does not need to be restored exactly to its original state, since the decompressed version is already good enough to be 'consumable'.

Types of compression
When data is compressed, it becomes lighter and performance improves. However, the algorithms that compress the most tend to reduce the quality of the compressed data. There are two types of compression: lossless and lossy. Lossless compression preserves all the data but cannot reduce its size very much. Lossy compression achieves much greater size reductions, but at the cost of quality. Data compressed with a lossy method normally cannot be restored to its exact original state, since quality has been sacrificed for size, but most of the time this is not necessary.
A good example of lossy compression is MP3 for audio. Once audio has been compressed to MP3, the original cannot be recovered, but MP3 audio is still very good to listen to and far smaller than the uncompressed audio files.
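The difference between the two types can be demonstrated on a few bytes of hypothetical 8-bit sample data. The lossless half uses Python's standard `zlib` module; the lossy half is a deliberately crude, hypothetical scheme that keeps only the top 4 bits of each sample so that two samples fit in one byte.

```python
import zlib

# Hypothetical 8-bit "audio" samples: a repeating ramp, 512 bytes long.
original = bytes(range(0, 256, 4)) * 8

# Lossless: zlib restores exactly the same bytes.
lossless = zlib.compress(original)
assert zlib.decompress(lossless) == original


def lossy_pack(data):
    """Keep only the top 4 bits of each sample and pack two samples per byte."""
    nibbles = [b >> 4 for b in data]
    if len(nibbles) % 2:
        nibbles.append(0)
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def lossy_unpack(packed, n):
    """Rebuild n approximate samples; the discarded low bits come back as zeros."""
    out = []
    for byte in packed:
        out.extend([byte & 0xF0, (byte & 0x0F) << 4])
    return bytes(out[:n])


lossy = lossy_pack(original)
restored = lossy_unpack(lossy, len(original))
print(len(original), len(lossless), len(lossy), restored == original)
```

The lossy version is guaranteed to halve the size, but the restored samples only approximate the originals; the lossless version restores them exactly, and how much it saves depends on how repetitive the data is.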
Encryption – decryption
Encryption is one of the best tools for achieving security. It is the process of transforming data into a form that no one without the key can understand. This way, even if the encrypted data is intercepted by unauthorized people, it remains confidential. Once the encrypted data reaches its destination, it is decrypted back to its original form. Compression also alters data from its original state, but it provides no real confidentiality, since anyone with the same codec can decompress it.

The table below lists some of the most popular codecs used in VoIP and their characteristics.
Codec | Description | Bit rate (kbit/s) | Sampling rate (kHz) | MOS (Mean Opinion Score)
G.711 | Pulse Code Modulation (PCM) | 64 | 8 |
G.726 | 40, 32, 24, 16 kbit/s adaptive differential pulse code modulation (ADPCM) | 16/24/32/40 | 8 | 3.85
G.723.1 | Dual rate speech coder for multimedia communication transmitting at 5.3 and 6.3 kbit/s | 5.3/6.3 | 8 | 3.8–3.9
G.729A | Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP) | 8 | 8 | 3.92
GSM | Regular-Pulse Excitation Long-Term Predictor (RPE-LTP) | 13 | 8 |
iLBC | internet Low Bitrate Codec | 13.3 | |
Speex | Variable bit rate | 2.15–24.6 | 8, 16, 32 |
Table 3.1: Most popular codecs used in VoIP (Source: www.voipforo.com)
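The codec bit rates above are not the whole story, because every voice packet also carries protocol headers. The sketch below estimates the IP-level bandwidth of one direction of a call. It assumes IPv4 + UDP + RTP headers (20 + 8 + 12 bytes), one packet every 20 ms, and no link-layer overhead; real figures vary with packetization interval and transport.

```python
HEADER_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP headers; link-layer overhead ignored
INTERVAL_MS = 20             # assumed: one packet every 20 ms


def ip_bandwidth_kbps(codec_kbps, interval_ms=INTERVAL_MS):
    """IP-level bandwidth (kbit/s) of one direction of a call."""
    payload_bytes = codec_kbps * interval_ms / 8   # kbit/s x ms = bits; / 8 = bytes
    packets_per_second = 1000 / interval_ms
    return (payload_bytes + HEADER_BYTES) * 8 * packets_per_second / 1000


for name, rate in [("G.711", 64), ("G.726 (32 kbit/s)", 32), ("G.729A", 8)]:
    print(f"{name:18s} codec {rate:>2} kbit/s -> about {ip_bandwidth_kbps(rate):.0f} kbit/s at the IP layer")
```

Under these assumptions G.729A uses one eighth of G.711's codec bit rate, but its IP stream is only about a third as large, because the fixed header cost dominates at low bit rates.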

G.711 is the fundamental codec of the PSTN. Two companding methods are used in this codec: μ-law and A-law. It is the base codec from which all of the others are derived.
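Companding can be illustrated with the continuous μ-law curve, F(x) = sgn(x) ln(1 + μ|x|) / ln(1 + μ) with μ = 255. This is a sketch of the principle only: the G.711 standard uses a segmented, piecewise-linear approximation of this curve with its own 8-bit code format, which is not reproduced here. The test sample value is hypothetical.

```python
import math

MU = 255  # mu value used by the mu-law variant of G.711


def mu_law_compress(x):
    """Map a normalized sample x in [-1, 1] onto [-1, 1] with the mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


def to_code(value):
    """Quantize a value in [-1, 1] to a signed 8-bit level (-127..127)."""
    return round(value * 127)


quiet = 0.01  # a quiet, hypothetical sample
uniform_error = abs(to_code(quiet) / 127 - quiet)
mu_error = abs(mu_law_expand(to_code(mu_law_compress(quiet)) / 127) - quiet)
print(f"uniform 8-bit error: {uniform_error:.5f}, mu-law 8-bit error: {mu_error:.5f}")
```

With the same 8 bits per sample, companding spends more quantization levels on quiet sounds, so a quiet sample is reproduced far more accurately than with uniform quantization.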

G.726 used to be known as G.721 (now obsolete) and is one of the original compressed codecs. At 32 kbit/s it offers quality close to that of G.711 while using only half the bandwidth. It does this by sending only enough information to describe the difference between the current sample and a prediction based on the previous ones, rather than the full quantized value of each sample. It is attractive because it requires little computation from the system.
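The difference-coding idea can be sketched with plain DPCM using a fixed step size, where the prediction is simply the previously reconstructed sample. This is a simplified stand-in, not the G.726 algorithm, which adapts both its quantizer step and its predictor. The samples and step size are hypothetical.

```python
def dpcm_encode(samples, step=4):
    """Transmit the quantized difference from the decoder's previous reconstruction."""
    codes = []
    prediction = 0
    for s in samples:
        code = round((s - prediction) / step)
        codes.append(code)
        prediction += code * step   # track exactly what the decoder will reconstruct
    return codes


def dpcm_decode(codes, step=4):
    """Rebuild samples by accumulating the transmitted differences."""
    out = []
    value = 0
    for code in codes:
        value += code * step
        out.append(value)
    return out


samples = [0, 10, 22, 30, 35, 33, 25, 12]   # hypothetical slowly varying samples
codes = dpcm_encode(samples)
decoded = dpcm_decode(codes)
print("codes:  ", codes)
print("decoded:", decoded)
```

Because neighboring speech samples are similar, the transmitted differences are small numbers that need fewer bits than the samples themselves, while the reconstruction stays within half a step of the original.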

G.723.1 is designed for low-bit-rate speech and has two bit-rate settings: 5.3 kbit/s and 6.3 kbit/s. It is one of the codecs required for H.323 compliance. It is encumbered by patents and therefore requires a license if used in commercial applications.

G.729A delivers impressive sound quality for the little bandwidth it uses. It also requires a license, but it is nonetheless popular and is supported on many different phones and systems.

GSM is the most preferred codec in Asterisk. It does not require a license, and it offers outstanding performance relative to the demand it places on the CPU. It operates at 13 kbit/s.

The internet Low Bitrate Codec (iLBC) combines low bandwidth use with good quality and is suited to sustaining quality on lossy network connections. Unlike most of the others, it is not an ITU codec, and it may therefore not be supported by common IP telephones and commercial VoIP systems. It uses complex algorithms to achieve its high levels of compression.

Speex is a variable-bit-rate codec, which means that it can change its bit rate to suit the network environment in which it finds itself. Because of this, Speex can operate at anywhere from 2.15 to 24.6 kbit/s.

H.323 was originally designed to provide an IP transport mechanism for video-conferencing. It has become the standard in IP-based video-conferencing equipment, and it briefly enjoyed fame as a VoIP protocol as well. H.323 was developed by the ITU in May 1996 as a means to transmit voice, video, data and fax communications across an IP-based network while maintaining connectivity with the PSTN. Since then, H.323 has gone through several versions and annexes (which add functionality to the protocol), allowing it to operate in pure VoIP networks and more widely distributed networks. H.323 is a relatively secure protocol; however, because it uses RTP for media communications, it does not natively support encrypted media paths.

Asterisk is an open source, converged telephony platform designed primarily to run on Linux.
