Monday, June 16, 2008

Enhance VoIP telephony with HD Voice


By Daniel Hartnett
Infineon Technologies AG

Do you remember hearing FM radio for the first time, or listening to your first CD after years of scratched vinyl? That's the experience high-definition (HD) sound brings to a telephone. As VoIP becomes commoditized, the focus of system developers and service providers shifts from providing VoIP to providing higher-quality VoIP.

Taking advantage of the strong marketing behind HDTV, HD-sound is now the accepted brand name for Wideband Voice. This allows service providers to offer pristine audio quality over their IP phone-enabled home gateways. Traditional "narrowband" telephony was a compromise between speech intelligibility and data rates, providing an acoustic bandwidth of 300Hz to 3.4kHz. In contrast, HD-sound uses wideband technologies to offer a transmission range of 50Hz to 7kHz or beyond.

The result is significantly increased intelligibility and a much more natural sound not only for voice conversation, but also for a range of other audio applications, such as MP3 and Internet radio. This article attempts to address the hurdles associated with delivering HD performance in telephony, and explore its market potential.

Wideband telephony
"Wideband" telephony specifies a transmission range of 150Hz to 6.3kHz. While this is not CD bandwidth (20Hz up to 20kHz), the increased bandwidth compared to narrowband offers significantly improved intelligibility.

Wideband telephony was standardized for ISDN with the G.722 codec about 20 years ago, but never really enjoyed wide deployment. G.722 did, however, make its way into journalism, where it is often used for voice transmission from remote locations as an alternative to the poor quality of standard telephones.

As IP-phones already have powerful signal processing capabilities for narrowband speech compression algorithms, wideband codecs can easily be handled by the voice engines within IP-phones. If the ADCs and DACs support a 16kHz sampling rate, wideband telephony on an IP-phone comes with relatively low additional overhead.

Another factor driving the development of wideband telephony is the new DECT standard CAT-iq, which also specifies G.722 as the required codec for HD Voice.
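The 16kHz sampling requirement follows from the Nyquist criterion: a converter has to sample at no less than twice the highest frequency it is meant to capture. The short sketch below checks that for the narrowband and wideband ranges quoted earlier in this article. The function and constant names are hypothetical and chosen only for illustration.

```python
# Band edges in Hz, taken from the ranges quoted in the article.
NARROWBAND_HZ = (300, 3400)
WIDEBAND_HZ = (50, 7000)


def min_sampling_rate_hz(band):
    """Nyquist criterion: sample at least twice the highest frequency."""
    _low, high = band
    return 2 * high


def supports(sampling_rate_hz, band):
    """True if the sampling rate is high enough to capture the whole band."""
    return sampling_rate_hz >= min_sampling_rate_hz(band)


for rate in (8000, 16000):
    print(rate, "narrowband:", supports(rate, NARROWBAND_HZ),
          "wideband:", supports(rate, WIDEBAND_HZ))
```

An 8kHz converter covers the narrowband range but not the wideband one, while a 16kHz converter covers both, which is why the converter sampling rate is the gating hardware requirement.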

PC soundcards support 8-, 16-, 32-, 44.1- and 48kHz sampling rates, and generally have the necessary processing power for wideband codecs. PC-based soft-phone applications like Skype already have a huge footprint in the market.

Most enterprise IP-Phones like Siemens' OpenStage series already support wideband. The enterprise market for wideband is an excellent proof of concept, as it is much easier to control the hardware and software running on the end points. The deployment of HD voice in the residential space is much more difficult. Wideband requires that both parties in a call have wideband-capable hardware and that the phone immediately shifts up to the best codec available.
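The "shift up to the best codec available" step can be pictured as a simple preference match between the two end points. The sketch below is a minimal illustration, not how any particular phone or signaling stack does it: the function name and the preference order are assumptions, and G.711 is used here only as a stand-in for the narrowband fallback.

```python
# Preference order is an assumption for illustration: wideband first,
# then a narrowband fallback.
PREFERENCE = ["G.722", "G.711"]


def pick_codec(local, remote, preference=PREFERENCE):
    """Return the most preferred codec both end points support, or None."""
    common = set(local) & set(remote)
    for codec in preference:
        if codec in common:
            return codec
    return None


# Both sides are wideband capable, so the call goes up to G.722.
print(pick_codec(["G.722", "G.711"], ["G.711", "G.722"]))
# One side is narrowband only, so the call falls back to G.711.
print(pick_codec(["G.722", "G.711"], ["G.711"]))
```

The second call shows why residential deployment is harder: a single narrowband-only end point pulls the whole call down to narrowband quality.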

In the past, VoIP had to contend with a less than solid reputation. Since the early days, when only brave pioneers would make a connection over the internet, broadband users have been fast to take up the offerings of new players on the voice service provider market. The traditional trade-off was quality against price.

Today, VoIP quality has improved beyond recognition and is easily comparable to that of POTS services. As ample bandwidth and processing power in customer premises equipment become the norm, the possibility of using more bandwidth for vastly improved voice quality is very real and imminent. This is where providers can differentiate their services.

HD VoIP
VoIP is not just VoIP. HD Sound makes it marketable above and beyond price. A POTS phone call is thin and almost monotone in comparison to a well-implemented HD Sound call. The result is a "warmer" sounding phone call, where all the nuances of the voice are captured. Mistaking "s" for "f" is now a thing of the past. The possibilities that this brings are manifold. The hurdles associated with bringing it to a wide audience are also considerable.

To optimize their Wideband implementations, it is vital for phone manufacturers (fixed and cordless) to adhere to some important rules. The electro-acoustic components, especially the handset receiver or the hands-free loudspeaker, have to be able to reproduce the whole wideband frequency range with low distortion and high fidelity in their respective mountings.

This poses huge challenges to the device designers, especially for devices with a small form factor like cordless or mobile phones. First-class voice quality does come at a price, but one assumes that a mass market for the application will regulate this.

On the speakerphone side, the housing is key. It is advisable to encase the speakerphone in order to avoid echo within the housing and to emphasize the lower frequencies, much like a home Hi-Fi speaker, which is also completely sealed.

In any VoIP phone (narrowband or wideband), delay is the most difficult hurdle to overcome in the quest for full-duplex performance. The human ear is insensitive to echo that immediately follows the spoken word. Otherwise you would always hear a strong echo inside any given room.

But the longer the delay between one's speech and its echo, the more sensitive the ear becomes. That's why you always hear echo in a church. In a standard IP network, packet delays of more than 100ms are possible: that's one BIG church.

For this reason, additional effort has to be spent to reduce echo. The echo cancellation inside a phone behaves like the human ear. It cancels echo by estimating the echo and subtracting that estimate from the microphone signal. This can be a difficult job, as it must work in any environment where a phone can reside.
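One common way to implement this estimate-and-subtract idea is an adaptive filter. The sketch below uses a normalized least-mean-squares (NLMS) filter, which is an assumption chosen for illustration and not necessarily what any given phone uses. The function name, tap count and step size are hypothetical, and the "room" is a synthetic echo path that simply delays the loudspeaker signal by three samples and halves it. Real cancellers also have to handle double talk and background noise, which this sketch ignores.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the echo from the microphone signal."""
    weights = [0.0] * taps   # running estimate of the echo path
    history = [0.0] * taps   # most recent far-end samples, newest first
    out = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate   # residual after subtracting the estimate
        power = sum(h * h for h in history) + eps
        # NLMS update: step along the input, normalized by its power.
        weights = [w + mu * error * h / power
                   for w, h in zip(weights, history)]
        out.append(error)
    return out


def energy(samples):
    return sum(s * s for s in samples)


# Synthetic echo path: loudspeaker signal delayed by 3 samples, halved.
random.seed(0)
far_end = [random.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.0] * 3 + [0.5 * s for s in far_end[:-3]]

residual = nlms_echo_cancel(far_end, mic)
print("echo energy:    ", energy(mic[-200:]))
print("residual energy:", energy(residual[-200:]))
```

Once the filter has adapted, the residual energy over the last 200 samples is a small fraction of the raw echo energy. The difficulty mentioned above comes from the fact that a real echo path is far longer than eight taps and changes whenever the phone or the people around it move.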

Added markets
HD Voice opens a myriad of possibilities for system vendors and service providers to access new markets.

Interactive voice response: Can you imagine trying to book a flight with the aid of a call service using pre-recorded voice samples? Hardly. Today's voice-activated services mainly serve to drive people mad, unable to understand even the slightest deviation from the trained version of a word.

With Wideband, the nuances in the human voice can be captured more easily, making voice-activated services a viable market with huge potential. Not only could we upgrade our broadband or phone services without actually speaking to anybody, but booking a flight, a hotel or a train would also become a real possibility.

Speech recognition systems will also benefit from increased bandwidth and provide a better recognition rate, especially because sibilants can be recognized much better. (A sibilant is a hissing sound such as the "s" we make when we talk. In a narrowband call, the letter "f" is often mistaken for "s".)

Text-to-speech: A text-to-speech (TTS) system converts normal language text into synthesized speech. The quality of a speech synthesizer is judged by its similarity to the human voice, and by its ability to be understood. An intelligible text-to-speech program allows people with visual impairments or reading disabilities to listen to written works on a telephone or PC.



Automatic translation: Voice samples are translated to text in real time.

Automotive speech recognition: Uses voice to command various functions in a car (wipers, radio, windows, etc., though not to drive it!).

Speech biometric recognition: Speaker-dependent authentication. Possible applications could be in workplaces or anywhere that requires some sort of identification.

Dictation

Hands-free computing: Speech recognition for commands on a PC for disabled users.

Home automation: Uses voice to command things we usually need a switch to do, e.g. close the shutters, turn off the lights or turn on the heating.

Medical transcription: The practice of modern medicine dictates that physicians spend more time serving patient needs than creating documents in order to make financial ends meet. More modern methods of document creation are being implemented through computers and the internet, and voice recognition (VR) is one of these technologies. With the power to write up to 200 words per minute with 99 percent accuracy, voice recognition has freed physicians from the shackles of traditional transcription services.

Web Radio on a cordless VoIP phone: The bandwidth provided by today's broadband connections is more than adequate to drive Wideband down to the residential end user. To this end, the DECT Forum has initiated CAT-iq (Cordless Advanced Technology - Internet and Quality), a new cordless phone standard to tap into the potential of Wideband in VoIP end points.

Several steps are envisaged:

HD Voice in Cordless Phones: Vendors are striving to bring new products to the market that support HD voice. As discussed earlier, this means upgrading the phones with improved microphones and speakerphones to get the most out of the wideband codec.

Conferencing in Wideband quality: With improved hardware, it will be possible to add new features like three-party conferencing in pristine quality, bringing a whole new experience to the user.

Web Radio: As part of rolling out new services, future CAT-iq products will support things like news tickers and, more notably, Web Radio in HD quality. This promises to be the killer application for VoIP in the home, marrying the power of the Internet with HD audio quality. Now Irish people in Australia can listen to Radio Cork and Chinese in Munich can listen to Shanghai FM without launching their PC down in the basement.

Streaming audio content: CAT-iq will enable cordless equipment vendors and service providers to enter markets previously the domain of the Hi-Fi specialists. Audio speakers containing a DECT receiver would be the perfect solution to distribute audio content around the home and even between different floors in the home. Not only is the air interface stable, but it also has optimal power consumption for this application.


