Before the Dawn of IP Telephony - Part 6: Originality and ingenuity for establishing voice quality
This is a translation of a serialized article carried by ITPro IP telephony ONLINE, published by Nikkei Business Publications, Inc. The original is in Japanese.

Shinji Usuba
General Manager
eSound Venture Unit
Oki Electric Industry Co., Ltd
At the end of 1996, we finally reached the stage of VoIP product shipment. The technologies that had a major impact on subsequent implementations are introduced here.
Voice communication using packets, known as Internet telephony, already existed in 1996, when we developed VOICEHUB.
The PC-based Internet phones that some enthusiasts tried to put to practical use hardly lived up to the name because of their poor performance. Communication carriers and vendors largely dismissed them as toys, and the majority considered them of little use as telephones. Japanese vendors in particular held negative views, and no other vendor was attempting to commit fully to voice communication using IP (VoIP).
Achieving quality that could pass as a telephone was critical to convincing the world of VoIP's value and to taking the first step toward its spread. The reduction in communication cost offered by VoIP was an innovative achievement, and its potential to completely change communication as we knew it in the near future was very appealing. However low the cost, though, VoIP would never be accepted as a product or as a market if it could not meet a minimum level of quality.

Photo 1 : BS1100-VOICEHUB, the first product that was completed at the end of 1996
Achieving voice quality worthy of a telephone was an absolute requirement when we developed our first product, "BS1100-VOICEHUB" (photo 1). Its successor, BS1200, went on to receive a number of awards in the U.S. in the field of Internet telephony. This article introduces the approach to voice quality that we have carried forward since the development of BS1100, along with an explanation of some technical points.
Basic concept of establishing voice quality
In order to establish voice quality, we considered the following three items necessary.
- No voice intermittence
The main reason sending voice over IP had never become practical was intermittence: dropouts at the beginning or in the middle of speech. Such intermittence is caused by jitter, the variation in the time data takes to reach the other party. This problem is unavoidable with IP. Voice broken up in this way is extremely difficult to follow, and conversation becomes virtually impossible. We therefore gave top priority to development that guarantees continuity of communication.
- Suppress unnecessary latency
Latency makes conversation difficult because voice takes a long time to reach the other party, so it should be kept as short as possible. However, shortening latency makes the intermittence described in the previous item occur more often. Latency must be kept to a minimum under favorable conditions with no jitter, while continuity of communication is still guaranteed when jitter does occur.
- Completely eradicate echoes
Echoes seriously degrade the quality of communication, so the need to suppress them completely goes without saying. The difficulty lies in achieving this while also satisfying the first two requirements.
Since these three requirements are closely related, as shown above, they must be treated as a set. Realizing them as a single entity was our basic concept during the development of our first product, BS1100-VOICEHUB, and it remains so today in our latest products.
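The tension between avoiding intermittence and suppressing latency can be illustrated with a small simulation. The following is a minimal sketch and not taken from the product: the function names, the uniform jitter model, and all numeric values are assumptions chosen only for illustration. It counts how many packets would arrive too late to be played when a fixed playout buffer of a given size is used.

```python
import random


def simulate_delays(n, base_ms, jitter_ms, seed=0):
    """Generate n network delays: a fixed base plus uniform random jitter (assumed model)."""
    rng = random.Random(seed)
    return [base_ms + rng.uniform(0.0, jitter_ms) for _ in range(n)]


def late_packet_ratio(delays_ms, buffer_ms):
    """Fraction of packets arriving after their playout deadline.

    Assumes playout is scheduled buffer_ms after the smallest delay in the trace,
    so buffer_ms is also the latency added by the buffer.
    """
    base = min(delays_ms)
    late = sum(1 for d in delays_ms if d - base > buffer_ms)
    return late / len(delays_ms)


if __name__ == "__main__":
    # Hypothetical network: 20 ms base delay with up to 80 ms of jitter.
    delays = simulate_delays(10_000, base_ms=20.0, jitter_ms=80.0)
    for buffer_ms in (10, 40, 80):
        ratio = late_packet_ratio(delays, buffer_ms)
        print(f"buffer {buffer_ms:3d} ms: late packets {ratio:.1%}")
```

Under this assumed model, a small buffer adds little latency but leaves most packets late, which would be heard as intermittence, while a buffer as large as the jitter range removes late packets at the cost of added latency.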
Buffer control
A large buffer that absorbs jitter prevents voice intermittence even when jitter occurs, but as a side effect it increases latency. A smaller buffer means less latency but a greater chance of intermittence.
We therefore decided to use an extremely large buffer to absorb jitter and to change the size of the buffer according to the amount of jitter. Under good network conditions, this allows communication with hardly any latency. When jitter increases, voice is still played back without intermittence, although latency rises.
Here is an example. Suppose traffic is heavy for the 10 minutes immediately before the end of business hours, which increases jitter, while at all other times there is hardly any jitter. With this buffer control method, communication is possible with hardly any latency outside those 10 minutes. Even during the 10 minutes, continuity of communication is guaranteed, although with some latency. One might think that 10 minutes in a 24-hour period is less than 1% of the total. However, a deterioration in quality during those mere 10 minutes could cause major complaints, and VoIP would never again be used in place of a telephone. We carried out repeated assessments in real environments with this firmly in mind.
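The buffer control idea described above can be sketched in code. This is an illustrative sketch under stated assumptions and not the algorithm used in BS1100: the class name `AdaptiveJitterBuffer`, its parameters, and the default values are hypothetical, and the smoothing rule is borrowed from the interarrival jitter estimate defined for RTP in RFC 3550 (gain of 1/16), which the article does not mention.

```python
class AdaptiveJitterBuffer:
    """Tracks a target playout delay that grows and shrinks with observed jitter."""

    def __init__(self, min_ms=0.0, max_ms=500.0, margin=4.0, gain=1 / 16):
        self.min_ms = min_ms      # delay used when the network is quiet
        self.max_ms = max_ms      # capacity of the (large) buffer
        self.margin = margin      # safety multiple applied to the jitter estimate
        self.gain = gain          # smoothing gain, as in the RFC 3550 estimator
        self.jitter_ms = 0.0
        self._last_transit = None

    def on_packet(self, send_ms, recv_ms):
        """Update the jitter estimate from one packet and return the new target delay."""
        transit = recv_ms - send_ms
        if self._last_transit is not None:
            deviation = abs(transit - self._last_transit)
            self.jitter_ms += self.gain * (deviation - self.jitter_ms)
        self._last_transit = transit
        return self.target_delay_ms()

    def target_delay_ms(self):
        """Target playout delay: a multiple of the jitter, clamped to the buffer limits."""
        target = self.margin * self.jitter_ms
        return max(self.min_ms, min(self.max_ms, target))


if __name__ == "__main__":
    buf = AdaptiveJitterBuffer()
    # Quiet period: every packet takes 30 ms, so there is no jitter.
    for i in range(100):
        buf.on_packet(i * 20, i * 20 + 30)
    print("quiet target (ms):", buf.target_delay_ms())
    # Busy period: transit time alternates between 30 ms and 90 ms.
    for i in range(100, 300):
        buf.on_packet(i * 20, i * 20 + 30 + (60 if i % 2 else 0))
    print("busy target (ms):", round(buf.target_delay_ms(), 1))
```

In this sketch the target delay stays at the minimum while transit times are steady and rises only while jitter is observed, mirroring the busy 10-minute period in the example: latency is paid only when it is needed to keep playback continuous.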
Selection of a codec
A codec is a method for converting voice into digital data. Various methods have been researched and standardized.
Taking bandwidth compression and latency into account, the remaining candidates were LD-CELP and CS-ACELP.
At the time, OKI had already completed an LSI for 16 kbit/sec LD-CELP. If my memory serves me correctly, the standardization of CS-ACELP was still only at the drafting stage.
On IP networks, packet loss is highly likely. Even if CS-ACELP requires more processing time, jitter on the network side is far greater. We therefore decided that CS-ACELP was better suited to our concept of no voice intermittence.
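The weighing of codec processing time against network jitter can be made concrete with a simple delay budget. The sketch below is not from the original article. The algorithmic delays are the figures for the ITU-T standardized forms of these codecs (0.625 ms for LD-CELP in G.728, 15 ms for CS-ACELP in G.729); the network delay and jitter buffer values are assumptions chosen only for illustration, and the function name is hypothetical.

```python
# Algorithmic delay per codec in milliseconds (G.728 and G.729 figures).
CODEC_DELAY_MS = {"LD-CELP": 0.625, "CS-ACELP": 15.0}


def one_way_delay_ms(codec, network_ms, jitter_buffer_ms):
    """Sum the codec, network, and jitter-buffer contributions to one-way delay."""
    return CODEC_DELAY_MS[codec] + network_ms + jitter_buffer_ms


if __name__ == "__main__":
    # Assumed values for a congested IP network; not measurements from the article.
    network_ms = 30.0
    jitter_buffer_ms = 100.0
    for codec in CODEC_DELAY_MS:
        total = one_way_delay_ms(codec, network_ms, jitter_buffer_ms)
        share = CODEC_DELAY_MS[codec] / total
        print(f"{codec}: {total:.3f} ms one-way, codec share {share:.1%}")
```

Under these assumed numbers, the choice of codec changes the total by about 14 ms, which is small next to the 100 ms set aside to absorb jitter.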
Selecting a codec was a crucial matter that would normally have required many hours of deliberation with the integrator. Fortunately, our Transmission Department, Semiconductor Department and research laboratory had many engineers familiar with encoding, so we were able to reach a decision in a short time by exchanging opinions with them. Since CS-ACELP later became the standard in the world of IP, I believe we made the right decision at the time.

Photo 2 : "OKI Technical Review" issue 174 announcing the technology for the first time.
After BS1100 shipped in December 1996, we completed a paper on the concept and technology of the product and published it outside the company (photo 2). At the time, we did not expect the technology to be widely recognized, and here in Japan it probably did gain very little recognition. Then, however, some surprising news came in.
... To be continued