Area Sound Enhancement


With the advent of deep learning, speech recognition technology has reached a practical level, and voice user interfaces are now used in a wide range of settings. Current speech recognition works smoothly in environments where only one person is speaking to a smartphone or a smart speaker. However, where multiple people speak at the same time, the system cannot distinguish the target speaker from others, and recognition fails. Examples include automatic entry of reception records at customer service counters and rows of self-service terminals equipped with voice user interfaces. OKI is researching and developing "area sound enhancement" technology to solve these voice user interface problems.


Area sound enhancement technology picks up only the sounds in a target area by placing multiple microphone arrays around that area. With a normal microphone, it is difficult to pick up the speaker's voice when the surrounding noise is loud. Even a directional microphone, such as a shotgun microphone, or a single microphone array still collects noise that arrives from the same direction as the target area. This technology instead places two microphone arrays at different positions and aims them so that their directivities cross at the target area. The component common to the directional outputs of both arrays is estimated to be sound from the target area, and all other components are suppressed.
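The idea of keeping only the component common to two directional outputs can be illustrated with a minimal sketch. This is not OKI's algorithm: it assumes the two beamformer outputs are already available as time-domain signals, that the interfering sounds in each beam occupy different frequency bins, and it uses a simple per-bin minimum of the two magnitude spectra as the "common component" estimate. The function name `common_component` and the synthetic tones are hypothetical and chosen only for illustration.

```python
import numpy as np

def common_component(beam1, beam2):
    """Rough estimate of the sound shared by two beamformer outputs.

    Assumption (illustrative only): a frequency bin belongs to the
    target area if it is strong in BOTH beams, so the per-bin minimum
    magnitude is kept and everything else is suppressed.
    """
    spec1 = np.fft.rfft(beam1)
    spec2 = np.fft.rfft(beam2)
    # Per-bin minimum magnitude: energy present in only one beam is removed.
    magnitude = np.minimum(np.abs(spec1), np.abs(spec2))
    # Reuse the phase of the first beam to rebuild a waveform.
    estimate = magnitude * np.exp(1j * np.angle(spec1))
    return np.fft.irfft(estimate, n=len(beam1))

# Synthetic example: one second of audio at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 440 * t)    # speaker inside the target area
noise_a = np.sin(2 * np.pi * 1000 * t)  # interferer seen only by array 1
noise_b = np.sin(2 * np.pi * 2000 * t)  # interferer seen only by array 2

beam1 = target + noise_a  # directional output of array 1
beam2 = target + noise_b  # directional output of array 2

enhanced = common_component(beam1, beam2)
```

In this idealized case `enhanced` is close to `target`, because each interfering tone appears in only one beam. Real speech overlaps in frequency and arrives with different delays and levels at each array, so a practical system would need frame-by-frame processing and level compensation beyond this sketch.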

Area sound enhancement technology lets speech recognition systems recognize only the voice of the person in the target area, even when other people are speaking nearby. This enables smooth communication at customer service counters and self-service terminals.

Illustration of Area Sound Enhancement


Normal Microphone

The voice of the speaker cannot be heard well due to the voices of other people and background noise.

OKI's "Area Sound Enhancement Technology"

The voice of the speaker can be heard clearly even if there are surrounding noises.

