Recommendations for the Correct Implementation of the 2400 bit per second Mixed Excitation Linear Prediction Voice Coding Algorithm

United States Department of Defense Digital Voice Processing Consortium

Version 1

30 June 1998

1. Introduction

The voice coding algorithm is not the only factor contributing to the quality of the voice in communications equipment. The proper implementation of any voice coding algorithm into communications equipment involves careful attention to many factors. If implementation details are not properly addressed, the output voice performance (e.g. quality, intelligibility, recognizability) could be severely degraded.

This document is an attempt to address these factors and provide guidance which will result in the highest voice performance possible for the 2400 bps Mixed Excitation Linear Prediction (MELP) voice coding algorithm.

2. Preprocessing

The MELP algorithm was evaluated over a wide range of acoustic noise environment input conditions using various high quality and noise-canceling microphones. The algorithm performed very well without the use of external preprocessing techniques such as automatic gain control or frequency domain pre-emphasis. Careful consideration of the effects of preprocessing techniques across all expected input conditions should be given before use on MELP input.

3. Input Characteristics

The MELP algorithm is only as good as the input that it is given. Anything that affects the voice prior to receipt by the voice coding algorithm will probably decrease its performance. Input characteristics are determined by several elements, and these are addressed below.

Microphones

Inexpensive microphones often provide poor frequency response. Available test results have shown that the MELP algorithm provides the highest performance with an input frequency response that is flat between 100 Hertz and 3800 Hertz.

Communication in environments with harsh acoustic noise requires microphones with noise cancellation. These microphones will alter the frequency response in the voiced regions. The altered frequency response is preferable, however, to the harsh background noise. The appendix lists some microphones used by the services and their frequency characteristics.

Talker Characteristics

Speaking correctly into the microphone is also important. Voice coding algorithms no longer require trained talkers; however, care should be taken to ensure the voice is not distorted. This is true for any communications equipment, including the telephone system. The talker's mouth should be one to three inches from the microphone, and the talker should not speak too quickly. It is important to use noise-canceling microphones according to the manufacturer's instructions. Microphone orientation and distance to the user, as well as the use of a windscreen, will affect performance. The use of a microphone preamplifier that matches the characteristics of the microphone is also essential. The talker should enunciate the words well, especially if the listener is unfamiliar with the speaking habits of the talker.

Sampling and Filters

Voice should be sampled at a minimum of 8000 samples per second. Anti-aliasing filters should be used in the conversion from analog voice to a digital bitstream that represents that voice. These low pass filters should cut off at 4000 Hz and preserve the frequency and phase characteristics of the voice between 100 Hertz and 3800 Hertz.
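The reason for the 4000 Hz cutoff can be illustrated with a short calculation. The sketch below is only an illustration, not part of the MELP specification: it assumes ideal sampling at 8000 samples per second with no filter in place, and the function name alias_frequency is hypothetical. It shows where a tone above half the sampling rate would reappear inside the voice band.

```python
def alias_frequency(f_hz, fs_hz):
    """Frequency at which a tone at f_hz appears after ideal sampling at fs_hz."""
    # Sampling cannot distinguish f_hz from its distance to the nearest
    # multiple of the sampling rate.
    nearest_multiple = round(f_hz / fs_hz) * fs_hz
    return abs(f_hz - nearest_multiple)


FS_HZ = 8000  # minimum sampling rate recommended above

for tone_hz in (1000, 3800, 5000, 7000):
    print(tone_hz, "Hz is observed at", alias_frequency(tone_hz, FS_HZ), "Hz")
```

Tones at or below 3800 Hz are unchanged, while the unfiltered 5000 Hz and 7000 Hz tones fold back to 3000 Hz and 1000 Hz, where they would be indistinguishable from voice energy.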

The DDVPC testbed for the selection test used a Crystal Semiconductor A/D converter (part No. CS5336). It is a delta-sigma converter with 64X oversampling followed by digital filtering. The lowpass filter's frequency response is 3dB down at 3,670Hz and 18dB down at 4,000Hz. There is no highpass filter.
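To put the quoted attenuation figures in more familiar terms, the following sketch converts "dB down" values to linear amplitude ratios. It uses only the two response points stated above; the helper name db_down_to_gain is hypothetical.

```python
def db_down_to_gain(db_down):
    """Convert an attenuation in dB to a linear amplitude ratio."""
    return 10 ** (-db_down / 20.0)


# Response points quoted above for the CS5336 A/D lowpass filter.
adc_points = [(3670, 3.0), (4000, 18.0)]

for freq_hz, db_down in adc_points:
    gain = db_down_to_gain(db_down)
    print(f"{freq_hz} Hz: {db_down:.0f} dB down, amplitude ratio {gain:.3f}")
```

Under this conversion, 3 dB down corresponds to roughly 71 percent of the passband amplitude and 18 dB down to roughly 13 percent.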

Dynamic Range

Adequate dynamic range can only be provided if a sufficient number of bits are used to quantize each sample. In addition, the quantization should use a linear method without companding. It is important to attempt to use the entire A/D range yet avoid A/D saturation. The selection test configuration used 16 bits per sample for the representation of the analog voice.
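As a rough aid for checking A/D usage, the sketch below computes the textbook signal-to-quantization-noise ratio of an ideal linear quantizer driven by a full-scale sine wave, and reports how close a block of samples comes to saturation. It assumes signed two's-complement samples; the function names are hypothetical, and the formula describes an ideal converter, not a measured requirement for MELP.

```python
import math


def ideal_dynamic_range_db(bits):
    """SNR in dB of an ideal linear quantizer for a full-scale sine wave."""
    # Equivalent to the familiar approximation 6.02 * bits + 1.76 dB.
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)


def peak_dbfs(samples, bits=16):
    """Peak level of signed integer samples relative to full scale, in dB."""
    full_scale = 2 ** (bits - 1)
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")
    return 20 * math.log10(peak / full_scale)


def count_clipped(samples, bits=16):
    """Number of samples sitting at the most negative or most positive code."""
    lowest = -(2 ** (bits - 1))
    highest = 2 ** (bits - 1) - 1
    return sum(1 for s in samples if s <= lowest or s >= highest)


print(f"16-bit ideal dynamic range: {ideal_dynamic_range_db(16):.2f} dB")
```

A recording whose peak sits a few dB below 0 dBFS with a clipped-sample count of zero is using most of the A/D range without saturating.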

The minimum number of bits per sample required to provide the necessary dynamic range for MELP has not yet been determined.

Background Noise

As mentioned under Microphones above, the proper use of noise-canceling microphones is required in harsh acoustic noise environments.

4. Output Characteristics

The output characteristics of the system are just as important as the input characteristics discussed in section 3. The high quality output of the MELP algorithm can be severely distorted unless correct measures are taken. Digital to analog conversion in the DDVPC testbed for the selection test used a Crystal Semiconductor D/A converter (part No. CS4328). It has an 8x oversampling filter followed by a 64x oversampled one-bit delta-sigma modulator; the filter's response is 3dB down at 3,920Hz and 12dB down at 4,000Hz. There is no highpass filter.

Earpiece and Speaker

Whether the equipment uses an earpiece or a speaker, the frequency response of the output should be flat between 100 Hertz and 3800 Hertz. Many times the receiving transducer is dictated by the microphone system (e.g., any telephone handset, or the earcups supplied with the M87 microphone headset). Care should be taken to ensure adequate signal levels are sent and received by the transducers without developing perceptible levels of crosstalk between them.

5. Testing

Implementors often wish to test the voice performance of communications equipment. There are several tests available to measure voice performance. References for these tests are provided in the appendix. If the test material is not properly presented to and recorded from the equipment, however, the test results are not an accurate measurement of the performance of the equipment.

Optimal Testing Procedures

It is important to understand the test method and its requirements when generating material to be evaluated for any given method. The number of talkers and the amount of material per talker to be generated are impacted by the method selected, the conditions under which the coder is to be tested, and any direct coder-to-coder comparisons that are desired.

The military requires that vocoders function within many harsh acoustic noise environments. Platforms such as helicopters, tanks, and attack aircraft have ambient sound pressure levels in excess of 100dB. In addition, the characteristics of these noise fields are often quite complex and vary greatly over time. Electronic generation of noise fields (e.g., a white noise source filtered to match an averaged frequency response curve) cannot take into account temporal characteristics. Electronic mixing of actual or simulated noise fields also falls short of an accurate representation of the environment because the level of effort required by a talker in that environment is not taken into account. Accurate testing of vocoders under these conditions requires that the input material be generated by recording talkers in the noise environment (actual or simulated within an acoustic isolation room).

The reliability and repeatability of test methods for evaluating vocoder performance often depend on the use of calibrated prerecorded input material (a notable exception being communicability testing, which requires live, two-way, dynamic interaction). A good library provides calibrated input material from a single group of talkers over a variety of conditions and microphones.

If prerecorded materials are to be used as analog input, they should be of the highest possible quality, preferably digitally recorded and edited at full bandwidth (44.1 or 48kHz) directly to a digital medium (e.g., Digital Audio Tape (DAT), CD-ROM, or file). Presentation material should be of a consistent quality and signal strength.

The use of calibrated material allows the input level to be set so that the full range of the A/D is used without risking saturation. An unprocessed sine wave reference tone indicative of the signal strength of the processed material should be included with all output materials. This tone will be the primary information used for setting the presentation level to listening crews.
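One possible way to produce such a reference tone is sketched below. It rests on assumptions not stated in this document: that "indicative of the signal strength" is taken to mean equal RMS level, that the tone is 1000 Hz, and that it is generated at 8000 samples per second. The function names are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def reference_tone(material, freq_hz=1000.0, fs_hz=8000, n_samples=8000):
    """Sine tone whose RMS level matches that of the given material.

    The frequency, sampling rate, and duration defaults are assumptions
    chosen for illustration only.
    """
    # A sine wave of peak amplitude A has an RMS level of A / sqrt(2).
    amplitude = rms(material) * math.sqrt(2.0)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / fs_hz)
        for n in range(n_samples)
    ]


# Stand-in for processed speech: a square wave with an RMS level of 1000.
material = [1000.0, -1000.0] * 100
tone = reference_tone(material)
print(f"material RMS {rms(material):.1f}, tone RMS {rms(tone):.1f}")
```

Because the tone is generated directly and never passes through the coder, it remains an unprocessed level reference for the listening crews.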

There should be no changes in playback or record settings during generation of test materials.

6. References

Schmidt-Nielsen, A., (1995) Intelligibility and acceptability testing for speech technology. In Bennett, R.W., Syrdal, A. K., and Greenspan, S.L. (Eds.), Behavioral Aspects of Speech Technology. CRC.

Schmidt-Nielsen, A. (1992). Intelligibility and Acceptability Testing for Speech Technology, (Report No. 9379). Washington, DC: Naval Research Laboratory.

Gatewood, Paul D, LaFollette, Philip A., "Host Laboratory Functions for the DoD 2400 bps Vocoder Selection Process", Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 1996, Volume 2, pages 1157-1160.

7. Appendix

Frequently Asked Questions

Tapes recorded of LPC-10 were made by processing digitized speech with a computer. The digitized speech was recorded in an anechoic sound chamber using a dynamic microphone. The talkers were trained to enunciate properly and to speak slowly and clearly. Actual implementations may have degraded quality for several reasons: microphone performance (see section 3.), speaker performance (see section 4.), talker characteristics (see section 3.), processor speed, memory capacity, background noise, or channel errors.

Some older radios do not use digital signal processing (DSP) chips for processing the input signal. The TACTERM, for example, uses a bit-sliced microprocessor. If the processing capability was not sufficient to implement the algorithm, shortcuts were taken that decreased the quality of the algorithm. When algorithms are ported from one platform to another (e.g. floating point to fixed point), care may not have been taken to ensure accuracy in the calculations. Similarly, if insufficient memory was available to implement the algorithm, shortcuts caused quality degradations.

Tapes of LPC-10 speech often used inputs recorded in a quiet background. The radios, however, are used in noisy environments. Tapes also assume a perfect channel that may not be available when the radio is used. When bits are received in error, the parameters of the algorithm are incorrect, and the output speech is degraded.
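The effect of channel errors on coder parameters can be illustrated with a toy example. The sketch below is not taken from LPC-10 or MELP: the 7-bit parameter index, the function names, and the independent-error channel model are all assumptions made for illustration.

```python
import random


def corrupt_bits(bits, bit_error_rate, rng):
    """Flip each bit independently with probability bit_error_rate."""
    return [b ^ 1 if rng.random() < bit_error_rate else b for b in bits]


def flip_bit(index, bit_position):
    """Return a quantizer index with one bit inverted."""
    return index ^ (1 << bit_position)


# A hypothetical 7-bit quantized parameter index.
sent_index = 20
print("least significant bit in error:", flip_bit(sent_index, 0))
print("most significant bit in error: ", flip_bit(sent_index, 6))

# A hypothetical frame of 54 bits passed through a channel with 1% errors.
rng = random.Random(0)
frame = [0] * 54
received = corrupt_bits(frame, 0.01, rng)
print("bits in error:", sum(received))
```

A single error in the least significant bit moves the index from 20 to 21, while the same error in the most significant bit moves it to 84, which is why even a low error rate can produce audible degradation.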

Whether the radio uses LPC-10 or another algorithm, these factors will cause speech to sound worse than expected. The combined effects are progressively detrimental.

Although LPC-10 and MELP are both based on linear predictive analysis, the two algorithms are very different and they cannot talk with each other. A radio containing MELP should also have the LPC-10 algorithm to maintain interoperability with older equipment that does not yet have MELP software.

Microphone Information for Various Military Equipment

Microphone or Handset Type                                     Platform
--------------------------                                     --------
Electrovoice M87                                               Flight Deck, Shipboard CIC, Helicopters, USAF MCE field shelters
Roanwell TA840 Handset                                         Shipboard secure voice systems
Electrovoice H250 Handset                                      Wheeled vehicles (jeeps), US Army HMMWVs
Electrovoice M-101                                             In-Mask fighter aircraft (F-15, etc.)
Roanwell M-138                                                 Tracked vehicles (tanks, personnel carriers)
Electrovoice 985 boom microphone (military designation M-162)  P-3 Turbo Propellers
Roanwell 215-330-001                                           Large airframe console positions (E-3A)
Qualcomm microphone                                            Digital Cellular Communication

Table 1: Microphone Information

Additional Speech Testing References

In 1996 the U.S. Department of Defense Digital Voice Processing Consortium held a special session at the International Conference on Acoustics, Speech, and Signal Processing. Several papers were written and presented to discuss techniques and tests for measuring voice coding algorithm performance. These papers are listed below.

Other MELP References

Additional information on MELP, other government standard voice coding algorithms, and the U.S. Department of Defense Digital Voice Processing Consortium can be found on the internet at http://www.arcon.com/ddvpc.




This page was last updated on 03-Jan-2002.