Published January 1, 2017 | Version v1
Journal article
Open
Multi-microphone speech recognition integrating beamforming, robust feature extraction, and advanced DNN/RNN backend
Creators
- 1. Mitsubishi Electric Research Laboratories, Cambridge, MA 02139 USA
- 2. SRI International, 333 Ravenswood Ave, Menlo Park, CA 94025 USA
Description
This paper gives an in-depth presentation of the multi-microphone speech recognition system we submitted to the 3rd CHiME speech separation and recognition challenge (CHiME-3) and its extension. The proposed system takes advantage of recurrent neural networks (RNNs) throughout the model, from the front-end speech enhancement to the language modeling. Three different types of beamforming are used to combine multi-microphone signals into a single higher-quality signal. The beamformed signal is further processed by a single-channel long short-term memory (LSTM) enhancement network, which is used to extract stacked mel-frequency cepstral coefficient (MFCC) features. In addition, the beamformed signal is processed by two proposed noise-robust feature extraction methods. All features are used for decoding in speech recognition systems with deep neural network (DNN) based acoustic models and large-scale RNN language models to achieve high recognition accuracy in noisy environments. Our training methodology includes multi-channel noisy data training and speaker adaptive training, whereas at test time model combination is used to improve generalization. Results on the CHiME-3 benchmark show that the full set of techniques substantially reduced the word error rate (WER). Combining hypotheses from different beamforming and robust-feature systems ultimately achieved 5.05% WER on the real-test data, an 84.7% relative reduction from the baseline of 32.99% WER and a 44.5% relative reduction from our official CHiME-3 challenge result of 9.1% WER. Furthermore, this final result is better than the best result (5.8% WER) reported in the CHiME-3 challenge. © 2017 Elsevier Ltd. All rights reserved.
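The relative reductions quoted in the description can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `relative_reduction` is an assumption introduced here, not something from the paper, and the WER values are the ones stated in the description.

```python
def relative_reduction(reference_wer: float, new_wer: float) -> float:
    """Relative WER reduction, in percent, going from reference_wer to new_wer."""
    return 100.0 * (reference_wer - new_wer) / reference_wer


# WER values (in percent) as stated in the description.
final_wer = 5.05       # combined system, real-test data
baseline_wer = 32.99   # CHiME-3 baseline
challenge_wer = 9.1    # the authors' official challenge result

print(round(relative_reduction(baseline_wer, final_wer), 1))   # 84.7
print(round(relative_reduction(challenge_wer, final_wer), 1))  # 44.5
```

Both figures are relative to the reference WER, not differences in absolute percentage points.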
Files
| Name | Size |
|---|---|
| bib-d2296038-fabe-424c-b56a-fd40ea77ba23.txt (md5:8ec9504523daab523759820621156a72) | 258 Bytes |