An Evaluation Study of Using Various SNR-Level Training Data in the Denoising Auto-Encoder (DAE) Technique for Speech Enhancement
Speech enhancement (SE), which reduces the effect of noise, plays an important role in today's widespread audio
applications such as speech recognition, speech-based information retrieval, and voice control. Among the various speech
enhancement techniques, the denoising auto-encoder (DAE) uses deep learning to learn the
transformation from noisy data to its clean, noise-free counterpart, and it has been shown to be very effective at
reducing the noise component while introducing little speech distortion. In this paper, we primarily investigate how
the signal-to-noise ratio (SNR) of the training data affects the SE capability of the resulting DAE.
The major finding from our evaluation experiment is that the DAE trained on high-SNR data provides significantly greater
improvement in speech quality for noisy test data over a wide range of noise levels than the DAE
trained on either multi-SNR data or matched-SNR data. This result runs somewhat counter to the common intuition
that a model trained on multi-SNR data behaves well on average for test data at an arbitrary noise
level, and that the matched-condition model should give the optimal performance. We offer possible explanations
for this finding and explore some advantages of using only high-SNR training data to prepare the DAE for speech
enhancement. These advantages include a smaller amount of required training data, a simpler DAE structure with fewer
hidden layers, and higher adaptability to other noisy conditions.
Index Terms - speech enhancement; auto-encoder; speech denoising; noise reduction; deep neural network.
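
To make the notion of SNR-specific training data concrete, the sketch below shows one common way to build a noisy/clean training pair at a chosen SNR: the noise is scaled so that 10*log10(P_clean / P_noise) equals the target value, then added to the clean signal. This is an illustration under stated assumptions, not the data preparation used in this study. The function names (power, mix_at_snr, measured_snr_db), the sine-wave stand-in for speech, the Gaussian noise, and the SNR levels 20, 10, 0 and -5 dB are all hypothetical.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(clean, noise, target_snr_db):
    """Scale `noise` so the mixture has the requested SNR (in dB).

    Returns (noisy, scaled_noise). Hypothetical helper for illustration.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    noise_power = power(noise)
    if noise_power == 0.0:
        raise ValueError("noise must not be silent")
    # SNR_dB = 10 * log10(P_clean / P_scaled_noise); solve for the noise gain.
    target_noise_power = power(clean) / (10.0 ** (target_snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    scaled_noise = [gain * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled_noise)]
    return noisy, scaled_noise


def measured_snr_db(clean, noise):
    """SNR in dB between a clean signal and an additive noise signal."""
    return 10.0 * math.log10(power(clean) / power(noise))


# Stand-in signals (assumptions): a 440 Hz tone sampled at 16 kHz in place
# of speech, and white Gaussian noise in place of a recorded noise source.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

# One (noisy, scaled_noise) pair per hypothetical SNR level. The noisy
# signal would be the DAE input and `clean` the training target.
pairs = {level: mix_at_snr(clean, noise, level) for level in (20, 10, 0, -5)}
```

In this framing, a high-SNR training set would keep only the pairs at the upper levels, a multi-SNR set would pool all levels, and a matched-SNR set would use the level that equals the test condition.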