An Evaluation Study of Using Various SNR-Level Training Data in the Denoising Auto-Encoder (DAE) Technique for Speech Enhancement

Speech enhancement (SE), which reduces the effect of noise, plays an important role in today's widespread audio applications such as speech recognition, speech-based information retrieval, and voice control. Among the various SE techniques, the denoising auto-encoder (DAE) uses deep learning to learn the transformation from noisy data to its clean counterpart, and it has been shown to be very effective in reducing the noise component while introducing little speech distortion. In this paper, we primarily investigate how the signal-to-noise ratio (SNR) of the training data influences the SE capability of the resulting DAE. The major finding of our evaluation experiment is that the DAE trained on high-SNR data provides significantly better improvement in speech quality for noisy test data over a wide range of noise levels than the DAE trained on either multi-SNR data or matched-SNR data. This result runs somewhat counter to the common intuition that a model trained on multi-SNR data behaves well on average for test data at an arbitrary noise level, and that the matched-condition model should give the optimal performance. We offer possible explanations for this finding and explore some advantages of using only high-SNR training data to prepare the DAE for speech enhancement. These advantages include a smaller amount of required training data, a simpler DAE structure with fewer hidden layers, and higher adaptability to other noisy situations.

Index Terms - speech enhancement; auto-encoder; speech denoising; noise reduction; deep neural network.
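
To make the notion of training data at a given SNR concrete, the following is a minimal sketch of how noisy/clean training pairs might be built by scaling a noise signal to a target SNR before adding it to clean speech. It is not the procedure used in this paper: the function names `mix_at_snr` and `measured_snr_db`, the synthetic sine and Gaussian signals, and the specific SNR values (0 to 20 dB) are all assumptions chosen for illustration.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean` after scaling it to the requested SNR (in dB).

    Hypothetical helper for illustration; not taken from the paper.
    """
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as clean")
    noise = noise[: len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(P_clean / (g^2 * P_noise)), solved for the gain g.
    gain = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise


def measured_snr_db(clean, noisy):
    """Recover the SNR of `noisy` relative to its known clean reference."""
    residual = noisy - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))


# Synthetic stand-ins for a clean utterance and a noise recording (assumed).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2.0 * np.pi * 440.0 * t)
noise = rng.standard_normal(len(clean))

# (noisy input, clean target) pairs for two of the training conditions.
# The SNR values are illustrative assumptions, not the paper's settings.
high_snr_pairs = [(mix_at_snr(clean, noise, 20.0), clean)]
multi_snr_pairs = [
    (mix_at_snr(clean, noise, snr), clean) for snr in (0.0, 5.0, 10.0, 15.0, 20.0)
]
```

Under these assumptions, a high-SNR training set uses only the lightly corrupted pairs, a multi-SNR set pools pairs across several noise levels, and a matched-SNR set would use the same SNR as the test condition. Each pair supplies the noisy input and the clean target that a DAE is trained to map between.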