Auditory feature for monaural speech segregation
Yi Jiang, RunshenG Liu, Yuanyuan Zu
Available Online March 2014.
- https://doi.org/10.2991/icieac-14.2014.16
- gammatone frequency cepstral coefficients (GFCC); monaural speech segregation; binary classification; time-frequency (T-F) unit
- Monaural speech segregation remains a challenging problem in speech signal processing. Applying an ideal binary mask to an auditory mixture has been shown to yield substantial improvements in signal-to-noise ratio (SNR) and intelligibility. In this paper, we use gammatone frequency cepstral coefficients (GFCC), an auditory feature computed at the time-frequency (T-F) unit level, to estimate the ideal binary mask for monaural speech segregation. The paper reports a successful attempt to use GFCC as the segregation cue with a deep neural network (DNN) classifier. Results show that robust performance can be achieved across noisy and reverberant conditions.
- Open Access
- This is an open access article distributed under the CC BY-NC license.
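The ideal binary mask mentioned in the abstract can be illustrated with a minimal sketch. It uses the common definition in which a T-F unit is labeled 1 when its local SNR exceeds a local criterion and 0 otherwise. The function name `ideal_binary_mask`, the parameter `lc_db`, the 0 dB default, and the toy energy values are assumptions made for illustration; the abstract does not state the criterion or the data used by the authors.

```python
import math


def ideal_binary_mask(speech_energy, noise_energy, lc_db=0.0):
    """Label each T-F unit 1 if its local SNR in dB exceeds lc_db, else 0.

    speech_energy and noise_energy are equally shaped nested lists
    (channels x frames) of premixed energies. lc_db is an assumed
    local criterion, not a value taken from the paper.
    """
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            if s <= 0.0:
                # No speech energy: the unit cannot be speech dominated.
                row.append(0)
            elif n <= 0.0:
                # Speech present and no noise: treat local SNR as infinite.
                row.append(1)
            else:
                snr_db = 10.0 * math.log10(s / n)
                row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask


# Toy energies for a 2-channel, 2-frame example (hypothetical values).
speech = [[4.0, 0.5], [1.0, 9.0]]
noise = [[1.0, 2.0], [1.0, 3.0]]
mask = ideal_binary_mask(speech, noise)
```

In the approach the abstract describes, this mask is the training target: a classifier predicts the 0/1 label of each T-F unit from its features, because the premixed speech and noise energies are not available at test time.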
Cite this article
TY  - CONF
AU  - Yi Jiang
AU  - RunshenG Liu
AU  - Yuanyuan Zu
PY  - 2014/03
DA  - 2014/03
TI  - Auditory feature for monaural speech segregation
BT  - Proceedings of the 2nd International Conference on Information, Electronics and Computer
PB  - Atlantis Press
SP  - 69
EP  - 72
SN  - 1951-6851
UR  - https://doi.org/10.2991/icieac-14.2014.16
DO  - https://doi.org/10.2991/icieac-14.2014.16
ID  - Jiang2014/03
ER  -