Speech Emotion Recognition Based on Feature Fusion

Qi Shen; Guanggen Chen; Lin Chang

doi:10.2991/msmee-17.2017.208

Corresponding Author

Qi Shen

Available Online May 2017.

DOI: 10.2991/msmee-17.2017.208
Keywords: speech emotion recognition, convolution neural network, feature fusion.
Abstract: Speech emotion recognition relies mainly on differences in characteristics between emotions. Traditional recognition methods are based on manually extracted features, such as MFCC and LPCC, and have achieved good results. However, it is unclear which kinds of features best reflect the characteristics of human emotion in speech. As Convolutional Neural Networks (CNNs) have shown strong ability in image classification, more researchers have applied CNNs to learning spectrogram features. Existing studies of speech emotion, however, either rely on traditional manually extracted features or depend entirely on the spectrogram of speech; traditional features and spectrogram features have not yet been combined. In this paper, we propose a fusion neural network model that combines traditional features with spectrogram features. This multimodal CNN is trained in two stages. First, two pre-trained CNN models are fine-tuned separately on the corresponding labeled audio datasets. Second, the outputs of the two CNN models are connected to a fusion network of fully connected layers, which is trained to obtain a joint feature representation for emotion recognition. Recognition results on an emotional speech database show that the proposed algorithm achieves a higher speech emotion recognition rate and greater robustness.
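
The second training stage described in the abstract, where the outputs of two branch networks feed a fusion network of fully connected layers, can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the abstract does not say how the two outputs are joined, so concatenation is an assumption here, and the vector sizes, the number of emotion classes, the random weights, and all function names are hypothetical placeholders. The two CNN branches are replaced by stand-in output vectors.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in x]


def softmax(x):
    """Numerically stable softmax over a list of scores."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (a stand-in for trained parameters)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def fuse(trad_out, spec_out, hidden, output):
    """Join the two branch outputs and classify with fully connected layers."""
    joint = trad_out + spec_out  # assumed fusion: plain concatenation
    h = relu(dense(joint, *hidden))
    return softmax(dense(h, *output))


rng = random.Random(0)
NUM_EMOTIONS = 4  # hypothetical number of emotion classes

# Stand-ins for the outputs of the two fine-tuned CNN branches.
trad_out = [rng.random() for _ in range(8)]   # traditional-feature branch
spec_out = [rng.random() for _ in range(16)]  # spectrogram branch

hidden = init_layer(len(trad_out) + len(spec_out), 12, rng)
output = init_layer(12, NUM_EMOTIONS, rng)

probs = fuse(trad_out, spec_out, hidden, output)
print(len(probs), round(sum(probs), 6))
```

In a real system the fusion layers would be trained on labeled data rather than left at random initial values; the sketch only shows how a joint representation can be formed from two feature streams and mapped to class probabilities.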
Copyright: © 2017, the Authors. Published by Atlantis Press.
Open Access: This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title: Proceedings of the 2017 2nd International Conference on Materials Science, Machinery and Energy Engineering (MSMEE 2017)
Series: Advances in Engineering Research
Publication Date: May 2017
ISBN: 978-94-6252-346-3
ISSN: 2352-5401

Cite this article

TY  - CONF
AU  - Qi Shen
AU  - Guanggen Chen
AU  - Lin Chang
PY  - 2017/05
DA  - 2017/05
TI  - Speech Emotion Recognition Based on Feature Fusion
BT  - Proceedings of the 2017 2nd International Conference on Materials Science, Machinery and Energy Engineering (MSMEE 2017)
PB  - Atlantis Press
SP  - 1071
EP  - 1074
SN  - 2352-5401
UR  - https://doi.org/10.2991/msmee-17.2017.208
DO  - 10.2991/msmee-17.2017.208
ID  - Shen2017/05
ER  -
