Research On Prosody Conversion of Affective Speech Based on LIBSVM and PAD Three Dimensional Emotion Model
- 10.2991/wartia-16.2016.1
- PAD emotion model, five-scale tone model, LIBSVM (Library for Support Vector Machines), support vector regression, generalized regression neural network, prosody conversion.
This paper proposes a framework for prosody conversion of emotional speech based on a LIBSVM support vector regression (SVR) model and the PAD three-dimensional emotion model. We design an emotional speech corpus covering 11 kinds of emotional utterances, and each utterance is labeled with its emotional information as PAD values. A five-scale tone model is employed to model the pitch contour of emotional speech at the syllable level. A LIBSVM SVR-based prosody conversion model is proposed to transform the pitch contour, duration, and pause duration of emotional speech according to the PAD values of the emotion and the context information of the text. Speech is then re-synthesized with the STRAIGHT algorithm by modifying the pitch contour, duration, and pause duration, and the results are compared with those obtained by a generalized regression neural network. Experimental results show that the modified speech achieves an average Emotional Mean Opinion Score (EMOS) of 3.8.
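As a rough illustration of the five-scale tone model mentioned in the abstract, the sketch below maps F0 values in Hz onto a 0 to 5 scale with a log-frequency normalization that is commonly used for five-degree tone values. The abstract does not state the exact formula the authors use, so this normalization, the function names `five_scale_tone` and `tone_to_f0`, and the sample contour and speaker range are all assumptions made for illustration.

```python
import math


def five_scale_tone(f0_hz, f0_min_hz, f0_max_hz):
    """Map an F0 value (Hz) onto a 0-5 tone scale via log-frequency normalization.

    Assumed formula: T = 5 * (lg f0 - lg f0_min) / (lg f0_max - lg f0_min).
    """
    if not (0 < f0_min_hz < f0_max_hz):
        raise ValueError("need 0 < f0_min_hz < f0_max_hz")
    # Clamp so that outliers stay inside the speaker's assumed pitch range.
    f0 = min(max(f0_hz, f0_min_hz), f0_max_hz)
    span = math.log10(f0_max_hz) - math.log10(f0_min_hz)
    return 5.0 * (math.log10(f0) - math.log10(f0_min_hz)) / span


def tone_to_f0(tone, f0_min_hz, f0_max_hz):
    """Inverse mapping: recover F0 (Hz) from a 0-5 tone value."""
    span = math.log10(f0_max_hz) - math.log10(f0_min_hz)
    return 10 ** (math.log10(f0_min_hz) + tone / 5.0 * span)


# Hypothetical syllable-level contour (Hz) and speaker range, for illustration only.
contour_hz = [100.0, 200.0, 400.0]
tones = [five_scale_tone(f, 100.0, 400.0) for f in contour_hz]
```

Working on a normalized scale like this makes pitch targets comparable across speakers; a converted tone value can be mapped back to Hz with `tone_to_f0` before re-synthesis.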
- © 2016, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Xiaoyong Lu
AU - Tao Pan
PY - 2016/05
DA - 2016/05
TI - Research On Prosody Conversion of Affective Speech Based on LIBSVM and PAD Three Dimensional Emotion Model
BT - Proceedings of the 2016 2nd Workshop on Advanced Research and Technology in Industry Applications
PB - Atlantis Press
SP - 1
EP - 7
SN - 2352-5401
UR - https://doi.org/10.2991/wartia-16.2016.1
DO - 10.2991/wartia-16.2016.1
ID - Lu2016/05
ER -