Article | KEER2014. Proceedings of the 5th Kansei Engineering and Emotion Research International Conference; Linköping; Sweden; June 11-13 | An Approach for Emotion Recognition using Purely Segment-Level Acoustic Features

Title:
An Approach for Emotion Recognition using Purely Segment-Level Acoustic Features
Author:
Hao Zhang, School of Engineering, The University of Tokyo, Japan
Shin’ichi Warisawa, School of Engineering, The University of Tokyo, Japan
Ichiro Yamada, School of Engineering, The University of Tokyo, Japan
Download:
Full text (pdf)
Year:
2014
Conference:
KEER2014. Proceedings of the 5th Kansei Engineering and Emotion Research International Conference; Linköping; Sweden; June 11-13
Issue:
100
Article no.:
004
Pages:
39-49
No. of pages:
11
Publication type:
Abstract and Fulltext
Published:
2014-06-11
ISBN:
978-91-7519-276-5
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Publisher:
Linköping University Electronic Press; Linköpings universitet



This paper proposes a purely segment-level approach to speech emotion recognition that entirely abandons utterance-level features, focusing instead on extracting emotional information from a number of selected segments within each utterance. We designed two segment selection approaches, miSATIR and crSATIR, based on information theory and correlation coefficients respectively, for choosing the utterance segments from which features are extracted, thereby creating the purely segment-level concept of the model. After clarifying the time interval for the segments, we established a model using the selected segment-level speech frames. Tests on a 50-person emotional speech database designed specifically for this research showed significant improvements in average accuracy (more than 20%) over existing approaches that use information from entire utterances. Test results based on speech signals elicited with the International Affective Picture System (IAPS) database showed that the proposed method can also be used in emotion strength analysis.
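The abstract does not spell out the miSATIR/crSATIR selection criteria, but the mutual-information idea behind miSATIR can be illustrated in a rough, hypothetical sketch (not the authors' actual algorithm): score each candidate segment by the estimated mutual information between one of its acoustic feature values and the emotion labels, then keep only the most informative segments. The function names (`mutual_information`, `select_segments`) and the histogram-based MI estimator are illustrative assumptions.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Estimate mutual information (in bits) between a continuous
    feature vector x and discrete labels y via histogram quantization."""
    # Quantize x into `bins` levels using its interior histogram edges.
    edges = np.histogram_bin_edges(x, bins=bins)[1:-1]
    x_q = np.digitize(x, edges)  # values in 0..bins-1
    label_index = {lab: i for i, lab in enumerate(sorted(set(y)))}
    joint = np.zeros((bins, len(label_index)))
    for xi, yi in zip(x_q, y):
        joint[xi, label_index[yi]] += 1
    joint /= joint.sum()                      # joint distribution p(x, y)
    px = joint.sum(axis=1, keepdims=True)     # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)     # marginal p(y)
    nz = joint > 0                            # avoid log(0)
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

def select_segments(segment_features, labels, top_k=3):
    """Rank segments by MI between their feature values and the emotion
    labels; return the indices of the top_k most informative segments."""
    scores = [mutual_information(f, labels) for f in segment_features]
    order = np.argsort(scores)[::-1]
    return order[:top_k], scores
```

A feature that cleanly separates the emotion classes scores near H(Y) bits, while an uninformative one scores near zero, so ranking by this score discards segments that carry little emotional information. The crSATIR variant would replace the MI score with a correlation-coefficient-based one.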

References:

Atal, B. S., & Hanauer, S. L. (1971). Speech analysis and synthesis by linear prediction of the speech wave. The Journal of the Acoustical Society of America, 50, 637.

Battiti, R. (1994). Using mutual information for selecting features in supervised neural net learning. Neural Networks, IEEE Transactions on, 5(4), 537-550.

Chandaka, S., Chatterjee, A., & Munshi, S. (2009). Support vector machines employing cross-correlation for emotional speech recognition. Measurement, 42(4), 611-618.

Davis, S., & Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. Acoustics, Speech and Signal Processing, IEEE Transactions on, 28(4), 357-366.

Jaynes, E. T. (1957). Information theory and statistical mechanics. Physical Review, 106(4), 620.

Kim, E. H., Hyun, K. H., Kim, S. H., & Kwak, Y. K. (2009). Improved emotion recognition with a novel speaker-independent feature. Mechatronics, IEEE/ASME Transactions on, 14(3), 317-325.

Lang, P. J., Bradley, M. M., & Cuthbert, B. N. (1999). International affective picture system (IAPS): Technical manual and affective ratings. Gainesville, FL: The Center for Research in Psychophysiology, University of Florida.

Morrison, D., Wang, R., & De Silva, L. C. (2007). Ensemble methods for spoken emotion recognition in call-centres. Speech Communication, 49(2), 98-112.

Pearson, K. (1895). Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58(347-352), 240-242.

Picard, R. W. (2000). Affective computing. MIT Press.

Qi-Rong, M., & Zhan, Y.-z. (2010). A novel hierarchical speech emotion recognition method based on improved DDAGSVM. Computer Science and Information Systems/ComSIS, 7(1), 211-222.

Schuller, B., & Rigoll, G. (2006). Timing levels in segment-based speech emotion recognition. Paper presented at INTERSPEECH, Pittsburgh, Pennsylvania, USA.

Shannon, C. E. (2001). A mathematical theory of communication. ACM SIGMOBILE Mobile Computing and Communications Review, 5(1), 3-55.

Specht, D. F. (1990). Probabilistic neural networks. Neural Networks, 3(1), 109-118.

Steuer, R., Kurths, J., Daub, C. O., Weise, J., & Selbig, J. (2002). The mutual information: detecting and evaluating dependencies between variables. Bioinformatics, 18(suppl 2), S231-S240.

Ververidis, D., & Kotropoulos, C. (2006). Emotional speech recognition: Resources, features, and methods. Speech Communication, 48(9), 1162-1181.

Yeh, J.-H., Pao, T.-L., Lin, C.-Y., Tsai, Y.-W., & Chen, Y.-T. (2011). Segment-based emotion recognition from continuous Mandarin Chinese speech. Computers in Human Behavior, 27(5), 1545-1552.

Yu, F. B. J. Y. Y., & Xu, D. (2007). Decision templates ensemble and diversity analysis for segment-based speech emotion recognition. Paper presented at ISKE 2007, San Diego, CA, USA.


