Prediction of Emotions from Text using Sentiment Analysis for Expressive Speech Synthesis

Eva Vanmassenhove, João P. Cabral, Fasih Haider

Research output: Contribution to conference, Paper (other research output)

7 Citations (Scopus)


The generation of expressive speech is a great challenge for text-to-speech synthesis in audiobooks. One of the most important factors is the variation in speech emotion or voice style. In this work, we developed a method to predict the emotion of a sentence so that it can be conveyed through the synthetic voice. The method combines a standard emotion-lexicon-based technique with the polarity scores (positive/negative) provided by a less fine-grained sentiment analysis tool, in order to obtain more accurate emotion labels. The primary goal of this emotion prediction tool was to select the type of voice (one of the emotions, or neutral) for the input sentence given to a state-of-the-art HMM-based text-to-speech (TTS) system. In addition, we combined the emotion prediction from text with a speech clustering method to select utterances with emotion while building the emotional corpus for the speech synthesizer. Speech clustering is a popular approach to dividing speech data into subsets associated with different voice styles. The challenge here is to determine the clusters that map out the basic emotions in an audiobook corpus containing a high variety of speaking styles, in a way that minimizes the need for human annotation. The evaluation of emotion classification from text showed that, in general, our system obtains accuracy close to that of human annotators. Results also indicate that this technique is useful for selecting utterances with emotion when building expressive synthetic voices.
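The abstract does not give the exact combination rule, so the following is only a minimal sketch of the general idea of using a coarse polarity score to filter emotion-lexicon matches. Everything in it is an assumption for illustration: the tiny lexicon `EMOTION_LEXICON`, the `POSITIVE`/`NEGATIVE` emotion groups, the function `predict_emotion`, the `threshold` parameter, and the convention that the external sentiment tool returns a polarity in [-1, 1]. It is not the authors' implementation.

```python
from collections import Counter

# Hypothetical toy emotion lexicon (word -> emotion); a real system would use
# a full emotion lexicon rather than these few illustrative entries.
EMOTION_LEXICON = {
    "happy": "joy", "delighted": "joy", "laughed": "joy",
    "afraid": "fear", "trembling": "fear",
    "furious": "anger", "shouted": "anger",
    "wept": "sadness", "lonely": "sadness",
}

# Assumed grouping of emotions by polarity, used to reconcile the two sources.
POSITIVE = {"joy"}
NEGATIVE = {"fear", "anger", "sadness"}


def predict_emotion(sentence: str, polarity: float, threshold: float = 0.1) -> str:
    """Return an emotion label or 'neutral' (the voice type) for one sentence.

    `polarity` is assumed to come from an external sentiment tool and to lie
    in [-1, 1]; `threshold` is a hypothetical cut-off for "no clear polarity".
    """
    # Lowercase tokens with surrounding punctuation stripped.
    tokens = [t.strip(".,!?;:\"'").lower() for t in sentence.split()]
    # Count lexicon hits per emotion.
    counts = Counter(EMOTION_LEXICON[t] for t in tokens if t in EMOTION_LEXICON)

    # Weak polarity: select the neutral voice regardless of lexicon hits.
    if polarity > threshold:
        allowed = POSITIVE
    elif polarity < -threshold:
        allowed = NEGATIVE
    else:
        return "neutral"

    # Keep only the lexicon emotions that agree with the sentence polarity.
    candidates = {e: c for e, c in counts.items() if e in allowed}
    if not candidates:
        return "neutral"
    # Most frequent agreeing emotion; ties broken alphabetically.
    return max(sorted(candidates), key=lambda e: candidates[e])


if __name__ == "__main__":
    print(predict_emotion("She laughed, happy at last.", 0.8))
    print(predict_emotion("He was afraid.", 0.5))
```

In this sketch the polarity score acts as a consistency filter: a lexicon match such as "afraid" in a sentence the sentiment tool rates as positive is discarded, and the sentence falls back to the neutral voice.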

Original language: English
Number of pages: 6
Publication status: Published, 2016
Event: 9th ISCA Speech Synthesis Workshop, SSW 2016, Sunnyvale, United States
Duration: 13 Sept 2016 to 15 Sept 2016


Conference: 9th ISCA Speech Synthesis Workshop, SSW 2016
Country/Territory: United States


  • audiobooks
  • emotion
  • expressive speech synthesis
  • sentiment analysis
  • speech clustering

