A hierarchical method of automatic speech segmentation for synthesis applications

S Pauws*, Y Kamp, L Willems

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

13 Citations (Scopus)


The paper describes a method for automatically segmenting a database of isolated words as required for the purpose of speech synthesis. The phoneme-like units in the phonetic transcription of the utterances are represented by dedicated hidden Markov models (HMMs) and segmentation is performed by aligning the speech signal against the sequence of HMMs representing the words. The specific advantage of the method presented here is that it does not need manually segmented speech material to initialize the training of the HMMs. Therefore, it can be regarded as an improved variant of established techniques for automatic segmentation. The problem of proper initialization of the HMMs without resorting to manually segmented material is solved by a hierarchical approach consisting of three successive steps. In the first step a segmentation in broad phonetic classes is realized that provides anchor points for the second stage, consisting of a sequence-constrained vector quantization. In this stage each broad phonetic class is further segmented into its constituent phonemes. The result is a crude phonetic segmentation which is then used as initialization of the HMMs in the last stage. Fine-tuning of the models is realized via Baum-Welch estimation. The final segmentation is obtained by Viterbi alignment of the utterances against the HMMs. This hierarchical approach was used to segment a database of isolated words recorded from a male speaker. An accuracy of 89.51% was obtained in the location of the phoneme boundaries with a tolerance of 20 ms.
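The final two steps described in the abstract, Viterbi alignment and boundary scoring with a 20 ms tolerance, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each phoneme-like unit is reduced to a single state with no transition penalties, whereas the paper uses trained HMMs, and it assumes per-frame log-likelihoods are already available. The function names `forced_align` and `boundary_accuracy`, the 10 ms frame shift, and all numbers in the usage lines are hypothetical.

```python
NEG_INF = float("-inf")

def forced_align(log_lik):
    """Viterbi alignment of frames against a fixed left-to-right unit sequence.

    log_lik[t][k] is the log-likelihood of frame t under the k-th unit of the
    transcription (assumed given). Each unit must occupy at least one frame.
    Returns the frame indices at which a new unit starts (the boundaries).
    """
    n_frames, n_units = len(log_lik), len(log_lik[0])
    if n_frames < n_units:
        raise ValueError("need at least one frame per unit")
    score = [[NEG_INF] * n_units for _ in range(n_frames)]
    back = [[0] * n_units for _ in range(n_frames)]
    score[0][0] = log_lik[0][0]  # the path must start in the first unit
    for t in range(1, n_frames):
        for k in range(n_units):
            stay = score[t - 1][k]
            move = score[t - 1][k - 1] if k > 0 else NEG_INF
            if move > stay:
                score[t][k], back[t][k] = move + log_lik[t][k], k - 1
            else:
                score[t][k], back[t][k] = stay + log_lik[t][k], k
    # Backtrack from the last unit at the last frame.
    path = [n_units - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return [t for t in range(1, n_frames) if path[t] != path[t - 1]]

def boundary_accuracy(auto_ms, ref_ms, tol_ms=20.0):
    """Fraction of automatic boundaries within tol_ms of the reference ones.

    Assumes both lists hold the same boundaries in the same order, which holds
    when both segmentations follow the same phonetic transcription.
    """
    hits = sum(abs(a - r) <= tol_ms for a, r in zip(auto_ms, ref_ms))
    return hits / len(ref_ms)

# Hypothetical toy input: 6 frames, 3 units, 10 ms frame shift (assumed).
toy = [[0, -5, -5], [0, -5, -5], [-5, 0, -5],
       [-5, 0, -5], [-5, -5, 0], [-5, -5, 0]]
frames = forced_align(toy)           # boundary frame indices
auto_ms = [10.0 * f for f in frames]  # convert frames to milliseconds
print(frames, boundary_accuracy(auto_ms, [25.0, 70.0]))
```

In the toy input the first boundary falls within 20 ms of its reference and the second does not, so the sketch scores half of the boundaries as correct. The 89.51% figure in the abstract is a measure of this kind computed over the whole database.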

Original language: English
Pages (from-to): 207-220
Number of pages: 14
Journal: Speech Communication
Issue number: 3
Publication status: Published - Sept 1996
Externally published: Yes


Keywords

  • speech segmentation
  • hidden Markov models (HMM)
  • vector quantization

