A Multi Model HMM Based Speech Synthesis

Authors

  • Supadaech Chanjaradwichai, Chulalongkorn University
  • Atiwong Suchato, Chulalongkorn University
  • Proadpran Punyabukkana, Chulalongkorn University

DOI:

https://doi.org/10.4186/ej.2018.22.1.187

Abstract

The Multi-Space Probability Distribution Hidden Markov Model (MSD-HMM) is a discrete model that learns the fundamental frequency feature; however, speech synthesized from that model has been shown to contain buzziness and hoarseness, which affect its intelligibility. This research aims to improve the intelligibility of synthesized speech by proposing a multi-model HMM-based speech synthesis approach that models spectral features and fundamental frequency features separately, in a spectral model and a fundamental frequency model, instead of combining them in a single shared model. The fundamental frequency model is an MSD-HMM. Output durations are calculated from the maximum probability of both models. A voicing condition restriction rule with a minimum output duration criterion is proposed to prevent unmatched voicing conditions in the generated parameters. Objective results show that the proposed multi-model approach is comparable to the shared model, while subjective results show that the proposed model, both with and without the voicing condition restriction rule, outperforms the shared model and reduces the buzziness and hoarseness of the synthesized voice. The intelligibility MOS scores of the proposed model with the voicing condition restriction, the proposed model without the voicing condition restriction, and the shared model are 3.62, 3.69 and 3.08, respectively, and the naturalness MOS scores are 3.71, 3.71 and 3.14, respectively.

Author Biographies

Supadaech Chanjaradwichai

Spoken Language Systems Research Group, Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok 10330, Thailand

Atiwong Suchato

Spoken Language Systems Research Group, Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok 10330, Thailand

Proadpran Punyabukkana

Spoken Language Systems Research Group, Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok 10330, Thailand

Published

Vol 22 No 1, Jan 31, 2018

How to Cite

[1] S. Chanjaradwichai, A. Suchato, and P. Punyabukkana, “A Multi Model HMM Based Speech Synthesis”, Eng. J., vol. 22, no. 1, pp. 187-203, Jan. 2018.