A Multi Model HMM Based Speech Synthesis

  • Supadaech Chanjaradwichai Chulalongkorn University
  • Atiwong Suchato Chulalongkorn University
  • Proadpran Punyabukkana Chulalongkorn University

Abstract

The Multi-Space Probability Distribution Hidden Markov Model (MSD-HMM) is a discrete model that learns the fundamental frequency feature; however, speech synthesized from this model has been shown to contain buzziness and hoarseness, which degrade the intelligibility of the synthesized speech. This research aims to improve that intelligibility by proposing a multi-model HMM-based speech synthesis approach that models spectral features and fundamental frequency features separately, in a spectral model and a fundamental frequency model, instead of combining them into a single shared model. The fundamental frequency model is an MSD-HMM. Output durations are calculated from the maximum probability of both models. A voicing-condition restriction rule with a minimum-output-duration criterion is proposed to prevent mismatched voicing conditions in the generated parameters. Objective results show that the proposed multi-model approach is comparable to the shared model, while subjective results show that the proposed model, both with and without the voicing-condition restriction rule, outperforms the shared model and reduces the buzziness and hoarseness of the synthesized voice. Intelligibility MOS scores of the proposed model with the voicing-condition restriction, the proposed model without the restriction, and the shared model are 3.62, 3.69, and 3.08 respectively; the corresponding naturalness MOS scores are 3.71, 3.71, and 3.14.
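The abstract mentions a voicing-condition restriction rule with a minimum-output-duration criterion but does not spell out its exact form. The general idea behind such a rule, suppressing implausibly short voiced or unvoiced segments in the frame-level voicing sequence before parameter generation, can be sketched as follows; the function names, the run-merging strategy, and the frame threshold are illustrative assumptions, not the paper's actual algorithm:

```python
def voicing_runs(flags):
    """Group a frame-level voicing sequence (1 = voiced, 0 = unvoiced)
    into consecutive [flag, length] runs."""
    runs = []
    for f in flags:
        if runs and runs[-1][0] == f:
            runs[-1][1] += 1
        else:
            runs.append([f, 1])
    return runs


def enforce_min_duration(flags, min_frames=3):
    """Remove voicing runs shorter than min_frames by absorbing each one
    into a neighbouring run (flipping its voicing flag), so every surviving
    voiced/unvoiced segment meets the minimum duration."""
    runs = voicing_runs(flags)
    changed = True
    while changed:
        changed = False
        for i, (flag, length) in enumerate(runs):
            if length < min_frames and len(runs) > 1:
                # Absorb the short run into the previous run if it exists,
                # otherwise into the next one; the total frame count is kept.
                j = i - 1 if i > 0 else i + 1
                runs[j][1] += length
                del runs[i]
                changed = True
                break
    # Re-merge adjacent runs that now carry the same voicing flag.
    merged = []
    for flag, length in runs:
        if merged and merged[-1][0] == flag:
            merged[-1][1] += length
        else:
            merged.append([flag, length])
    # Expand the runs back into a frame-level sequence.
    out = []
    for flag, length in merged:
        out.extend([flag] * length)
    return out
```

For example, a single spurious unvoiced frame inside a long voiced region, `[1]*10 + [0] + [1]*10`, is flipped to voiced, yielding 21 voiced frames; the sequence length is always preserved, so the restriction never alters the total output duration.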

Author Biographies
Supadaech Chanjaradwichai

Spoken Language Systems Research Group, Department of Computer Engineering,
Faculty of Engineering, Chulalongkorn University, Bangkok 10330, Thailand

Atiwong Suchato

Spoken Language Systems Research Group, Department of Computer Engineering,
Faculty of Engineering, Chulalongkorn University, Bangkok 10330, Thailand

Proadpran Punyabukkana

Spoken Language Systems Research Group, Department of Computer Engineering,
Faculty of Engineering, Chulalongkorn University, Bangkok 10330, Thailand

Published
Vol 22 No 1, Jan 31, 2018
How to Cite
S. Chanjaradwichai, A. Suchato, and P. Punyabukkana, “A Multi Model HMM Based Speech Synthesis”, Engineering Journal, vol. 22, no. 1, pp. 187-203, Jan. 2018.

Authors who publish with Engineering Journal agree to transfer all copyright rights in and to the above work to the Engineering Journal (EJ)'s Editorial Board, so that EJ's Editorial Board shall have the right to publish the work for nonprofit use in any media or form. In return, authors retain: (1) all proprietary rights other than copyright; (2) the right to re-use all or part of the above paper in their other work; (3) the right to reproduce, or authorize others to reproduce, the above paper for the authors' personal use or for company use, provided the source and EJ's copyright notice are indicated and the reproduction is not made for the purpose of sale.
