A Multi Model HMM Based Speech Synthesis

Supadaech Chanjaradwichai; Atiwong Suchato; Proadpran Punyabukkana

doi:10.4186/ej.2018.22.1.187

Authors

Supadaech Chanjaradwichai Chulalongkorn University
Atiwong Suchato Chulalongkorn University
Proadpran Punyabukkana Chulalongkorn University

Abstract

The Multi-Space Probability Distribution Hidden Markov Model (MSD-HMM) is a discrete model that learns fundamental frequency features. However, speech synthesized from that model has been shown to contain buzziness and hoarseness, which reduce its intelligibility. This research aims to improve the intelligibility of synthesized speech by proposing a multi-model HMM-based speech synthesis approach that models spectral features and fundamental frequency features separately, in a spectral model and a fundamental frequency model, instead of combining them in a single shared model. The fundamental frequency model is an MSD-HMM. Output durations are calculated from the maximum probability of both models. A voicing condition restriction rule with a minimum output duration criterion is proposed to prevent mismatched voicing conditions in the generated parameters. Objective results show that the proposed multi-model approach is comparable to the shared model, while subjective results show that the proposed model, both with and without the voicing condition restriction rule, outperforms the shared model and reduces the buzziness and hoarseness of the synthesized voice. Intelligibility MOS scores of the proposed model with the voicing condition restriction, the proposed model without it, and the shared model are 3.62, 3.69, and 3.08, respectively; the corresponding naturalness MOS scores are 3.71, 3.71, and 3.14.
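
To make the reported subjective results easier to compare, the short Python sketch below computes how far each proposed variant sits above the shared model. Only the six MOS values come from the abstract; the dictionary keys and the helper name `gains_over_shared` are illustrative assumptions, not names used by the authors.

```python
# MOS values as reported in the abstract; the key names are hypothetical labels.
mos = {
    "intelligibility": {
        "proposed_with_rule": 3.62,
        "proposed_without_rule": 3.69,
        "shared": 3.08,
    },
    "naturalness": {
        "proposed_with_rule": 3.71,
        "proposed_without_rule": 3.71,
        "shared": 3.14,
    },
}


def gains_over_shared(scores):
    """Return each proposed variant's MOS minus the shared model's MOS."""
    base = scores["shared"]
    return {
        name: round(value - base, 2)
        for name, value in scores.items()
        if name != "shared"
    }


# Difference from the shared baseline for every metric.
gains = {metric: gains_over_shared(scores) for metric, scores in mos.items()}

for metric, per_model in gains.items():
    print(metric, per_model)
```

Both proposed variants score more than half a MOS point above the shared model on each measure, while the two variants differ from each other by at most 0.07.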