A Multi Model HMM Based Speech Synthesis
DOI:
https://doi.org/10.4186/ej.2018.22.1.187Abstract
The Multi-Space Probability Distribution Hidden Markov model (MSD-HMM) is a discrete model that learns a fundamental frequency feature, however it has been proven that synthesized speeches from that model contain buzziness and hoarseness which affect to an intelligibility of synthesized speeches. This research aims to improve an intelligibility of synthesized speeches by proposing a multi model HMM based speech synthesis which it models spectral features and fundamental frequency features separately called spectral model and fundamental frequency model instead of combining them to a same model. The fundamental frequency model is modelled by MSD-HMM. Output durations are calculated from maximum probability of both models. A voicing condition restriction rule with minimum output duration criteria are proposed to prevent an unmatched voicing condition of the generated parameter. Objective results show that the proposed multi model is comparable to the shared model while subjective results show that the proposed model with voicing condition restriction rule and without voicing condition restriction rule is outperform the shared model and reduce the buzziness and hoarseness of the synthesized voice. Intelligibility MOS scores of the proposed model with a voicing condition restriction, the proposed model without a voicing condition restriction and the share model are 3.62, 3.69 and 3.08 respectively and naturalness MOS scores are 3.71, 3.71 and 3.14 respectively.
Downloads
Downloads
Authors who publish with Engineering Journal agree to transfer all copyright rights in and to the above work to the Engineering Journal (EJ)'s Editorial Board so that EJ's Editorial Board shall have the right to publish the work for nonprofit use in any media or form. In return, authors retain: (1) all proprietary rights other than copyright; (2) re-use of all or part of the above paper in their other work; (3) right to reproduce or authorize others to reproduce the above paper for authors' personal use or for company use if the source and EJ's copyright notice is indicated, and if the reproduction is not made for the purpose of sale.