Concatenation point optimization by principal component analysis
Abstract
Concatenative speech synthesis systems generally provide considerable synthesis quality because the criteria for unit selection methods have been optimized. However, the level of synthesis quality also depends on an adequate position of the concatenation points of all acoustic units that have to be concatenated. The position of the concatenation points largely determines the degree of mismatch and distortion that listeners perceive in a synthesized waveform. We therefore present a concatenation point optimization (CPO) algorithm based on Principal Component Analysis (PCA) that establishes an optimal concatenation point between any two matching acoustic units in a given inventory and reduces the perceived distortion in Text-To-Speech (TTS) synthesis systems. The algorithm extracts data frames associated with a concatenation point and transforms them, using PCA, into a particular framework that preserves the relevant properties of the waveform. The optimal concatenation point is then determined by solving an optimization task. Experimental evaluations characterize the behavior of the proposed concatenation point optimization method and emphasize its viability.
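The general idea of the abstract (extract frames near candidate concatenation points, project them with PCA, then pick the point by optimization) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the frame length, the number of principal components, or the cost function, so the fixed-length frames, the SVD-based PCA, the Euclidean distance cost, and all function and parameter names below are assumptions made purely for illustration.

```python
import numpy as np

def frames_around(signal, centers, frame_len):
    """Cut one frame of frame_len samples centered on each candidate position."""
    half = frame_len // 2
    return np.stack([signal[c - half:c - half + frame_len] for c in centers])

def pca_project(frames, n_components):
    """Project mean-centered frames onto their leading principal components."""
    centered = frames - frames.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def best_concatenation_point(left, right, left_cands, right_cands,
                             frame_len=64, n_components=4):
    """Return (left_pos, right_pos, cost) for the closest pair in PCA space.

    Assumption: the 'optimization task' is a plain search for the pair of
    candidate frames with the smallest Euclidean distance after projection.
    """
    lf = frames_around(left, left_cands, frame_len)
    rf = frames_around(right, right_cands, frame_len)
    # Fit one PCA on the frames of both units so they share a coordinate system.
    proj = pca_project(np.vstack([lf, rf]), n_components)
    pl, pr = proj[:len(lf)], proj[len(lf):]
    # Pairwise distances between every left and every right candidate.
    dist = np.linalg.norm(pl[:, None, :] - pr[None, :, :], axis=2)
    i, j = np.unravel_index(np.argmin(dist), dist.shape)
    return left_cands[i], right_cands[j], float(dist[i, j])
```

Given two waveforms as one-dimensional NumPy arrays and a list of candidate sample positions in each, `best_concatenation_point(left, right, left_cands, right_cands)` returns the pair of positions whose surrounding frames are closest in the reduced space. A perceptually motivated cost or a different frame representation could replace the Euclidean distance without changing the overall structure.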