Advances in artificial intelligence (AI)-driven text-to-speech (TTS) technology have enabled transformative applications in accessibility and personalized learning. This study develops a customizable Turkish TTS system that enhances lecture notes through personalized narration. Although TTS has progressed significantly for widely spoken languages, open-source Turkish TTS solutions remain scarce; the study addresses this gap by leveraging Narakeet TTS and fine-tuning the RVCv2 model. The methodology involves processing user-recorded voice data, segmenting the audio, and training a voice conversion model to align synthesized TTS output with the speaker’s unique vocal characteristics. System performance was evaluated using Mel-Frequency Cepstral Coefficients (MFCC), Dynamic Time Warping (DTW), Mel Cepstral Distortion (MCD), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI). To determine how closely the produced voice resembles the user’s voice, the initial and final audio were compared using the Emphasized Channel Attention, Propagation, and Aggregation Time Delay Neural Network (ECAPA-TDNN). Results demonstrate that the personalized model achieves high intelligibility and similarity to the original voice, with MCD values of 0, PESQ scores of 4.5, STOI scores of 1, and an ECAPA-TDNN similarity of 0.62. This study underscores the potential of open-source TTS solutions in supporting less-resourced languages and highlights the importance of personalization in educational AI tools.
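As a rough illustration of two of the evaluation steps named above, the following minimal sketch computes a DTW-aligned MCD between two cepstral sequences and a similarity score between two speaker embeddings. It is not the study's implementation. It assumes that the MFCC frames have already been extracted (with the 0th, energy coefficient dropped), that MCD uses the common 10/ln(10) · sqrt(2 · Σ(Δc)²) form averaged over aligned frames, and that the ECAPA-TDNN similarity is a cosine score between embeddings. The function names are hypothetical.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two equal-length cepstral frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_path(ref, syn):
    """Align two frame sequences with classic DTW; return (i, j) index pairs."""
    n, m = len(ref), len(syn)
    inf = float("inf")
    # cost[i][j] = minimal accumulated distance aligning ref[:i] with syn[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(ref[i - 1], syn[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end of both sequences to recover the alignment.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def mel_cepstral_distortion(ref, syn):
    """Mean MCD in dB over DTW-aligned frames (assumed formula, see lead-in)."""
    path = dtw_path(ref, syn)
    scale = (10.0 / math.log(10.0)) * math.sqrt(2.0)
    total = sum(scale * frame_distance(ref[i], syn[j]) for i, j in path)
    return total / len(path)


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors (assumed scoring rule)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm
```

Under these assumptions, an MCD of 0 means the aligned cepstral frames are identical, and a cosine score closer to 1 means the two embeddings point in more similar directions.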
Cite this article as: F. Akar, "AI meets your voice: transforming Turkish text into personalized speech," Electrica, 25, 0034, 2025. doi: 10.5152/electrica.2025.25034.