Abstract
Text-to-speech (TTS) systems convert written text into natural-sounding speech. The system considered in this study is designed for the Gujarati language: users enter Gujarati text and receive the corresponding spoken output. This technology can make content more accessible to people with literacy challenges or visual impairments, helping them perceive information effectively. Despite advances, TTS systems still struggle to generate emotionally expressive speech that resembles human communication, and incorporating emotions and sensations into synthesized speech remains an active area of research. The TTS process involves two phases: (i) pre-processing and normalization of Gujarati text, and (ii) conversion of the processed text into synthesized speech using digital signal processing. This study concentrates on the first phase, the pre-processing and normalization of Gujarati text.
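As a rough illustration of what the first phase might involve, the Python sketch below normalizes a Gujarati string before synthesis. It is not the method of this study: the function name `normalize_gujarati`, the choice to map Gujarati digits to ASCII digits, and the set of characters retained are all assumptions made for the example.

```python
import re
import unicodedata

# Assumption: Gujarati digits (U+0AE6..U+0AEF) are mapped to ASCII digits
# so that a later stage could expand numbers into words.
GUJARATI_DIGITS = str.maketrans("૦૧૨૩૪૫૬૭૮૯", "0123456789")

# Assumption: keep the Gujarati block, the danda marks (U+0964, U+0965),
# ASCII digits, whitespace, and a few basic punctuation marks.
DISALLOWED = re.compile(r"[^\u0A80-\u0AFF\u0964\u09650-9\s.,?!]")


def normalize_gujarati(text: str) -> str:
    """Hypothetical normalization step for Gujarati TTS input."""
    # Put the text into a canonical Unicode form.
    text = unicodedata.normalize("NFC", text)
    # Convert Gujarati digits to ASCII digits.
    text = text.translate(GUJARATI_DIGITS)
    # Replace characters outside the retained set with a space.
    text = DISALLOWED.sub(" ", text)
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()
```

A full front end would typically go further, for example by expanding numbers, dates, and abbreviations into spoken Gujarati words; those steps are omitted here.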