The previous article focused on speech-to-text (STT) technology. We analysed its effectiveness and accuracy, along with how different factors affect its performance. In this part, Szymon Rożdzyński and I want to concentrate on the opposite technology, text-to-speech (TTS), which uses intelligent algorithms to convert written text into speech.

Put briefly, TTS is a technology that takes written text as input and generates a corresponding spoken output. TTS is commonly used in applications like voice assistants, accessibility features for visually impaired individuals, audiobooks, and various other scenarios where a natural-sounding voice is needed for communicating textual information.

Let’s have a look at how this AI voicebot technology works in the Polish banking sector.

Text-to-Speech test experiment

To evaluate the efficacy of text-to-speech (TTS) services for the banking industry, we prepared a dataset of pre-written texts. These texts were processed through TTS engines to generate audio outputs, which were then compared against the standard of natural human speech. The experiment aimed to critically assess whether current technologies can produce speech that is not only comprehensible but also engaging and natural-sounding to the user. These qualities matter particularly for automated systems used in customer-facing industries such as banking.

DATASETS

• 11 short sentences: fewer than 20 tokens each. These sentences were designed to represent a variety of customer inquiries and commands commonly encountered in the banking sector.
• 2 long conversations: more than 200 tokens each. These longer texts were intended to test the endurance and consistency of the TTS services in simulating a natural conversational flow.
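To make the two size categories concrete, here is a minimal Python sketch that sorts candidate texts into "short" and "long" buckets using the thresholds above. It assumes whitespace tokenization, because the tokenizer used in the experiment is not specified. The function names and sample sentences are hypothetical and are not taken from the actual test set.

```python
# Thresholds from the dataset description: short sentences have fewer than
# 20 tokens, long conversations have more than 200 tokens.
SHORT_MAX_TOKENS = 20
LONG_MIN_TOKENS = 200


def count_tokens(text: str) -> int:
    """Count tokens by splitting on whitespace.

    Assumption: the tokenizer used in the experiment is not named; whitespace
    splitting is a stand-in and will give different counts than a subword tokenizer.
    """
    return len(text.split())


def categorize(text: str) -> str:
    """Return 'short', 'long', or 'other' according to the token thresholds."""
    n = count_tokens(text)
    if n < SHORT_MAX_TOKENS:
        return "short"
    if n > LONG_MIN_TOKENS:
        return "long"
    return "other"


def split_dataset(texts: list[str]) -> dict[str, list[str]]:
    """Group texts by category; texts fitting neither bucket go to 'other'."""
    groups: dict[str, list[str]] = {"short": [], "long": [], "other": []}
    for text in texts:
        groups[categorize(text)].append(text)
    return groups


if __name__ == "__main__":
    # Hypothetical sample inputs, not the sentences used in the experiment.
    samples = [
        "Please check my account balance.",
        "I would like to block my credit card.",
        " ".join(["word"] * 250),  # stand-in for a long conversation transcript
    ]
    groups = split_dataset(samples)
    print({name: len(items) for name, items in groups.items()})
```

Running the script prints `{'short': 2, 'long': 1, 'other': 0}`. Texts that land in "other" (20 to 200 tokens) fall between the two categories described above and would need to be shortened, extended, or excluded.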

About the author

Kamil Machalica

Senior AI Data Scientist