Wear OS Gets New, More Efficient Text-to-Speech Engine

Uncategorized

Wear OS Gets New, More Efficient Text-to-Speech Engine

MMS • Sergio De Simone

Google has announced a new text-to-speech engine for Wear OS, its Android variant aimed at smartwatches and other wearables, supporting over 50 languages and faster than its predecessor thanks to using smaller ML models.

According to Google’s Ouiam Koubaa and Yingzhe Li, the new text-to-speech engine is particularly geared for low-memory devices, such as those that power services for which wearables are most suited, including accessibility services, exercise apps, navigation-cues, and reading-aloud apps.

Text-to-speech turns text into natural-sounding speech across more than 50 languages powered by Google’s machine learning (ML) technology. The new text-to-speech engine on Wear OS uses smaller and more efficient prosody ML models to bring faster synthesis on Wear OS devices.

The new text-to-speech engine does not introduce new APIs to synthesize speech, meaning developers can keep using the previously existing speak method, along with the rest of the methods previously available.

Developers should keep in mind that the new engine takes about 10 seconds to get ready when the app is initialized. Therefore, apps that want to use speech right after launch is completed should initialize the engine as soon as possible by calling TextToSpeech(applicationContext, callback) and synthesize the desired text from the passed callback.

An additional caveat concerns the possibility that the new engine can synthesize speech in a language other than the user’s preferred language. This may happen, for example, when sending an emergency call, in which case the language corresponding to the actual locale the user is in is preferred over the user’s chosen UI language.

The new text-to-speech engine can be used on devices running Wear OS 4, released last July, or higher.

Besides text-to-speech synthesis, Wear OS also provides a speech recognition service through the SpeechRecognizer API, which is though not appropriate for continuous recognition since it relies on remote services.

About the Author

Sergio De Simone

Show moreShow less

Uncategorized