
Google Text-to-Speech

Convert text or SSML markup into a spoken audio file using the Google Cloud Text-to-Speech service.

Purpose

Use this task when a workflow needs to produce audio output from dynamically generated or static text. It supports multiple voices, languages, and audio formats, making it suitable for narration, voice alerts, or audio message generation. Inputs longer than the Chunk Size limit (4500 characters by default) are split into chunks, synthesised one request at a time, and combined into a single output file.
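The chunking behaviour can be illustrated with a minimal sketch. The task's actual splitting rules are not documented here, so the sentence-boundary preference and the helper name `split_text` are assumptions made for illustration only.

```python
import re


def split_text(text: str, max_chars: int = 4500) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Hypothetical helper: prefers sentence boundaries and falls back to a
    hard cut only when a single sentence exceeds the limit.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    # Break after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A sentence longer than the limit is cut into fixed-size pieces.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent as its own synthesis request. How the resulting audio segments are joined depends on the format; the sketch deliberately leaves that step out.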

Inputs

| Field | Type | Required | Description |
| ThirdParty - Google Cloud | Text | Yes | The name of the configured Google Cloud third-party credential to authenticate with the Text-to-Speech API. |
| Input Type | Dropdown | Yes | Whether the input is plain Text or SSML markup. |
| Text | Multi-line Text | Yes | The plain text content to synthesise into speech. |
| SSML | Multi-line Text | Yes | The SSML-formatted input for advanced speech control such as pauses and emphasis. |
| Voice | Text | Yes | The full Google Cloud voice name to use, for example en-US-Neural2-D. |
| Audio Format | Dropdown | Yes | The output audio format: MP3, OGG_OPUS, or LINEAR16 (WAV). |
| Speaking Rate | Text | No | Playback speed multiplier. Defaults to 1.0 (normal speed). |
| Pitch (st) | Text | No | Pitch adjustment in semitones. Defaults to 0.0. |
| Volume Gain (dB) | Text | No | Volume adjustment in decibels. Defaults to 0.0. |
| Sample Rate (Hz) | Text | No | Sample rate override in hertz, for example 24000 or 48000. Required when Audio Format is LINEAR16. |
| Chunk Size (chars) | Text | No | Maximum characters per synthesis request. Defaults to 4500. |
| File Name | Text | Yes | Base name for the output audio file, without extension. |

Visibility Rules

Text is only shown when Input Type is set to Text.

SSML is only shown when Input Type is set to SSML.

Outputs

| Name | Description |
| Audio | An object containing the full output file path (Audio.File), the voice used (Audio.Voice), the audio format (Audio.Format), the number of chunks processed (Audio.Chunks), and the total character count (Audio.Characters). |
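To make the shape of the Audio output concrete, the following sketch assembles an equivalent dictionary. The property names come from the table above; the helper `build_audio_output`, the `output_dir` parameter, and the format-to-extension mapping are hypothetical and only illustrate what a downstream step might expect to read.

```python
from pathlib import PurePosixPath

# Assumed mapping from Audio Format to file extension.
EXTENSIONS = {"MP3": ".mp3", "OGG_OPUS": ".ogg", "LINEAR16": ".wav"}


def build_audio_output(
    file_name: str,
    output_dir: str,
    voice: str,
    audio_format: str,
    chunks: list[str],
) -> dict:
    """Assemble an Audio-style result object (hypothetical helper)."""
    # File Name is given without extension, so append one for the format.
    file_path = PurePosixPath(output_dir) / (file_name + EXTENSIONS[audio_format])
    return {
        "File": str(file_path),
        "Voice": voice,
        "Format": audio_format,
        "Chunks": len(chunks),
        # Total characters across all synthesised chunks.
        "Characters": sum(len(chunk) for chunk in chunks),
    }
```

A later task in the workflow could then reference values such as `Audio.File` to attach or play the generated file.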
