# Google Text-to-Speech (GoogleTTS)

## Description

The GoogleTTS node uses the Google Cloud Text-to-Speech API to generate realistic spoken audio from text or SSML input.
It supports multiple languages, voices, and audio formats (MP3, OGG, WAV/LINEAR16), allowing you to convert workflow-generated text into spoken output for use in video narration, alerts, audio messages, or AI-powered assistants.

This node connects to Google Cloud using a stored **ThirdParty - Google Cloud** credential and outputs an audio file that can be played, downloaded, or embedded in later steps.


## How It Works

  1. Validates required fields such as Input Type, Voice, and Audio Format.
  2. Retrieves an OAuth access token from the configured Google Cloud third-party service.
  3. Breaks long text or SSML into chunks (default 4500 characters per chunk).
  4. Sends each chunk to Google’s Text-to-Speech API for synthesis.
  5. Combines all audio chunks into a single file (MP3, OGG, or WAV).
  6. Outputs the generated audio file path and metadata.
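
Step 3 can be sketched as follows. This is a minimal illustration, not the node's actual implementation: the function name `split_into_chunks` and the preference for breaking at whitespace are assumptions, and the sketch handles plain text only (splitting SSML would also need to keep tags balanced).

```python
def split_into_chunks(text: str, chunk_size: int = 4500) -> list[str]:
    """Split text into pieces of at most chunk_size characters.

    Hypothetical helper: prefers to break at a space so words stay
    intact, and falls back to a hard cut when no space is available.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    chunks = []
    remaining = text.strip()
    while len(remaining) > chunk_size:
        # Last space at or before the chunk boundary, if any.
        cut = remaining.rfind(" ", 0, chunk_size + 1)
        if cut <= 0:
            cut = chunk_size  # no space found: hard cut
        chunks.append(remaining[:cut].strip())
        remaining = remaining[cut:].strip()
    if remaining:
        chunks.append(remaining)
    return chunks


if __name__ == "__main__":
    sample = "word " * 2000
    parts = split_into_chunks(sample)
    print(len(parts), [len(p) for p in parts])
```

With the default chunk size, an 8400-character input yields two chunks, which matches the `Chunks` value shown in the example output of this page.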

## Input Fields

| Field | Type | Description | Required |
|-------|------|-------------|----------|
| **ThirdParty - Google Cloud** | Third-Party Token | Select or provide the Google Cloud third-party credential (OAuth token). | |
| **Input Type** | Picklist | Determines whether input is `Text` or `SSML`. | |
| **Text** | Text (multi-line) | Plain text input to synthesize. Used if Input Type = `Text`. | ✅ if Text mode |
| **SSML** | Text (multi-line) | SSML-formatted input for advanced control of pauses, tone, etc. | ✅ if SSML mode |
| **Voice** | Text | The full voice name (e.g., `en-US-Neural2-D`, `en-GB-News-L`). | |
| **Audio Format** | Picklist | Output format: `MP3`, `OGG_OPUS`, or `LINEAR16` (WAV). | |
| **Speaking Rate** | Number | Adjusts the voice speed. Default = 1.0 (normal). | |
| **Pitch (st)** | Number | Pitch adjustment in semitones. Default = 0.0. | |
| **Volume Gain (dB)** | Number | Volume adjustment in decibels. Default = 0.0. | |
| **Sample Rate (Hz)** | Number | Optional override of the sample rate (e.g., 24000, 48000). Required for `LINEAR16`. | |
| **Chunk Size (chars)** | Number | Max characters per TTS request. Default = 4500. | |
| **File Name** | Text | Base name for the output audio file (without extension). | |
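
For orientation, the sketch below shows how these fields could map onto a request to the public Google Cloud Text-to-Speech REST endpoint (`text:synthesize`). The JSON field names (`input`, `voice`, `audioConfig`, `audioContent`) come from that API; the function names, the validation rules, and the way the node itself builds its request are assumptions made for illustration.

```python
import base64
import json
import urllib.request

API_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"
FORMATS = ("MP3", "OGG_OPUS", "LINEAR16")


def build_request_body(input_type, content, voice, audio_format,
                       speaking_rate=1.0, pitch=0.0, volume_gain_db=0.0,
                       sample_rate_hz=None):
    """Hypothetical mapping of the node's input fields to an API body."""
    if input_type not in ("Text", "SSML"):
        raise ValueError("Input Type must be Text or SSML")
    if not content or not content.strip():
        raise ValueError("input is empty")
    if audio_format not in FORMATS:
        raise ValueError(f"Audio Format must be one of {FORMATS}")
    # The field table marks Sample Rate as required for LINEAR16.
    if audio_format == "LINEAR16" and sample_rate_hz is None:
        raise ValueError("Sample Rate (Hz) is required for LINEAR16")

    parts = voice.split("-")
    if len(parts) < 3:
        raise ValueError("expected a full voice name such as en-US-Neural2-D")
    # Language code inferred from the voice name: en-US-Neural2-D -> en-US.
    language_code = "-".join(parts[:2])

    audio_config = {
        "audioEncoding": audio_format,
        "speakingRate": speaking_rate,
        "pitch": pitch,
        "volumeGainDb": volume_gain_db,
    }
    if sample_rate_hz is not None:
        audio_config["sampleRateHertz"] = int(sample_rate_hz)

    key = "text" if input_type == "Text" else "ssml"
    return {
        "input": {key: content},
        "voice": {"languageCode": language_code, "name": voice},
        "audioConfig": audio_config,
    }


def synthesize(body: dict, access_token: str) -> bytes:
    """POST one chunk and return the decoded audio bytes."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    # The API returns the audio as a base64 string in "audioContent".
    return base64.b64decode(payload["audioContent"])
```

`synthesize` needs a valid OAuth access token and network access, so only `build_request_body` can be exercised offline.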

## Output Variables

| Variable | Type | Description |
|----------|------|-------------|
| `Audio.File` | String | Full path of the generated audio file. |
| `Audio.Voice` | String | The Google TTS voice used. |
| `Audio.Format` | String | The format of the generated audio file (mp3, ogg, or wav). |
| `Audio.Chunks` | Number | Number of chunks the input text was divided into. |
| `Audio.Characters` | Number | Total number of characters processed. |
| `taskMessage` | String | Completion message. |
| `statusReturn` | String | `Completed` if successful, or `Fail` if an error occurred. |

## Example Output

```json
{
  "Audio": {
    "File": "C:\\MinuteView\\Working\\narration.mp3",
    "Voice": "en-US-Neural2-D",
    "Format": "MP3",
    "Chunks": 2,
    "Characters": 8400
  },
  "taskMessage": "Google TTS synthesis completed successfully",
  "statusReturn": "Completed"
}
```
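
A later step that consumes this output might first check `statusReturn` before touching the file. The sketch below assumes the output is available as a JSON string with the shape shown above; the helper name `audio_file_from_result` is hypothetical.

```python
import json
from pathlib import PureWindowsPath


def audio_file_from_result(raw: str) -> PureWindowsPath:
    """Return the audio file path, or raise if the node reported a failure."""
    result = json.loads(raw)
    if result.get("statusReturn") != "Completed":
        # taskMessage carries the node's own message.
        raise RuntimeError(result.get("taskMessage", "GoogleTTS failed"))
    return PureWindowsPath(result["Audio"]["File"])
```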

## Example Configuration

| Setting | Example |
|---------|---------|
| **ThirdParty - Google Cloud** | `GoogleCloud-ProdToken` |
| **Input Type** | `Text` |
| **Text** | Welcome to MinuteView Automations, your engineering workflow companion. |
| **Voice** | `en-US-Neural2-D` |
| **Audio Format** | `MP3` |
| **Speaking Rate** | `1.1` |
| **Pitch (st)** | `0.5` |
| **File Name** | `welcome_message` |

**Result:**  
→ Generates `welcome_message.mp3` in the working folder with a natural American English voice.


## Example (Advanced SSML Mode)

| Setting | Example |
|---------|---------|
| **Input Type** | `SSML` |
| **Voice** | `en-GB-Neural2-A` |
| **Audio Format** | `LINEAR16` |
| **Sample Rate (Hz)** | `48000` |
| **File Name** | `intro_voice` |

**SSML:**

```xml
<speak>
  Hello there! <break time="500ms"/>
  <emphasis level="moderate">Welcome to MinuteView Automations.</emphasis>
</speak>
```

**Result:**  
→ Produces `intro_voice.wav` with SSML-controlled timing and emphasis.
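
Malformed SSML is an easy mistake to make when the markup is generated by an earlier workflow step. As a hedged sketch, a pre-check like the one below could catch it before the node runs. It only tests that the input is well-formed XML with a `<speak>` root; it does not validate against the tags Google supports, and the helper name `check_ssml` is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_ssml(ssml: str) -> str:
    """Verify the SSML is well-formed and return its spoken text."""
    try:
        root = ET.fromstring(ssml.strip())
    except ET.ParseError as exc:
        raise ValueError(f"SSML is not well-formed XML: {exc}") from exc

    # Strip an XML namespace prefix such as "{http://...}speak" if present.
    tag = root.tag.rsplit("}", 1)[-1]
    if tag != "speak":
        raise ValueError("SSML must be wrapped in a <speak> root element")

    # Join all text nodes and collapse runs of whitespace.
    return " ".join("".join(root.itertext()).split())


if __name__ == "__main__":
    sample = """
    <speak>
      Hello there! <break time="500ms"/>
      <emphasis level="moderate">Welcome to MinuteView Automations.</emphasis>
    </speak>
    """
    print(check_ssml(sample))
```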

---

## Notes

- The node automatically splits long text into chunks (4500 characters each by default, adjustable with **Chunk Size (chars)**).  
- All chunks are concatenated into a single output file.  
- If **Audio Format** = `LINEAR16`, the node creates a valid `.wav` file with a PCM header.  
- Language code is inferred automatically from the **Voice** name (e.g., `en-US-Neural2-D` → `en-US`).  
- Compatible with any voice available in Google Cloud TTS.  
- Handles input text in any language supported by the selected voice.  
- Requires a valid **Google Cloud third-party token** (OAuth-based).
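
To illustrate the notes on concatenation and the `.wav` header, the sketch below merges several WAV chunks into one file using Python's standard `wave` module. It assumes each chunk is a complete WAV file (header plus PCM data) with identical audio parameters; how the node actually merges chunks is not documented here, and `merge_wav_chunks` is a hypothetical name.

```python
import io
import wave


def merge_wav_chunks(chunks: list[bytes]) -> bytes:
    """Concatenate WAV chunks into a single WAV file with one header."""
    if not chunks:
        raise ValueError("no audio chunks to merge")

    params = None
    frames = []
    for data in chunks:
        with wave.open(io.BytesIO(data), "rb") as reader:
            fmt = (reader.getnchannels(), reader.getsampwidth(),
                   reader.getframerate())
            if params is None:
                params = fmt
            elif fmt != params:
                raise ValueError("chunks have different audio parameters")
            # Keep only the PCM frames; each chunk's own header is dropped.
            frames.append(reader.readframes(reader.getnframes()))

    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        writer.setnchannels(params[0])
        writer.setsampwidth(params[1])
        writer.setframerate(params[2])
        for pcm in frames:
            writer.writeframes(pcm)
    return out.getvalue()
```

Simply appending the raw bytes of each chunk would leave extra headers in the middle of the audio data, which is why the frames are extracted and rewritten under a single header.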

---

## Common Use Cases

| Scenario | Description |
|-----------|--------------|
| 🔊 **AI Narration** | Convert dynamically generated text to audio for training, documentation, or presentations. |
| 💬 **Chatbot Voice Output** | Generate speech responses for AI chat or assistant workflows. |
| 🎧 **Audio Alerts** | Play or send system notifications with voice messages. |
| 🗣️ **Multilingual Output** | Generate speech in any supported language and accent. |

---

## Status Messages

| Status | Description |
|---------|-------------|
| **Completed** | Audio synthesis completed successfully. |
| **Fail** | Error occurred (invalid credentials, empty input, or API failure). |

---

## Error Handling

The node logs detailed workflow messages in case of failure:
- Missing or invalid Google Cloud token  
- Empty text or SSML input  
- Invalid voice or audio format  
- API error or connection issue  
- Output file write error  

Check the **Workflow Log** for `[ERROR] GoogleTTS failed:` entries to diagnose issues.
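
If the Workflow Log is exported as text, a small filter can pull out the relevant entries. Only the marker `[ERROR] GoogleTTS failed:` comes from this page; the rest of the log line layout (timestamps and so on) and the helper name `find_tts_errors` are assumptions.

```python
MARKER = "[ERROR] GoogleTTS failed:"


def find_tts_errors(log_lines):
    """Return the message part of every GoogleTTS failure entry."""
    errors = []
    for line in log_lines:
        position = line.find(MARKER)
        if position != -1:
            # Keep whatever follows the marker as the error message.
            errors.append(line[position + len(MARKER):].strip())
    return errors


if __name__ == "__main__":
    # Hypothetical log lines, for illustration only.
    sample_log = [
        "2024-05-01 10:00:00 [INFO] GoogleTTS started",
        "2024-05-01 10:00:02 [ERROR] GoogleTTS failed: Empty text input",
    ]
    for message in find_tts_errors(sample_log):
        print(message)
```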

---

## Example Workflow Integration

```mermaid
graph LR
    A[AI Generate Response] --> B[Google TTS]
    B --> C[Save File to SharePoint]
    B --> D[Play Audio Notification]
```

**Category:** AI & Google Cloud  
**Task Name:** GoogleTTS

Tentech 2024