# Google Text-to-Speech (GoogleTTS)

## Description

The GoogleTTS node uses the Google Cloud Text-to-Speech API to generate realistic spoken audio from text or SSML input.
It supports multiple languages, voices, and audio formats (MP3, OGG, WAV/LINEAR16), allowing you to convert workflow-generated text into spoken output for use in video narration, alerts, audio messages, or AI-powered assistants.

This node connects to Google Cloud using a stored **ThirdParty - Google Cloud** credential and outputs an audio file that can be played, downloaded, or embedded in later steps.

## How It Works

1. Validates required fields such as Input Type, Voice, and Audio Format.
2. Retrieves an OAuth access token from the configured Google Cloud third-party service.
3. Breaks long text or SSML into chunks (default 4500 characters per chunk).
4. Sends each chunk to Google's Text-to-Speech API for synthesis.
5. Combines all audio chunks into a single file (MP3, OGG, or WAV).
6. Outputs the generated audio file path and metadata.
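
The chunking step above can be sketched as follows. This is a minimal illustration for plain text only, not the node's actual implementation: the function name `split_into_chunks` is hypothetical, and preferring sentence boundaries is an assumption. SSML input would additionally need every chunk to remain well-formed markup, which this sketch does not attempt.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 4500) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this sentence would overflow, so start a new chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence boundaries avoids audible cuts in the middle of a phrase when the synthesized chunks are joined back together.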

## Input Fields

| Field | Type | Description | Required |
|---------|---------|---------|---------|
| **ThirdParty - Google Cloud** | Third-Party Token | Select or provide the Google Cloud third-party credential (OAuth token). | ✅ |
| **Input Type** | Picklist | Determines whether the input is `Text` or `SSML`. | ✅ |
| **Text** | Text (multi-line) | Plain text input to synthesize. Used if Input Type = `Text`. | ✅ if Text mode |
| **SSML** | Text (multi-line) | SSML-formatted input for advanced control of pauses, tone, etc. Used if Input Type = `SSML`. | ✅ if SSML mode |
| **Voice** | Text | The full voice name (e.g., `en-US-Neural2-D`, `en-GB-News-L`). | ✅ |
| **Audio Format** | Picklist | Output format: `MP3`, `OGG_OPUS`, or `LINEAR16` (WAV). | ✅ |
| **Speaking Rate** | Number | Adjusts the voice speed. Default = `1.0` (normal). | ❌ |
| **Pitch (st)** | Number | Pitch adjustment in semitones. Default = `0.0`. | ❌ |
| **Volume Gain (dB)** | Number | Volume adjustment in decibels. Default = `0.0`. | ❌ |
| **Sample Rate (Hz)** | Number | Sample rate override (e.g., `24000`, `48000`). Required when Audio Format = `LINEAR16`; optional otherwise. | ✅ if LINEAR16 |
| **Chunk Size (chars)** | Number | Max characters per TTS request. Default = `4500`. | ❌ |
| **File Name** | Text | Base name for the output audio file (without extension). | ✅ |
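
For orientation, the fields above line up closely with the request body of the Cloud Text-to-Speech REST method `text:synthesize`. The sketch below shows one plausible mapping. The function name `build_synthesize_request` and its parameter names are hypothetical, and how the node builds its requests internally is not documented here, so treat this as an assumption rather than the node's actual code.

```python
from typing import Optional

VALID_FORMATS = ("MP3", "OGG_OPUS", "LINEAR16")


def build_synthesize_request(
    input_type: str,
    content: str,
    voice: str,
    audio_format: str,
    speaking_rate: float = 1.0,
    pitch: float = 0.0,
    volume_gain_db: float = 0.0,
    sample_rate_hz: Optional[int] = None,
) -> dict:
    """Build a JSON-serializable body for the text:synthesize REST method."""
    if input_type not in ("Text", "SSML"):
        raise ValueError("Input Type must be 'Text' or 'SSML'")
    if not content.strip():
        raise ValueError("Text or SSML input is empty")
    if audio_format not in VALID_FORMATS:
        raise ValueError(f"Unsupported audio format: {audio_format}")
    if audio_format == "LINEAR16" and sample_rate_hz is None:
        # Mirrors the field table: Sample Rate is required for LINEAR16.
        raise ValueError("Sample Rate (Hz) is required for LINEAR16")

    # Language code is taken from the first two hyphen-separated parts of
    # the voice name, e.g. "en-US-Neural2-D" -> "en-US".
    language_code = "-".join(voice.split("-")[:2])

    audio_config = {
        "audioEncoding": audio_format,
        "speakingRate": speaking_rate,
        "pitch": pitch,
        "volumeGainDb": volume_gain_db,
    }
    if sample_rate_hz is not None:
        audio_config["sampleRateHertz"] = sample_rate_hz

    input_key = "text" if input_type == "Text" else "ssml"
    return {
        "input": {input_key: content},
        "voice": {"languageCode": language_code, "name": voice},
        "audioConfig": audio_config,
    }
```

The returned dictionary could be serialized with `json.dumps` and posted to `https://texttospeech.googleapis.com/v1/text:synthesize` with the OAuth access token in an `Authorization: Bearer` header.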

## Output Variables

| Variable | Type | Description |
|---------|---------|---------|
| **Audio.File** | String | Full path of the generated audio file. |
| **Audio.Voice** | String | The Google TTS voice used. |
| **Audio.Format** | String | The format of the generated audio file (`mp3`, `ogg`, or `wav`). |
| **Audio.Chunks** | Number | Number of chunks the input text was divided into. |
| **Audio.Characters** | Number | Total number of characters processed. |
| **taskMessage** | String | Completion message. |
| **statusReturn** | String | `Completed` if successful, or `Fail` if an error occurred. |

## Example Output

```json
{
  "Audio": {
    "File": "C:\\MinuteView\\Working\\narration.mp3",
    "Voice": "en-US-Neural2-D",
    "Format": "MP3",
    "Chunks": 2,
    "Characters": 8400
  },
  "taskMessage": "Google TTS synthesis completed successfully",
  "statusReturn": "Completed"
}
```

## Example Configuration

| Setting | Example |
|---------|---------|
| **ThirdParty - Google Cloud** | `GoogleCloud-ProdToken` |
| **Input Type** | `Text` |
| **Text** | `Welcome to MinuteView Automations, your engineering workflow companion.` |
| **Voice** | `en-US-Neural2-D` |
| **Audio Format** | `MP3` |
| **Speaking Rate** | `1.1` |
| **Pitch (st)** | `0.5` |
| **File Name** | `welcome_message` |

**Result:**  
→ Generates `welcome_message.mp3` in the working folder with a natural American English voice.

## Example (Advanced SSML Mode)

| Setting | Example |
|---------|---------|
| **Input Type** | `SSML` |
| **SSML** | See the markup below |
| **Voice** | `en-GB-Neural2-A` |
| **Audio Format** | `LINEAR16` |
| **Sample Rate (Hz)** | `48000` |
| **File Name** | `intro_voice` |

**SSML:**

```xml
<speak>
  Hello there! <break time="500ms"/>
  <emphasis level="moderate">Welcome to MinuteView Automations.</emphasis>
</speak>
```

**Result:**  
→ Produces `intro_voice.wav` with SSML-controlled timing and emphasis.
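
Because malformed SSML is a likely cause of a failed run, it can help to check the markup before it reaches the node. The sketch below is one possible pre-check using only the Python standard library. The function name `check_ssml` is hypothetical, and it verifies only that the input is well-formed XML with a `<speak>` root, not that every tag is one the selected voice supports.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def check_ssml(ssml: str) -> Tuple[bool, str]:
    """Return (ok, message) after a basic well-formedness check."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError as exc:
        return False, f"not well-formed XML: {exc}"
    # Drop an XML namespace prefix such as "{http://...}speak" if present.
    tag = root.tag.rsplit("}", 1)[-1]
    if tag != "speak":
        return False, f"root element is <{tag}>, expected <speak>"
    return True, "ok"


if __name__ == "__main__":
    sample = (
        "<speak>Hello there! <break time=\"500ms\"/> "
        "<emphasis level=\"moderate\">Welcome to MinuteView Automations."
        "</emphasis></speak>"
    )
    print(check_ssml(sample))
```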

---

## Notes

- The node automatically splits long text into chunks (default 4500 characters, adjustable through **Chunk Size (chars)**).  
- All chunks are concatenated into a single output file.  
- If **Audio Format** = `LINEAR16`, the node creates a valid `.wav` file with a PCM header.  
- Language code is inferred automatically from the **Voice** name (e.g., `en-US-Neural2-D` → `en-US`).  
- Compatible with any voice available in Google Cloud TTS.  
- Works for all text languages supported by the selected voice.  
- Requires a valid **Google Cloud third-party token** (OAuth-based).
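
To illustrate the concatenation note for `LINEAR16`, the sketch below merges several WAV payloads into one valid `.wav` file with a single PCM header. It assumes each chunk is a complete WAV payload with its own header, which is how the Cloud Text-to-Speech reference describes `LINEAR16` responses. The function name `combine_wav_chunks` is hypothetical, and the node's real merging logic may differ.

```python
import io
import wave
from typing import List


def combine_wav_chunks(chunks: List[bytes]) -> bytes:
    """Merge WAV-formatted chunks into one WAV file held in memory."""
    if not chunks:
        raise ValueError("no audio chunks to combine")
    params = None
    frames = []
    for chunk in chunks:
        with wave.open(io.BytesIO(chunk), "rb") as reader:
            fmt = (
                reader.getnchannels(),
                reader.getsampwidth(),
                reader.getframerate(),
            )
            if params is None:
                params = fmt
            elif fmt != params:
                raise ValueError("chunks differ in channels, width, or rate")
            # Keep only the raw PCM frames; each chunk's header is dropped.
            frames.append(reader.readframes(reader.getnframes()))
    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        writer.setnchannels(params[0])
        writer.setsampwidth(params[1])
        writer.setframerate(params[2])
        # A single header is written for the combined frame data.
        writer.writeframes(b"".join(frames))
    return out.getvalue()
```

Simply appending the raw bytes of each chunk would leave extra headers in the middle of the audio data, which is why the frames are extracted first.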

---

## Common Use Cases

| Scenario | Description |
|-----------|--------------|
| 🔊 **AI Narration** | Convert dynamically generated text to audio for training, documentation, or presentations. |
| 💬 **Chatbot Voice Output** | Generate speech responses for AI chat or assistant workflows. |
| 🎧 **Audio Alerts** | Play or send system notifications with voice messages. |
| 🗣️ **Multilingual Output** | Generate speech in any supported language and accent. |

---

## Status Messages

| Status | Description |
|---------|-------------|
| **Completed** | Audio synthesis completed successfully. |
| **Fail** | Error occurred (invalid credentials, empty input, or API failure). |

---

## Error Handling

When a run fails, the node writes a detailed message to the workflow log. Typical causes:
- Missing or invalid Google Cloud token  
- Empty text or SSML input  
- Invalid voice or audio format  
- API error or connection issue  
- Output file write error  

Check the **Workflow Log** for `[ERROR] GoogleTTS failed:` entries to diagnose issues.
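
If the workflow log is available as text, the entries mentioned above can be pulled out with a few lines of Python. This is a sketch only: the function name `find_tts_errors` is hypothetical, and the only detail taken from this page is the `[ERROR] GoogleTTS failed:` marker. The rest of the log line layout is an assumption.

```python
from typing import Iterable, List

ERROR_MARKER = "[ERROR] GoogleTTS failed:"


def find_tts_errors(log_lines: Iterable[str]) -> List[str]:
    """Return the message that follows the GoogleTTS error marker on each line."""
    errors = []
    for line in log_lines:
        idx = line.find(ERROR_MARKER)
        if idx != -1:
            # Keep only the text after the marker, without surrounding spaces.
            errors.append(line[idx + len(ERROR_MARKER):].strip())
    return errors
```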

---

## Example Workflow Integration

```mermaid
graph LR
    A[AI Generate Response] --> B[Google TTS]
    B --> C[Save File to SharePoint]
    B --> D[Play Audio Notification]
```

**Category:** AI & Google Cloud  
**Task Name:** GoogleTTS
