Google Vision

Description

The GoogleVision node connects to the Google Cloud Vision API to perform image and document analysis tasks such as Optical Character Recognition (OCR) and Object Detection.
It supports both local file OCR and Google Cloud Storage (GCS)-based document and image operations.

This node allows workflows to automatically extract text from images or PDFs, detect objects within images, and feed these results into downstream nodes like AI analyzers, Excel writers, or document classifiers.


How It Works

  1. Validates the action type (Image OCR, Document OCR, or Object Detection).
  2. Retrieves an OAuth access token from a configured Google Cloud third-party credential.
  3. Invokes the appropriate Google Vision API endpoint depending on the selected action.
  4. Returns extracted text, structured detection data, or GCS output references.
  5. Logs the operation and makes results available for other workflow steps.
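
Steps 1 and 3 above can be sketched roughly as follows. This is an illustration only, not the node's actual implementation: the URLs and feature names are those of the public Cloud Vision REST API, but the mapping from each action to an endpoint and the helper name resolve_endpoint are assumptions.

```python
# Hypothetical sketch: validate the action, then pick a Vision REST endpoint.
# The action-to-endpoint mapping is an assumption about how the node behaves.
VISION_BASE = "https://vision.googleapis.com/v1"

ACTIONS = {
    "Image OCR (local file)": ("images:annotate", "TEXT_DETECTION"),
    "Document OCR (GCS pdf/tif)": ("files:asyncBatchAnnotate", "DOCUMENT_TEXT_DETECTION"),
    "Object Detection (GCS image)": ("images:annotate", "OBJECT_LOCALIZATION"),
}


def resolve_endpoint(action: str) -> tuple[str, str]:
    """Return (url, feature_type) for a supported action, or raise ValueError."""
    if action not in ACTIONS:
        raise ValueError(f"Unsupported action: {action!r}")
    method, feature = ACTIONS[action]
    return f"{VISION_BASE}/{method}", feature
```

Keeping the mapping in one table means an unsupported action fails before any credential lookup or network call is attempted.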

Supported Actions

| Action | Description |
| --- | --- |
| Image OCR (local file) | Extracts text from a local image (PNG, JPG, BMP, etc.) using Google Vision OCR. |
| Document OCR (GCS pdf/tif) | Processes multi-page PDF or TIFF documents stored in Google Cloud Storage and exports text to a GCS bucket. |
| Object Detection (GCS image) | Detects and labels objects within a GCS-hosted image, including confidence scores. |

Input Fields

| Field | Type | Description | Required |
| --- | --- | --- | --- |
| ThirdParty - Google Cloud | Third-Party Token | Reference to a valid Google Cloud OAuth credential stored in MinuteView. | |
| Action | Picklist | The analysis type to perform: Image OCR (local file), Document OCR (GCS pdf/tif), or Object Detection (GCS image). | |

When Action = "Image OCR (local file)"

| Field | Type | Description | Required |
| --- | --- | --- | --- |
| Local Image Path | Text | Full path to the image file on the automation server. | |
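
For a local file, the Cloud Vision REST API takes the image bytes base64-encoded inside the request body. The sketch below shows what such a body could look like. The function name build_image_ocr_request is hypothetical, and the choice of the TEXT_DETECTION feature is an assumption; the node may use a different feature internally.

```python
import base64
from pathlib import Path


def build_image_ocr_request(local_image_path: str) -> dict:
    """Build an images:annotate request body for a local image file."""
    image_bytes = Path(local_image_path).read_bytes()
    return {
        "requests": [
            {
                # The REST API expects raw image bytes as base64 text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                # Assumption: the node may use DOCUMENT_TEXT_DETECTION instead.
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }
```

Because the file is read on the machine that runs the node, the path must be reachable from the automation server, not from the user's workstation.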

When Action = "Document OCR (GCS pdf/tif)"

| Field | Type | Description | Required |
| --- | --- | --- | --- |
| GCS Source URI | Text | Source file URI (e.g. gs://bucket/path/document.pdf). | |
| GCS Destination Bucket | Text | Destination bucket where OCR JSON will be written. | |
| GCS Destination Prefix | Text | Destination prefix (folder) for OCR results. | |
| MIME Type Override | Text | (Optional) Custom MIME type for special formats. | |
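
The fields above map naturally onto a Cloud Vision files:asyncBatchAnnotate request. The following is a minimal sketch under stated assumptions: the function name, the extension-based MIME type inference, and the way bucket and prefix are joined into a destination URI are illustrative and not confirmed node behavior.

```python
from typing import Optional

# MIME types inferred from the file extension (assumed defaults for this sketch).
_MIME_BY_EXTENSION = {".pdf": "application/pdf", ".tif": "image/tiff", ".tiff": "image/tiff"}


def build_document_ocr_request(
    source_uri: str,
    dest_bucket: str,
    dest_prefix: str,
    mime_override: Optional[str] = None,
) -> dict:
    """Build a files:asyncBatchAnnotate request body from the input fields."""
    if not source_uri.startswith("gs://"):
        raise ValueError("GCS Source URI must start with gs://")
    extension = "." + source_uri.rsplit(".", 1)[-1].lower()
    mime_type = mime_override or _MIME_BY_EXTENSION.get(extension)
    if mime_type is None:
        raise ValueError(f"Cannot infer MIME type for {source_uri!r}")
    # Accept the bucket with or without a gs:// scheme and normalise slashes.
    bucket = dest_bucket.removeprefix("gs://").strip("/")
    destination = f"gs://{bucket}/{dest_prefix.strip('/')}/"
    return {
        "requests": [
            {
                "inputConfig": {"gcsSource": {"uri": source_uri}, "mimeType": mime_type},
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
                "outputConfig": {"gcsDestination": {"uri": destination}},
            }
        ]
    }
```

The request is asynchronous on the Google side: the API answers with an operation reference, and the OCR JSON appears under the destination URI once processing finishes.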

When Action = "Object Detection (GCS image)"

| Field | Type | Description | Required |
| --- | --- | --- | --- |
| GCS Image URI | Text | URI of the image stored in GCS (e.g. gs://bucket/images/photo.jpg). | |
| Confidence Threshold | Number | Minimum detection confidence between 0.0 and 1.0. Default = 0.0. | |
| Max Results | Number | Maximum number of detected objects to return. Default = all. | |
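
The effect of Confidence Threshold and Max Results can be illustrated with a small filter over detected objects. This is a sketch, not the node's code: the function name filter_objects is hypothetical, and sorting by score before truncating is an assumption about how the cap is applied.

```python
from typing import Optional


def filter_objects(
    annotations: list[dict],
    min_score: float = 0.0,
    max_results: Optional[int] = None,
) -> list[dict]:
    """Keep objects scoring at least min_score, best first, capped at max_results."""
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("Confidence Threshold must be between 0.0 and 1.0")
    kept = [
        {"name": item["name"], "score": item["score"]}
        for item in annotations
        if item["score"] >= min_score
    ]
    # Assumption: highest-confidence detections are preferred when truncating.
    kept.sort(key=lambda obj: obj["score"], reverse=True)
    return kept if max_results is None else kept[:max_results]
```

With the defaults (threshold 0.0, no cap) every detection is returned, which matches the documented defaults of the two fields.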

Output Data

| Output Variable | Type | Description |
| --- | --- | --- |
| out | Object / String | The raw or structured output of the selected action. |
| taskMessage | String | Message describing the outcome. |
| statusReturn | String | "Completed" on success or "Fail" on error. |
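
Because out can be either a string or an object, a downstream step usually needs to check statusReturn first and then branch on the type of out. A minimal sketch of that pattern, assuming the result arrives as a dictionary with the three variables above (the helper name summarize_result is hypothetical):

```python
import json


def summarize_result(result: dict) -> str:
    """Return a one-line summary of a node result for use in a later step."""
    if result.get("statusReturn") != "Completed":
        return f"FAILED: {result.get('taskMessage', 'no message')}"
    out = result.get("out")
    if isinstance(out, str):
        # Plain text output: report how many lines of text were found.
        return f"OK: {len(out.splitlines())} line(s) of text"
    # Structured output: serialise it in a stable key order.
    return f"OK: {json.dumps(out, sort_keys=True)}"
```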

Example Outputs

🧾 Image OCR (local file)

json
{
  "out": "Valve No. 204-B\nPressure Rating: 25 bar\nLast Service: 2024-08-05",
  "taskMessage": "Image OCR (local file) completed successfully",
  "statusReturn": "Completed"
}

📑 Document OCR (GCS pdf/tif)

json
{
  "out": {
    "Source": "gs://engineering-docs/invoices/invoice123.pdf",
    "Output": "gs://engineering-docs/ocr-results/invoice123/",
    "Status": "QueuedOrCompleted",
    "Result": "Operation-123456789"
  },
  "taskMessage": "Document OCR request submitted successfully",
  "statusReturn": "Completed"
}

🧠 Object Detection (GCS image)

json
{
  "out": {
    "Image": "gs://project-assets/inspection/site_photo.jpg",
    "MinScore": 0.6,
    "MaxResults": 10,
    "Objects": [
      { "name": "Hardhat", "score": 0.92 },
      { "name": "Person", "score": 0.87 },
      { "name": "Excavator", "score": 0.85 }
    ]
  },
  "taskMessage": "Object Detection completed successfully",
  "statusReturn": "Completed"
}

Example Workflow Use Cases

| Scenario | Description |
| --- | --- |
| 🔎 Drawing OCR | Extract text and dimensions from scanned PDFs or TIFF drawings. |
| 📄 Document Digitization | Read legacy engineering documents and export OCR text into databases. |
| 🧰 Object Recognition | Automatically tag and classify images (e.g., identify equipment, safety gear, or site conditions). |
| 🧾 Invoice OCR Pipeline | Read invoice PDFs from a GCS bucket, parse text via OCR, and load results into SharePoint or SQL. |

Task Flow Summary

| Step | Action |
| --- | --- |
| 1 | Validates the selected action type. |
| 2 | Retrieves Google Cloud third-party credentials. |
| 3 | Executes the appropriate Vision API function. |
| 4 | Processes and filters the response. |
| 5 | Returns structured results or GCS reference URIs. |

Notes

  • The node supports both local file-based and Google Cloud Storage-based operations.
  • Returned data can be passed into subsequent workflow nodes for analysis, classification, or AI processing.
  • Document OCR operations typically write their JSON results to the GCS destination path.
  • Object detection output can be filtered using Confidence Threshold and Max Results.
  • The node relies on the Google Cloud Vision API under your provided credentials and permissions.
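
Since Document OCR results land in GCS as JSON, a later step typically has to pull the text out of those files. The sketch below assumes the file content has already been downloaded as a string and follows the Cloud Vision output layout, where each entry in responses may carry a fullTextAnnotation with a text field. The function name is hypothetical.

```python
import json


def extract_text_from_ocr_output(output_json: str) -> str:
    """Join the text of every page found in one OCR output file."""
    payload = json.loads(output_json)
    pages = []
    for response in payload.get("responses", []):
        annotation = response.get("fullTextAnnotation")
        if annotation:  # entries without detected text have no annotation
            pages.append(annotation.get("text", ""))
    return "\n".join(pages)
```

A large document may be split across several output files under the destination prefix, so a complete pipeline would apply this to each file in order.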

Error Handling

If the task fails, the node logs an error message, returns it, and sets statusReturn to Fail. Common causes include:

  • Missing or invalid Google Cloud token
  • Incorrect GCS bucket URI or permissions
  • Unsupported file format
  • Missing required fields (e.g., file path or GCS prefix)
  • API or network error from Google Cloud services
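
Several of these failures can be caught before the node runs by checking the inputs in a preceding step. The following is a sketch only: which fields are mandatory per action is an assumption based on the field lists in this page, and the helper name find_input_errors is hypothetical.

```python
# Assumed required fields per action (not confirmed by the node itself).
REQUIRED_FIELDS = {
    "Image OCR (local file)": ["Local Image Path"],
    "Document OCR (GCS pdf/tif)": [
        "GCS Source URI",
        "GCS Destination Bucket",
        "GCS Destination Prefix",
    ],
    "Object Detection (GCS image)": ["GCS Image URI"],
}


def find_input_errors(action: str, fields: dict) -> list[str]:
    """Return human-readable problems; an empty list means the inputs look usable."""
    if action not in REQUIRED_FIELDS:
        return [f"Unsupported action: {action!r}"]
    errors = []
    for name in REQUIRED_FIELDS[action]:
        value = str(fields.get(name) or "").strip()
        if not value:
            errors.append(f"Missing required field: {name}")
        elif name.endswith("URI") and not value.startswith("gs://"):
            errors.append(f"{name} must start with gs://")
    return errors
```

Checks like these cannot detect permission or network problems; those only surface when the Vision API is actually called.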

Example Workflow Integration

mermaid
graph LR
    A[Get File From SharePoint] --> B["GoogleVision (Image OCR)"]
    B --> C[Extract Keywords]
    C --> D[Add Metadata to Vault]

Category: AI & Google Cloud
Task Name: GoogleVision

Tentech 2024