Google Vision

Performs image and document analysis using the Google Cloud Vision API, supporting OCR on local image files, asynchronous document OCR from Google Cloud Storage, and object detection from GCS-hosted images.

Purpose

Use this task when a workflow needs to extract text from scanned images or multi-page documents, or to detect and label objects within images. The Action dropdown selects the Vision capability to invoke. Image OCR requires only a local file path. Document OCR requires a GCS source URI and a GCS destination, and Object Detection requires a GCS image URI. All modes require a configured Google Cloud Platform third-party account. Output variables differ by action and are listed under Outputs.

Inputs

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| Action | Dropdown | Yes | The Vision operation to perform. Options: Image OCR (local file), Document OCR (GCS pdf/tif), Object Detection (GCS image). |
| Local Image Path | Text | No | The absolute local path to the image file to process with OCR. Supported formats include PNG, JPG, and BMP. |
| GCS Source URI | Text | No | The GCS URI of the PDF or TIFF document to process, for example gs://my-bucket/docs/file.pdf. |
| GCS Destination Bucket | Text | No | The name of the GCS bucket where OCR result JSON will be written. |
| GCS Destination Prefix | Text | No | The folder prefix within the destination bucket for OCR output files. |
| MIME Type Override | Text | No | An optional MIME type to use when the file extension is ambiguous, for example application/pdf or image/tiff. |
| GCS Image URI | Text | No | The GCS URI of the image to analyse for object detection, for example gs://my-bucket/images/photo.jpg. |
| Confidence Threshold | Text | No | The minimum detection confidence score to include in results. Accepts a value between 0.0 and 1.0. Defaults to 0.0. |
| Max Results | Text | No | The maximum number of detected objects to return. Defaults to all results. |

Visibility Rules

Local Image Path is only shown when Action is set to Image OCR (local file).

GCS Source URI is only shown when Action is set to Document OCR (GCS pdf/tif).

GCS Destination Bucket is only shown when Action is set to Document OCR (GCS pdf/tif).

GCS Destination Prefix is only shown when Action is set to Document OCR (GCS pdf/tif).

MIME Type Override is only shown when Action is set to Document OCR (GCS pdf/tif).

GCS Image URI is only shown when Action is set to Object Detection (GCS image).

Confidence Threshold is only shown when Action is set to Object Detection (GCS image).

Max Results is only shown when Action is set to Object Detection (GCS image).
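
The rules above amount to a lookup from the selected Action to the fields shown alongside it. The following is a minimal sketch of that lookup, assuming the field and option labels are used verbatim as keys. The names `VISIBLE_FIELDS` and `visible_fields` are hypothetical and are not part of the product.

```python
# Hypothetical mapping of each Action option to the fields shown for it.
VISIBLE_FIELDS = {
    "Image OCR (local file)": ["Local Image Path"],
    "Document OCR (GCS pdf/tif)": [
        "GCS Source URI",
        "GCS Destination Bucket",
        "GCS Destination Prefix",
        "MIME Type Override",
    ],
    "Object Detection (GCS image)": [
        "GCS Image URI",
        "Confidence Threshold",
        "Max Results",
    ],
}


def visible_fields(action: str) -> list:
    """Return the fields shown for an action; Action itself is always shown."""
    if action not in VISIBLE_FIELDS:
        raise ValueError(f"Unknown action: {action!r}")
    return ["Action"] + VISIBLE_FIELDS[action]
```

For example, `visible_fields("Image OCR (local file)")` yields only Action and Local Image Path, which matches the first rule in the list.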

Operations

| Operation | Description |
| --- | --- |
| Image OCR (local file) | Reads a local image file, encodes it, and submits it to the Vision API to extract all text. Returns the extracted text as a string. |
| Document OCR (GCS pdf/tif) | Submits a multi-page PDF or TIFF stored in GCS to the Vision API for asynchronous document OCR. Google writes JSON result files to the specified destination bucket and prefix. |
| Object Detection (GCS image) | Analyses a GCS-hosted image for recognisable objects and returns a list of detected items with confidence scores, filtered by the optional threshold and result count limit. |

Outputs

The output variables produced depend on the selected action.

For Image OCR (local file):

| Name | Description |
| --- | --- |
| OCR Text | The full text extracted from the image by the Vision API. |
| Source Image | The local image path that was processed. |
| Action | The action that was performed. |
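
To illustrate how the read, encode, and submit steps could relate to these outputs, here is a minimal sketch built around the Vision REST `images:annotate` request shape. It only builds the request body and maps a response onto the output names; it does not send anything or handle authentication. How the task itself calls the API is not documented here, so the helper names `build_image_ocr_request` and `extract_ocr_outputs` are hypothetical, and the choice of `TEXT_DETECTION` as the feature type is an assumption.

```python
import base64
from pathlib import Path


def build_image_ocr_request(image_path: str) -> dict:
    """Build an images:annotate request body for a local image (not sent here)."""
    content = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": content},
                # Assumption: the task may use DOCUMENT_TEXT_DETECTION instead.
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def extract_ocr_outputs(response: dict, image_path: str) -> dict:
    """Map an images:annotate response onto the three output variables."""
    first = (response.get("responses") or [{}])[0]
    text = first.get("fullTextAnnotation", {}).get("text", "")
    return {
        "OCR Text": text,
        "Source Image": image_path,
        "Action": "Image OCR (local file)",
    }
```

An image with no recognisable text produces a response without `fullTextAnnotation`, so this sketch returns an empty string for OCR Text in that case.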

For Document OCR (GCS pdf/tif):

| Name | Description |
| --- | --- |
| Source URI | The GCS source URI of the document that was processed. |
| Output URI | The GCS URI prefix where OCR result JSON files were written. |
| Destination Bucket | The destination bucket name. |
| Destination Prefix | The destination prefix used for OCR output. |
| Result | The raw result or operation reference returned by the Vision API. |
| Action | The action that was performed. |
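
The relationship between the Document OCR inputs and the Output URI can be sketched as follows, using the Vision REST `files:asyncBatchAnnotate` request shape. This is an illustration under assumptions rather than the task's actual implementation: the helper name `build_document_ocr_request`, the extension-to-MIME lookup, and the way the bucket and prefix are joined into a `gs://` URI are all hypothetical. The request is built but not sent.

```python
import posixpath

# Assumed lookup used when no MIME Type Override is supplied.
MIME_BY_EXTENSION = {
    ".pdf": "application/pdf",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
}


def build_document_ocr_request(source_uri, dest_bucket, dest_prefix="", mime_override=""):
    """Return (request_body, output_uri) for an async document OCR call."""
    if not source_uri.startswith("gs://"):
        raise ValueError("GCS Source URI must start with gs://")
    extension = posixpath.splitext(source_uri)[1].lower()
    mime_type = mime_override or MIME_BY_EXTENSION.get(extension)
    if not mime_type:
        raise ValueError("Cannot infer MIME type; set MIME Type Override")
    # Assumption: the Output URI is the bucket plus the normalised prefix.
    prefix = dest_prefix.strip("/")
    output_uri = f"gs://{dest_bucket}/{prefix}/" if prefix else f"gs://{dest_bucket}/"
    body = {
        "requests": [
            {
                "inputConfig": {"gcsSource": {"uri": source_uri}, "mimeType": mime_type},
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
                "outputConfig": {"gcsDestination": {"uri": output_uri}},
            }
        ]
    }
    return body, output_uri
```

Because the operation is asynchronous, the API response to such a request is an operation reference rather than the OCR text, which is consistent with the Result output described in the table.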

For Object Detection (GCS image):

| Name | Description |
| --- | --- |
| Image URI | The GCS URI of the image that was analysed. |
| Min Score | The confidence threshold that was applied to filter results. |
| Max Results | The maximum result count that was applied, or Unlimited if not set. |
| Detected Objects | A list of detected objects with their labels and confidence scores. |
| Object Count | The total number of objects returned after filtering. |
| Action | The action that was performed. |
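
The filtering behind these outputs can be sketched as below. The sketch assumes the detections arrive as a list of dictionaries with `name` and `score` keys, as in the `localizedObjectAnnotations` field of a Vision API response. The function names, the descending sort by score, and applying Max Results after the threshold filter are assumptions made for illustration; the task's actual ordering and limiting behaviour is not documented here.

```python
from typing import Optional


def parse_threshold(raw: str) -> float:
    """Parse the Confidence Threshold text input; blank falls back to 0.0."""
    value = float(raw) if raw.strip() else 0.0
    if not 0.0 <= value <= 1.0:
        raise ValueError("Confidence Threshold must be between 0.0 and 1.0")
    return value


def parse_max_results(raw: str) -> Optional[int]:
    """Parse the Max Results text input; blank means no limit."""
    if not raw.strip():
        return None
    value = int(raw)
    if value < 1:
        raise ValueError("Max Results must be a positive integer")
    return value


def build_detection_outputs(image_uri, annotations, threshold_raw="", max_results_raw=""):
    """Filter detections and map them onto the output variables."""
    min_score = parse_threshold(threshold_raw)
    limit = parse_max_results(max_results_raw)
    kept = [a for a in annotations if a.get("score", 0.0) >= min_score]
    # Assumption: highest-confidence objects are kept first when a limit applies.
    kept.sort(key=lambda a: a.get("score", 0.0), reverse=True)
    if limit is not None:
        kept = kept[:limit]
    return {
        "Image URI": image_uri,
        "Min Score": min_score,
        "Max Results": limit if limit is not None else "Unlimited",
        "Detected Objects": [{"label": a.get("name", ""), "score": a.get("score", 0.0)} for a in kept],
        "Object Count": len(kept),
        "Action": "Object Detection (GCS image)",
    }
```

Both text inputs are parsed before use because the Inputs table lists them with type Text, so a blank value has to be mapped onto the documented default.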