
Mesh Indexer Sharepoint

The Mesh Indexer Sharepoint node indexes documents and metadata from a Microsoft SharePoint document library into the Mesh platform through a configurable indexer pipeline. It supports full or selective indexing of SharePoint documents, with optional full-text content extraction, text vectorization (embeddings), OCR, and thumbnails.


Purpose

This node enables users to connect to a SharePoint Online source and index records into an ElasticSearch-backed Mesh instance. It supports full indexing, incremental updates based on a date filter, or indexing of a single specified document.


Configuration

The modal form includes the following input fields to configure the node:

Required Inputs

Label | Description
Select Index | The name of the index to target (e.g., documents-sharepoint).
Index Action | The indexing mode: Full Index, Update From Date, or Update Single Record.
ServiceAccount-OpenAI | The OpenAI service account token (used for embeddings).
ServiceAccount-Elastic Search | The ElasticSearch service account token.
ServiceAccount-Microsoft Azure | The Microsoft Azure (SharePoint) service account token.
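The three Index Action modes each narrow the set of documents differently. The following Python sketch illustrates that mapping; it is not the node's ItemQuery API, and every name in it (`IndexAction`, `QuerySketch`, `build_query`, `matches`) is hypothetical. It assumes Update From Date keeps documents modified strictly after Start Date, and Update Single Record keeps only the given Document ID.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class IndexAction(Enum):
    """The three modes offered by the Index Action field."""
    FULL_INDEX = "Full Index"
    UPDATE_FROM_DATE = "Update From Date"
    UPDATE_SINGLE_RECORD = "Update Single Record"


@dataclass(frozen=True)
class QuerySketch:
    """Hypothetical stand-in for the node's ItemQuery; these fields are assumptions."""
    modified_after: Optional[datetime] = None  # set only in Update From Date mode
    document_id: Optional[str] = None          # set only in Update Single Record mode


def build_query(action: IndexAction,
                start_date: Optional[datetime] = None,
                document_id: Optional[str] = None) -> QuerySketch:
    """Map the selected Index Action and its companion input to a query."""
    if action is IndexAction.FULL_INDEX:
        # No filter: every document in the library is in scope.
        return QuerySketch()
    if action is IndexAction.UPDATE_FROM_DATE:
        if start_date is None:
            raise ValueError("Start Date is required for Update From Date")
        return QuerySketch(modified_after=start_date)
    if action is IndexAction.UPDATE_SINGLE_RECORD:
        if not document_id:
            raise ValueError("Document ID is required for Update Single Record")
        return QuerySketch(document_id=document_id)
    raise ValueError(f"Unsupported index action: {action!r}")


def matches(query: QuerySketch, doc_id: str, modified: datetime) -> bool:
    """Decide whether a document with this ID and modified time is in scope."""
    if query.document_id is not None and doc_id != query.document_id:
        return False
    if query.modified_after is not None and modified <= query.modified_after:
        return False
    return True
```

Validating the companion input inside `build_query` means a misconfigured mode (for example, Update From Date with no Start Date) fails before any SharePoint traffic occurs.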

Optional Inputs

Label | Description
Include Text Vectorization | Enable text vector embedding using OpenAI.
Include Full Text Content | Extract and include the full content of documents.
Include Thumbnail | Include a thumbnail preview of the document (if available).
Include OCR | Apply Optical Character Recognition (OCR) to scanned documents.
Document ID | Used in Update Single Record mode: the SharePoint document ID to reindex.
Start Date | Used in Update From Date mode: only documents modified after this date are indexed.
White List Folders | Optional list of folder paths that restricts the indexing scope.
Record Title | Optional format string for the record title (e.g., "{FileName} - {Modified}").
Sub Title | Optional format string for the subtitle/description.
Domain | SharePoint tenant domain (e.g., yourtenant.sharepoint.com).
Site Name | The name of the SharePoint site (e.g., project-site).
Document Library | The specific SharePoint library name (e.g., Documents, Shared Documents).
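The Record Title and Sub Title fields take format strings such as "{FileName} - {Modified}", and White List Folders restricts scope by folder path. The following Python sketch shows one way to apply both. The function names are hypothetical, and two behaviours are assumptions rather than documented node behaviour: unknown placeholders are left in the output verbatim instead of raising an error, and a folder is in scope if it equals or sits under a whitelisted path, compared case-insensitively.

```python
from typing import Iterable, Mapping


class _KeepMissing(dict):
    """dict whose missing keys render back as their literal {placeholder}."""

    def __missing__(self, key: str) -> str:
        return "{" + key + "}"


def render_title(template: str, fields: Mapping[str, object]) -> str:
    """Fill {FieldName} placeholders from a document's metadata fields."""
    return template.format_map(_KeepMissing(fields))


def in_whitelist(folder_path: str, whitelist: Iterable[str]) -> bool:
    """Return True if the folder is at or below a whitelisted folder.

    An empty whitelist means no restriction. Matching ignores leading and
    trailing slashes and letter case (both assumptions).
    """
    allowed = [w.strip("/").casefold() for w in whitelist if w.strip("/")]
    if not allowed:
        return True
    path = folder_path.strip("/").casefold()
    # Require a "/" boundary so "Projects" does not match "ProjectsOld".
    return any(path == w or path.startswith(w + "/") for w in allowed)
```

For example, `render_title("{FileName} - {Modified}", {"FileName": "Report.docx", "Modified": "2024-05-01"})` yields `Report.docx - 2024-05-01`. With an empty whitelist every folder is in scope, which mirrors the field being optional.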

Execution Logic

  1. Validation – Required fields such as index action and service accounts are validated.
  2. Service Accounts – Retrieves and validates tokens for OpenAI, ElasticSearch, and Microsoft Azure.
  3. Data Source Initialization – Creates an SPDataSource object for accessing SharePoint.
  4. Settings Compilation – Constructs the indexing settings from the optional toggles (e.g., whether to include embeddings, full text, thumbnails, or OCR).
  5. Query Construction – Builds an ItemQuery that filters the SharePoint content according to the selected Index Action and its related inputs.
  6. Indexing Process – Initializes the ItemIndexer, processes the SharePoint items, and pushes the data into the Mesh index.
  7. Progress Logging – Writes progress to the logs, including the number of records processed.
  8. Completion – Returns a success or failure status based on the outcome.
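The execution steps above can be sketched in Python as a single function that validates the three service account tokens, compiles the optional toggles into settings, pushes each item, logs progress, and returns the statusReturn, taskMessage, and taskSuccess fields described under Output. This is a hypothetical sketch, not the node's implementation: the token keys, the settings names (`vectorize`, `full_text`, `thumbnail`, `ocr`), and the `push` callback standing in for ItemIndexer are assumptions, and SPDataSource and ItemQuery (steps 3 and 5) are represented only by the `items` iterable.

```python
import logging
from typing import Callable, Dict, Iterable, Mapping

logger = logging.getLogger("mesh_indexer_sketch")

# Hypothetical keys for the three required service account tokens.
REQUIRED_ACCOUNTS = ("OpenAI", "Elastic Search", "Microsoft Azure")
# Hypothetical names for the four "Include ..." toggles.
TOGGLES = ("vectorize", "full_text", "thumbnail", "ocr")


def run_index_task(
    tokens: Mapping[str, str],
    options: Mapping[str, bool],
    items: Iterable[dict],
    push: Callable[[dict, Dict[str, bool]], None],
    log_every: int = 100,
) -> Dict[str, object]:
    try:
        # Steps 1-2: every required service account token must be present.
        missing = [name for name in REQUIRED_ACCOUNTS if not tokens.get(name)]
        if missing:
            raise ValueError("Missing service account token(s): " + ", ".join(missing))

        # Step 4: compile settings; toggles that were not supplied default to off.
        settings = {key: bool(options.get(key, False)) for key in TOGGLES}

        # Steps 6-7: push each item (items stand in for steps 3 and 5) and log progress.
        count = 0
        for item in items:
            push(item, settings)
            count += 1
            if count % log_every == 0:
                logger.info("Indexed %d records", count)
        logger.info("Finished: %d records indexed", count)

        # Step 8: success result, matching the documented output fields.
        return {
            "statusReturn": "Completed",
            "taskMessage": "Indexing Completed Successfully",
            "taskSuccess": True,
        }
    except Exception as exc:
        # Step 8: any error becomes the documented failure result.
        return {"statusReturn": "Fail", "taskMessage": str(exc), "taskSuccess": False}
```

Keeping the indexer behind a callback makes the control flow testable without live SharePoint, ElasticSearch, or OpenAI connections; catching broadly at the top level mirrors the single failure shape the node reports.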

Output

On success:

  • statusReturn: Completed
  • taskMessage: Indexing Completed Successfully
  • taskSuccess: true

On failure:

  • statusReturn: Fail
  • taskMessage: A message that includes the error details
  • taskSuccess: false

Notes

  • Ensure the configured service accounts have appropriate API permissions:

    • SharePoint: App access with read permission on the site and its document library contents.
    • ElasticSearch: Write permission on the target index.
    • OpenAI (optional): API key for embedding generation.
  • This node is optimized for SharePoint Online, not on-premises deployments.

  • OCR may significantly increase processing time for image-based or scanned documents.


Tentech 2024