- Convert documents of different formats into unified, structured knowledge documents;
- Perform semantic parsing, OCR recognition, text extraction, and metadata enhancement;
- Use plugin strategies to adaptively handle various file types and sources;
- Support both test mode (preview) and production mode (official release) transformation workflows;
How It Works
When the knowledge pipeline reaches the document transformation node, the system will automatically:
- Read the document data output from upstream nodes (such as file upload nodes, web crawler nodes, or external integrated data sources);
- Select the appropriate Document Processing Strategy (Transformer Strategy) based on the document transformation plugins configured in the knowledge base;
- Execute the transformation logic according to the node configuration, including:
  - Text extraction
  - Optical Character Recognition (OCR)
  - Image content recognition and description generation (VLM / multimodal processing)
  - Format normalization (Markdown, JSON, HTML, etc.)
  - Metadata generation and cleansing
- Write the transformation results back to the knowledge base, producing a document version with the status `TRANSFORMED`;
- If the transformation fails, automatically record the error status and information for tracking and recovery in the pipeline.
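The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not XpertAI's implementation: `RawDocument`, `TransformedDocument`, `runTransformNode`, and the idea of keying strategies by file type are all hypothetical, the knowledge-base write-back is omitted, and the task log is a plain in-memory array.

```typescript
// Hypothetical types for illustration; these are not XpertAI's actual interfaces.
type DocStatus = "TRANSFORMED" | "ERROR";

interface RawDocument {
  id: string;
  fileType: string; // e.g. "pdf", "html", "txt"
  content: string;
}

interface TransformedDocument {
  id: string;
  status: DocStatus;
  content?: string;
  metadata: Record<string, string>;
  error?: string;
}

// A strategy turns raw upstream content into normalized text.
type Transformer = (doc: RawDocument) => string;

function runTransformNode(
  docs: RawDocument[],
  transformers: Map<string, Transformer>, // strategies keyed by file type (assumption)
  taskLog: string[], // stand-in for the pipeline's task log
): TransformedDocument[] {
  return docs.map((doc): TransformedDocument => {
    // Pick the strategy configured for this document's type.
    const transform = transformers.get(doc.fileType);
    try {
      if (!transform) {
        throw new Error(`no transformer configured for "${doc.fileType}"`);
      }
      const content = transform(doc);
      return { id: doc.id, status: "TRANSFORMED", content, metadata: { sourceType: doc.fileType } };
    } catch (err) {
      // Record the failure instead of aborting the whole batch.
      const message = err instanceof Error ? err.message : String(err);
      taskLog.push(`${doc.id}: ${message}`);
      return { id: doc.id, status: "ERROR", metadata: {}, error: message };
    }
  });
}
```

In this sketch a failing document is marked `ERROR` and logged while the rest of the batch continues. That per-document isolation is a design choice made here for illustration.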
Use Cases
The Document Transformer node is widely applicable in enterprise knowledge management and intelligent document processing:
- OCR Scenarios: Automatically recognize text from scanned PDFs, image reports, and scanned paper documents;
- Structure Extraction: Extract main content, titles, tables, and other key elements from Word, PPT, and HTML pages;
- Rich Media Processing: Recognize and transcribe images, charts, or formulas in documents for semantic search;
- Web and Knowledge Source Synchronization: Perform structured synchronization of web pages and online documents (such as Feishu Docs and Notion);
- Content Cleansing and Enhancement: Apply regex-based cleansing, semantic annotation, named entity recognition, and similar processing to text;
- Data Archiving: Form standardized, traceable knowledge assets in the knowledge base for RAG retrieval or multi-agent Q&A.
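The regex cleansing mentioned under Content Cleansing and Enhancement might look like the minimal sketch below. The rule list and the `cleanseText` name are hypothetical. In practice, the rules a cleansing plugin applies would depend on its own configuration.

```typescript
// Hypothetical cleansing rules, applied in order; a real plugin would make these configurable.
const CLEANSING_RULES: Array<[RegExp, string]> = [
  [/\r\n?/g, "\n"], // normalize Windows and old Mac line endings to \n
  [/[\u0000-\u0008\u000B\u000C\u000E-\u001F]/g, ""], // drop control characters, keeping \t and \n
  [/[ \t]+\n/g, "\n"], // strip trailing spaces and tabs at line ends
  [/\n{3,}/g, "\n\n"], // collapse runs of blank lines to a single blank line
];

function cleanseText(text: string): string {
  let result = text;
  for (const [pattern, replacement] of CLEANSING_RULES) {
    result = result.replace(pattern, replacement);
  }
  // Remove leading and trailing whitespace left over after the rules run.
  return result.trim();
}
```

Rule order matters here. Line endings are normalized first, so the later rules only need to match `\n`.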
Node Features
| Feature | Description |
|---|---|
| Node Type | Processor (Processing Node) |
| Node Name | Document Transformer |
| Input | Raw document objects output from upstream nodes |
| Output | Transformed document objects (structured docs + metadata + chunks) |
| Status Update | Automatically updates document status to TRANSFORMED or ERROR |
| Test Mode | Supports conversion debugging in preview mode (no DB write) |
| Error Handling | Automatically captures conversion errors and writes to task logs |
| Compatibility | Supports multiple plugin strategies (text, image, web, rich media, etc.) |
Plugin Mechanism
The power of the Document Transformer node lies in its plugin-based architecture. XpertAI provides an open document transformation plugin interface (DocumentTransformerStrategy), allowing both official and community developers to implement the following types of plugins:
- 📄 General Text Transformation Plugins: Handle common documents such as PDF, DOCX, TXT, etc.;
- 🌐 Web Parsing Plugins: Structure web content into knowledge documents;
- 🧠 Multimodal Recognition Plugins: Use vision-language models (such as GPT-4V) and OCR or document-parsing tools (such as PaddleOCR and MinerU) to understand images and charts;
- 🧩 Enterprise Integration Plugins: Connect to platforms such as Feishu Docs, Notion, SharePoint, and Confluence;
- ⚙️ Custom Script Plugins: User-written logic for content cleansing, format conversion, field extraction.
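This section names the `DocumentTransformerStrategy` interface but does not show its methods. The following is therefore only a hypothetical sketch of how strategies could be registered and selected by file type. `SimpleTransformerStrategy`, `StrategyRegistry`, and the two example plugins are invented for illustration and do not reflect the real interface's signature.

```typescript
// Hypothetical strategy shape; NOT the real DocumentTransformerStrategy signature.
interface SimpleTransformerStrategy {
  name: string;
  supportedTypes: string[]; // file extensions this plugin claims
  transform(content: string): string;
}

class StrategyRegistry {
  private readonly byType = new Map<string, SimpleTransformerStrategy>();

  register(strategy: SimpleTransformerStrategy): void {
    for (const type of strategy.supportedTypes) {
      // Later registrations override earlier ones for the same extension.
      this.byType.set(type.toLowerCase(), strategy);
    }
  }

  // Resolve a strategy from the file name's extension (case-insensitive).
  resolve(fileName: string): SimpleTransformerStrategy | undefined {
    const dot = fileName.lastIndexOf(".");
    const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
    return this.byType.get(ext);
  }
}

const plainText: SimpleTransformerStrategy = {
  name: "plain-text",
  supportedTypes: ["txt", "md"],
  transform: (content) => content.trim(),
};

const htmlStripper: SimpleTransformerStrategy = {
  name: "html-stripper",
  supportedTypes: ["html", "htm"],
  // Naive tag removal for illustration; a real web plugin would use an HTML parser.
  transform: (content) => content.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim(),
};

const registry = new StrategyRegistry();
registry.register(plainText);
registry.register(htmlStripper);
```

With a registry like this, adding a new document type means registering another strategy. The node itself does not need to change, which is the extensibility the plugin mechanism aims for.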
Plugins can be configured with the following authorization and isolation mechanisms:
- API Key authorization;
- OAuth application authorization;
- Temporary file directory (`tempDir`) isolation;
- File system and external integration permission control (e.g., access to cloud drives and remote document libraries).
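Temporary directory isolation can be illustrated with Node's standard `fs.mkdtempSync` and `fs.rmSync`. The `withIsolatedTempDir` helper is hypothetical. It only gives each run a separate scratch directory that is removed afterwards. It does not enforce OS-level permissions, which the permission controls listed above would have to handle.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: gives one transformation run its own scratch directory
// under the OS temp dir and always deletes it afterwards, even if the work throws.
function withIsolatedTempDir<T>(prefix: string, work: (tempDir: string) => T): T {
  // mkdtempSync appends random characters, so concurrent runs get distinct directories.
  const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), `${prefix}-`));
  try {
    return work(tempDir);
  } finally {
    fs.rmSync(tempDir, { recursive: true, force: true });
  }
}
```

A plugin would write intermediate files, such as per-page OCR output, only inside the `tempDir` it receives. Nothing is left behind once the callback returns.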
Operation Modes & Stages
The Document Transformer node supports two operation modes:
| Mode | Description |
|---|---|
| Test Mode | Runs during pipeline debugging; results are not saved and are shown for preview only. |
| Production Mode | Runs during official knowledge base updates; results are persisted to the database for indexing and retrieval. |
The mode the node runs in determines:
- Whether results are written to the knowledge base tables;
- Whether indexing and task status updates are triggered;
- Whether persistent chunk data is generated for AI retrieval.
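The effect of the mode on persistence can be sketched as a single branch. `RunMode`, `KnowledgeStore`, `finalizeResult`, and `RecordingStore` are hypothetical names. The store interface stands in for the knowledge base tables, chunk storage, and indexing queue, whose real APIs are not described here.

```typescript
// Hypothetical names; the two values mirror the Test / Production modes above.
type RunMode = "test" | "production";

interface Chunk {
  text: string;
}

interface TransformResult {
  documentId: string;
  chunks: Chunk[];
}

// Stand-in for the knowledge base tables, chunk storage, and indexing queue.
interface KnowledgeStore {
  saveDocument(result: TransformResult): void;
  saveChunks(documentId: string, chunks: Chunk[]): void;
  enqueueIndexing(documentId: string): void;
}

function finalizeResult(mode: RunMode, result: TransformResult, store: KnowledgeStore): TransformResult {
  if (mode === "test") {
    // Preview only: return the result to the debugger without touching storage.
    return result;
  }
  // Production: persist the document, persist its chunks, then trigger indexing.
  store.saveDocument(result);
  store.saveChunks(result.documentId, result.chunks);
  store.enqueueIndexing(result.documentId);
  return result;
}

// In-memory store that records calls, useful for checking the mode behaviour.
class RecordingStore implements KnowledgeStore {
  readonly calls: string[] = [];
  saveDocument(result: TransformResult): void {
    this.calls.push(`saveDocument:${result.documentId}`);
  }
  saveChunks(documentId: string, chunks: Chunk[]): void {
    this.calls.push(`saveChunks:${documentId}:${chunks.length}`);
  }
  enqueueIndexing(documentId: string): void {
    this.calls.push(`enqueueIndexing:${documentId}`);
  }
}
```

Both modes return the same result object, so a preview shows exactly what production would persist.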
Transformation Results & Downstream Integration
After transformation, the node outputs standardized knowledge document objects:
- Each document contains structured `metadata` information;
- Each document includes `chunks` processed for vector indexing and semantic retrieval;
- Each document's status is set to `TRANSFORMED`;
- The node also outputs an `Error` channel to capture exceptions.
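This section does not say how `chunks` are produced. The sketch below assumes one common approach: fixed-size character windows with overlap, each carrying metadata that links it back to its source document. `chunkText` and its default `size` and `overlap` values are hypothetical. A real pipeline might split on tokens, sentences, or document structure instead.

```typescript
interface Chunk {
  text: string;
  metadata: { documentId: string; chunkIndex: number; start: number };
}

// Fixed-size character windows with overlap; a stand-in for whatever
// splitter the pipeline actually configures.
function chunkText(documentId: string, text: string, size = 200, overlap = 40): Chunk[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const chunks: Chunk[] = [];
  const step = size - overlap; // each window starts `step` characters after the previous one
  for (let start = 0; start < text.length; start += step) {
    chunks.push({
      text: text.slice(start, start + size),
      metadata: { documentId, chunkIndex: chunks.length, start },
    });
    if (start + size >= text.length) break; // this window already reaches the end
  }
  return chunks;
}
```

The overlap keeps a sentence that straddles a window boundary retrievable from both neighbouring chunks. The `start` offset in each chunk's metadata lets a retrieval hit be traced back to its position in the source document.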