1. Core Features
1. Multi-Source Integration
The Document Source Node integrates with multiple external data sources through a Plugin Strategy. Each data source type is an independent plugin, which can be developed by XpertAI or by third-party developers. Common data sources include:
- File Upload: Local files such as PDF, Word, TXT, Markdown, etc.
- Cloud Storage: Services like Google Drive, OneDrive, Alibaba Cloud Drive, etc.
- Online Documents: Platforms like Notion, Feishu Docs, Confluence.
- Web Scraping: Tools like Firecrawl, Jina Reader.
- API Data Sources: Fetching structured or semi-structured content via external system APIs.
2. Intelligent Document Loading & Task Management
When the pipeline reaches the Document Source Node, the system follows the node configuration to:
- Automatically invoke the corresponding plugin to load documents;
- Generate a unique knowledge document object for each document;
- Attach it to the current Knowledge Task for unified management.
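The three loading steps above can be sketched roughly as follows. This is a minimal illustration rather than XpertAI's actual API: `KnowledgeDocument`, `KnowledgeTask`, `SourceLoader`, `NodeConfig`, and `runDocumentSourceNode` are hypothetical names, the loader is synchronous for brevity (a real one would likely be asynchronous), and ids come from a simple counter.

```typescript
// Hypothetical shapes for illustration only; not the real XpertAI types.
interface KnowledgeDocument {
  id: string;
  pageContent: string;
  metadata: Record<string, unknown>;
}

interface KnowledgeTask {
  id: string;
  documents: KnowledgeDocument[];
}

// A loader returns the raw content of each document found at the source.
type SourceLoader = (options: Record<string, unknown>) => { fileName: string; text: string }[];

interface NodeConfig {
  source: string; // which plugin to invoke, e.g. "file-upload"
  options: Record<string, unknown>;
}

let nextId = 0;

function runDocumentSourceNode(
  config: NodeConfig,
  loaders: Map<string, SourceLoader>,
  task: KnowledgeTask
): KnowledgeDocument[] {
  // 1. Pick the plugin named in the node configuration.
  const loader = loaders.get(config.source);
  if (!loader) {
    throw new Error(`No plugin registered for source "${config.source}"`);
  }
  // 2. Wrap each loaded item in its own document object with a unique id.
  const docs: KnowledgeDocument[] = loader(config.options).map((raw) => ({
    id: `doc-${++nextId}`,
    pageContent: raw.text,
    metadata: { source: config.source, fileName: raw.fileName },
  }));
  // 3. Attach the documents to the current task.
  task.documents.push(...docs);
  return docs;
}

// Usage with an in-memory stand-in for a real plugin.
const loaders = new Map<string, SourceLoader>([
  ["file-upload", () => [
    { fileName: "a.txt", text: "hello" },
    { fileName: "b.txt", text: "world" },
  ]],
]);
const task: KnowledgeTask = { id: "task-1", documents: [] };
const loaded = runDocumentSourceNode({ source: "file-upload", options: {} }, loaders, task);
```

After this runs, `task.documents` holds the same two objects that the node returns, so later steps can reach them either through the task or through the node output.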
3. Seamless Integration with Knowledge Base
The output of the Document Source Node is passed directly to subsequent nodes in the knowledge pipeline (such as document conversion, content extraction, index building, etc.). Between nodes, document information is transmitted in a standardized data structure (including metadata, pageContent, mimeType, etc.), ensuring compatibility and extensibility across nodes.
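As a rough illustration of such a shared structure, the sketch below declares a type with only the three fields named above and a runtime check a node could apply to its input. The names `PipelineDocument`, `isPipelineDocument`, and `describeDocument` are assumptions made for this example; the actual structure may carry more fields.

```typescript
// Assumed shape, based only on the three fields named in the text.
interface PipelineDocument {
  pageContent: string;
  mimeType: string;
  metadata: Record<string, unknown>;
}

// Runtime check a node could apply to its input before processing it.
function isPipelineDocument(value: unknown): value is PipelineDocument {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const meta = v["metadata"];
  return (
    typeof v["pageContent"] === "string" &&
    typeof v["mimeType"] === "string" &&
    typeof meta === "object" &&
    meta !== null
  );
}

// A hypothetical downstream node: it relies only on the shared fields.
function describeDocument(doc: PipelineDocument): string {
  return `${doc.mimeType} (${doc.pageContent.length} chars)`;
}

const sample: unknown = {
  pageContent: "# Title\nBody text",
  mimeType: "text/markdown",
  metadata: { source: "notion" },
};

const summary = isPipelineDocument(sample) ? describeDocument(sample) : "rejected";
```

Because every node reads and writes the same fields, a new node can be placed anywhere in the pipeline without knowing which source plugin produced the document.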
4. Plugin-Based Strategy Extension
Each document source type corresponds to an independent Document Source Strategy, dynamically loaded by the plugin system. Plugins can define:
- Configuration parameters for the source (such as API endpoint, authentication method, file path, etc.);
- Document extraction logic (including pagination, content segmentation, metadata parsing);
- Authorization rules and integration permissions.
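One way to picture a strategy and its registry is sketched below. The interface and class names (`DocumentSourceStrategy`, `StrategyRegistry`), the `requiredPermissions` field, and the `text-blob` example plugin are all hypothetical and chosen for illustration; the real plugin contract may look different.

```typescript
interface SourceDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical strategy contract covering the three concerns listed above.
interface DocumentSourceStrategy {
  readonly type: string;                  // identifies the source type
  readonly requiredPermissions: string[]; // authorization the plugin needs
  // Extraction logic; reads its own configuration parameters from `config`.
  load(config: Record<string, unknown>): SourceDocument[];
}

class StrategyRegistry {
  private readonly strategies = new Map<string, DocumentSourceStrategy>();

  register(strategy: DocumentSourceStrategy): void {
    if (this.strategies.has(strategy.type)) {
      throw new Error(`Strategy already registered: ${strategy.type}`);
    }
    this.strategies.set(strategy.type, strategy);
  }

  resolve(type: string): DocumentSourceStrategy {
    const strategy = this.strategies.get(type);
    if (!strategy) {
      throw new Error(`Unknown document source type: ${type}`);
    }
    return strategy;
  }
}

// Example plugin: splits a text blob from the config into fixed-size pages.
const textBlobStrategy: DocumentSourceStrategy = {
  type: "text-blob",
  requiredPermissions: ["read"],
  load(config) {
    const rawText = config["text"];
    const rawSize = config["pageSize"];
    const text = typeof rawText === "string" ? rawText : "";
    // Fall back to a default when the page size is missing or not positive.
    const pageSize = typeof rawSize === "number" && rawSize > 0 ? rawSize : 1000;
    const docs: SourceDocument[] = [];
    for (let i = 0; i < text.length; i += pageSize) {
      docs.push({
        pageContent: text.slice(i, i + pageSize),
        metadata: { page: i / pageSize + 1 },
      });
    }
    return docs;
  },
};

const registry = new StrategyRegistry();
registry.register(textBlobStrategy);
const pages = registry.resolve("text-blob").load({ text: "abcdefghij", pageSize: 4 });
```

Here the ten-character input with a page size of 4 yields three documents ("abcd", "efgh", "ij"), each tagged with its page number in `metadata`.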
5. Error Handling & Workflow Control
During document loading, the Document Source Node provides exception handling and status updates:
- If the data source connection fails or a document parsing error occurs, the error is automatically logged and the task is marked as “failed”;
- The pipeline can automatically switch to a fallback path (such as a Fail branch) based on error status;
- Supports error retries and manual intervention to ensure stable task execution.
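The retry and fallback behavior described above might look roughly like the sketch below. It is an assumption-laden illustration: `loadWithRetry`, `LoadOutcome`, and the `"next"` / `"fail"` branch labels are hypothetical names, the loader is synchronous for brevity, retries happen immediately without backoff, and manual intervention is not modeled.

```typescript
type Branch = "next" | "fail";

interface LoadOutcome {
  taskStatus: "success" | "failed";
  branch: Branch;      // which path the pipeline should follow
  attempts: number;    // how many times the loader was called
  errors: string[];    // logged error messages, one per failed attempt
  documents: string[]; // loaded content, empty on failure
}

function loadWithRetry(load: () => string[], maxAttempts: number): LoadOutcome {
  const errors: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const documents = load();
      return { taskStatus: "success", branch: "next", attempts: attempt, errors, documents };
    } catch (err) {
      // Log the error and try again until the attempts are used up.
      errors.push(err instanceof Error ? err.message : String(err));
    }
  }
  // All attempts failed: mark the task as failed and take the Fail branch.
  return { taskStatus: "failed", branch: "fail", attempts: maxAttempts, errors, documents: [] };
}

// A loader that fails twice, then succeeds.
let calls = 0;
const flakyLoader = (): string[] => {
  calls++;
  if (calls < 3) throw new Error(`connection refused (call ${calls})`);
  return ["doc"];
};

const recovered = loadWithRetry(flakyLoader, 3);
const exhausted = loadWithRetry(() => {
  throw new Error("parse error");
}, 2);
```

In this sketch `recovered` ends on the `"next"` branch after three attempts with two logged errors, while `exhausted` is marked failed and routed to `"fail"`.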
2. Typical Use Cases
- Enterprise Knowledge Aggregation Center
  - Regularly fetch documents from various business systems (OA, CRM, ERP);
  - Automatically import into the knowledge base to build a unified internal knowledge source.
- AI Document Q&A System
  - Periodically sync with external knowledge bases like Confluence, Notion;
  - Automatically extract content for ChatBI or Copilot to perform knowledge Q&A.
- Compliance & Archive Auditing
  - Automatically obtain PDF contracts and approval documents from cloud drives or contract systems;
  - Convert them into standardized document formats for auditing and AI-assisted analysis.
- Website Content Aggregation & Summary Generation
  - Regularly crawl news, announcements, or blog content via web scraping plugins;
  - Generate summaries, tags, or classification indexes with downstream nodes.
3. Advantages & Value
| Feature | Description |
|---|---|
| Scalability | Quickly integrate any new data source via plugins |
| Security | Supports authorization through both environment variables and system integrations |
| Automation | Periodically sync documents without manual intervention |
| Standardization | Unified document structure for easy conversion and indexing |
| Flexibility | Combine with other nodes to build complex knowledge pipelines |