Lark Document Transformation Strategy

Code Review

```
import { Injectable } from '@nestjs/common'
import { Document } from '@langchain/core/documents'
import {
  DocumentTransformerStrategy,
  IDocumentTransformerStrategy,
  IntegrationPermission,
  TDocumentTransformerConfig,
} from '@xpert-ai/plugin-sdk'
import { IconType, IKnowledgeDocument } from '@metad/contracts'
import { iconImage, LarkDocumentMetadata, LarkDocumentName, LarkName } from './types.js'
import { LarkClient } from './lark.client.js'

@Injectable()
@DocumentTransformerStrategy(LarkDocumentName)
export class LarkDocTransformerStrategy implements IDocumentTransformerStrategy<TDocumentTransformerConfig> {

  readonly permissions = [
    {
      type: 'integration',
      service: LarkName,
      description: 'Access to Lark system integrations'
    } as IntegrationPermission,
  ]

  readonly meta = {
    name: LarkDocumentName,
    label: {
      en_US: 'Lark Document',
      zh_Hans: '飞书文档'
    },
    description: {
      en_US: 'Load content from Lark documents',
      zh_Hans: '加载飞书文档内容'
    },
    icon: {
      type: 'image' as IconType,
      value: iconImage,
      color: '#14b8a6'
    },
    helpUrl: 'https://open.feishu.cn/document/server-docs/docs/docs-overview',
    configSchema: {
      type: 'object',
      properties: {},
      required: []
    }
  }

  validateConfig(config: any): Promise<void> {
    throw new Error('Method not implemented.')
  }

  async transformDocuments(
    files: Partial<IKnowledgeDocument<LarkDocumentMetadata>>[],
    config: TDocumentTransformerConfig
  ): Promise<Partial<IKnowledgeDocument<LarkDocumentMetadata>>[]> {
    const integration = config?.permissions?.integration
    if (!integration) {
      throw new Error('Integration system is required')
    }

    console.log('LarkDocTransformerStrategy transformDocuments', files, config)

    const client = new LarkClient(integration)

    const results: Partial<IKnowledgeDocument<LarkDocumentMetadata>>[] = []
    for await (const file of files) {
      const content = await client.getDocumentContent(file.metadata.token)
      results.push({
        id: file.id,
        chunks: [
          new Document({
            id: file.id,
            pageContent: content,
            metadata: {
              chunkId: file.id,
              source: LarkName,
              sourceId: file.id
            }
          })
        ],
        metadata: {
          assets: []
        } as LarkDocumentMetadata
      })
    }
    return results
  }
}
```

Logic Breakdown

1. Decorators and Dependency Injection

```
@Injectable()
@DocumentTransformerStrategy(LarkDocumentName)
```

- @Injectable(): the NestJS dependency injection decorator. It marks the class as an injectable provider.
- @DocumentTransformerStrategy(LarkDocumentName): registers the class as a document transformation strategy under the unique name LarkDocumentName. This lets the system discover and use the strategy automatically.
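
To make the registration step concrete, here is a minimal sketch of how a decorator factory could record a class under a name. The `strategyRegistry` map, `RegisterStrategy`, and `createStrategy` are hypothetical stand-ins, not the actual `@xpert-ai/plugin-sdk` implementation, and `'lark-document'` is an assumed name since the value of LarkDocumentName is not shown here. The decorator is applied as a plain function call so the snippet runs without any decorator compiler flags.

```typescript
// Hypothetical registry: maps a strategy name to its class constructor.
type Constructor<T = object> = new (...args: any[]) => T

const strategyRegistry = new Map<string, Constructor>()

// Decorator factory: returns a class decorator that records the class by name.
function RegisterStrategy(name: string) {
  return function <T extends Constructor>(target: T): T {
    if (strategyRegistry.has(name)) {
      throw new Error(`Strategy "${name}" is already registered`)
    }
    strategyRegistry.set(name, target)
    return target
  }
}

class DemoLarkStrategy {
  describe(): string {
    return 'demo strategy'
  }
}

// Equivalent to writing @RegisterStrategy('lark-document') above the class.
RegisterStrategy('lark-document')(DemoLarkStrategy)

// A host can later look the class up by name and instantiate it.
function createStrategy(name: string): object {
  const ctor = strategyRegistry.get(name)
  if (!ctor) {
    throw new Error(`Unknown strategy "${name}"`)
  }
  return new ctor()
}
```

Rejecting duplicate names at registration time is a choice made for this sketch; it surfaces naming collisions between plugins early instead of letting one silently replace another.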

2. Permission Definition

```
readonly permissions = [
  {
    type: 'integration',
    service: LarkName,
    description: 'Access to Lark system integrations'
  } as IntegrationPermission,
]
```

- The plugin requires the Lark integration permission so that it can call the Lark API and fetch documents.
- IntegrationPermission declares the service the plugin depends on; here that is LarkName (Lark).
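
As a rough illustration of what a declared permission makes possible, the sketch below shows how a host could compare a strategy's declared permissions against the integrations it has configured. The `DeclaredPermission` shape mirrors the fields above, but `findMissingServices` and the idea that the host tracks integrations by service name are assumptions for illustration; `'lark'` stands in for the value of LarkName, which is not shown in this document.

```typescript
// Mirrors the fields used in the permissions array above.
interface DeclaredPermission {
  type: 'integration'
  service: string
  description: string
}

// Hypothetical host-side check: which declared services have no configured integration?
function findMissingServices(
  declared: DeclaredPermission[],
  configuredServices: Set<string>
): string[] {
  return declared
    .filter((p) => p.type === 'integration' && !configuredServices.has(p.service))
    .map((p) => p.service)
}

const declaredPermissions: DeclaredPermission[] = [
  { type: 'integration', service: 'lark', description: 'Access to Lark system integrations' }
]

// Only a different integration is configured here, so 'lark' is reported as missing.
const missing = findMissingServices(declaredPermissions, new Set(['github']))
console.log(missing)
```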

3. Metadata (meta)

```
readonly meta = {
  name: LarkDocumentName,
  label: {
    en_US: 'Lark Document',
    zh_Hans: '飞书文档'
  },
  description: {
    en_US: 'Load content from Lark documents',
    zh_Hans: '加载飞书文档内容'
  },
  icon: {
    type: 'image' as IconType,
    value: iconImage,
    color: '#14b8a6'
  },
  helpUrl: 'https://open.feishu.cn/document/server-docs/docs/docs-overview',
  configSchema: { ... }
}
```

- The meta fields control how the plugin is displayed in the UI: name, label, description, icon, and a link to the help documentation.
- configSchema defines the configuration options. It is empty here, meaning no extra parameters are required.
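
For comparison, a non-empty configSchema could look like the sketch below. The `includeComments` and `maxDocuments` options are hypothetical and are not part of the plugin; the shape follows ordinary JSON Schema conventions, and `missingRequired` is a hand-written helper for illustration, not an SDK function.

```typescript
// Hypothetical schema: neither option exists in the actual plugin.
const exampleConfigSchema = {
  type: 'object',
  properties: {
    includeComments: {
      type: 'boolean',
      default: false,
      description: 'Whether to include document comments'
    },
    maxDocuments: {
      type: 'number',
      minimum: 1,
      description: 'Upper bound on documents processed per run'
    }
  },
  required: ['maxDocuments']
} as const

// List the required keys that a config object does not provide.
function missingRequired(
  schema: { required: readonly string[] },
  config: Record<string, unknown>
): string[] {
  return schema.required.filter((key) => config[key] === undefined)
}
```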

4. Configuration Validation

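```
validateConfig(config: any): Promise<void> {
  throw new Error('Method not implemented.')
}
```

- Placeholder for future configuration validation, for example checking that a document ID or token is provided.
- As written, the method throws 'Method not implemented.' as soon as it is called, so any caller that invokes it will fail until it is implemented.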
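
One possible way to fill in the placeholder is sketched below as a standalone function. It assumes the only thing worth validating is that `config.permissions.integration` is present, which is the same condition transformDocuments checks; `LooseTransformerConfig` is a simplified local stand-in for TDocumentTransformerConfig, and `validateLarkConfig` is a hypothetical name. Declaring the function `async` means a failed check surfaces as a rejected promise, which matches the Promise<void> return type.

```typescript
// Simplified local stand-in for TDocumentTransformerConfig (assumption).
interface LooseTransformerConfig {
  permissions?: {
    integration?: unknown
  }
}

// async: a failed check becomes a rejected promise rather than a synchronous throw.
async function validateLarkConfig(
  config: LooseTransformerConfig | null | undefined
): Promise<void> {
  if (!config) {
    throw new Error('Config is required')
  }
  if (!config.permissions?.integration) {
    throw new Error('Integration system is required')
  }
}
```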

5. Core Document Transformation Logic

```
async transformDocuments(
  files: Partial<IKnowledgeDocument<LarkDocumentMetadata>>[],
  config: TDocumentTransformerConfig
): Promise<Partial<IKnowledgeDocument<LarkDocumentMetadata>>[]> {
  const integration = config?.permissions?.integration
  if (!integration) {
    throw new Error('Integration system is required')
  }

  const client = new LarkClient(integration)

  const results: Partial<IKnowledgeDocument<LarkDocumentMetadata>>[] = []
  for await (const file of files) {
    const content = await client.getDocumentContent(file.metadata.token)
    results.push({
      id: file.id,
      chunks: [
        new Document({
          id: file.id,
          pageContent: content,
          metadata: {
            chunkId: file.id,
            source: LarkName,
            sourceId: file.id
          }
        })
      ],
      metadata: {
        assets: []
      } as LarkDocumentMetadata
    })
  }
  return results
}
```

Line-by-line explanation:

Get Integration Info

```
const integration = config?.permissions?.integration
if (!integration) throw new Error('Integration system is required')
```

- Retrieves the Lark integration credentials from config.
- Throws an error if the credentials are missing.

Initialize Client

```
const client = new LarkClient(integration)
```

- Constructs a LarkClient with the credentials so it can access the Lark API.

Process Files in a Loop

```
for await (const file of files) {
  const content = await client.getDocumentContent(file.metadata.token)
}
```

- Iterates over the list of documents to process. Because files is a plain array, an ordinary for...of loop would behave the same way; for await is only needed for async iterables.
- Calls client.getDocumentContent to fetch each document's content by its token.
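
The loop above reads file.metadata.token directly, which fails at runtime if a file arrives without metadata (every field is optional on a Partial). A more defensive variant is sketched below. `ContentClient` and `FileRef` are simplified local stand-ins for LarkClient and IKnowledgeDocument, and `fetchContents` is a hypothetical helper; collecting failures instead of aborting the whole batch is a design choice of this sketch, not the plugin's current behavior.

```typescript
// Simplified stand-ins for LarkClient and the document type (assumptions).
interface ContentClient {
  getDocumentContent(token: string): Promise<string>
}

interface FileRef {
  id?: string
  metadata?: { token?: string }
}

interface FetchOutcome {
  contents: Map<string, string>
  failures: { id: string | undefined; reason: string }[]
}

async function fetchContents(client: ContentClient, files: FileRef[]): Promise<FetchOutcome> {
  const outcome: FetchOutcome = { contents: new Map(), failures: [] }
  // files is a plain array, so for...of with await inside the body is sufficient.
  for (const file of files) {
    const token = file.metadata?.token
    if (!file.id || !token) {
      outcome.failures.push({ id: file.id, reason: 'missing id or token' })
      continue
    }
    try {
      outcome.contents.set(file.id, await client.getDocumentContent(token))
    } catch (err) {
      // Record the failure and keep going with the remaining files.
      const reason = err instanceof Error ? err.message : String(err)
      outcome.failures.push({ id: file.id, reason })
    }
  }
  return outcome
}
```

A caller can then decide whether any entry in `failures` should abort the run or simply be logged.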

Build Transformed Document

```
results.push({
  id: file.id,
  chunks: [
    new Document({
      id: file.id,
      pageContent: content,
      metadata: {
        chunkId: file.id,
        source: LarkName,
        sourceId: file.id
      }
    })
  ],
  metadata: {
    assets: []
  } as LarkDocumentMetadata
})
```

- Each Lark document is converted to an IKnowledgeDocument.
- The main content is placed in the chunks array as a single Document.
- metadata stores extra information (currently only an empty assets list).

Overall Execution Flow

- Input: a batch of Lark document metadata (file ID and token).
- Permission validation: ensure the Lark integration config is present.
- API call: use LarkClient to fetch the content of each document.
- Transform to knowledge base format:
  - Wrap each result as an IKnowledgeDocument.
  - Place the content in a single Document chunk (for later vectorization).
- Output: an array of documents usable by the Xpert AI Knowledge Base.

Core Value

- Decoupling: the strategy class does not call the Lark API directly; it delegates to LarkClient.
- Generality: every document is converted to IKnowledgeDocument, so it integrates with the platform's knowledge base without special handling.
- Extensibility: transformDocuments can later be extended with:
  - Text cleaning (remove empty lines and stray formatting)
  - Content chunking
  - Metadata enhancement (author, tags, update time)