PRO: This feature is supported in the Professional Edition.
Browser-use is a browser automation tool for Xpert agents. It wraps the open-source browser-use framework so that large model Agents can drive a real browser inside a sandbox environment and complete specified web page tasks automatically. It suits tasks that require visiting websites, extracting information, filling in forms, or clicking buttons through a browser, and it extends general-purpose large models with browser operation capabilities.

Features

  • ✅ Supports automatic browser operations from tasks described in natural language
  • ✅ Supports custom large model and API integration configuration
  • ✅ Seamless integration with browser-use open-source framework
  • ✅ Automatically tracks and records execution process (history, recording, Trace)
  • ✅ Supports multi-step tasks combined with Agent reasoning flow
  • ✅ Supports runtime parameter configuration, such as whether to enable recording, whether to enable vision models, etc.
  • ✅ Supports use in Agentic Workflow

Usage Instructions

Tool Parameter Configuration

  • Specify browser task (provided by LLM)
  • Configure LLM model
  • Browser execution parameters (whether to record, whether to use vision model, timeout, etc.)

Communication with browser-use in Sandbox

The tool opens an SSE (Server-Sent Events) stream to the Sandbox service through EventSource and starts streaming execution of the browser task at the /operator/stream endpoint.
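
The SSE wire format behind such a stream can be illustrated with a small parser. This is a minimal sketch of standard SSE framing only, not the tool's actual client code: the function name iter_sse_data and the sample payloads are hypothetical, and the real events sent by /operator/stream may look different.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event found in `lines`."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            # Comment or keep-alive line: ignored.
            continue
        field, _, value = line.partition(":")
        if field == "data":
            # SSE strips a single leading space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
    # An event that is not terminated by a blank line is discarded.


# Hypothetical sample stream; the JSON field names are assumptions.
sample = [
    ": keep-alive\n",
    'data: {"thoughts": "Open the search page"}\n',
    "\n",
    'data: {"done": true}\n',
    "\n",
]
payloads = list(iter_sse_data(sample))
```

Each yielded payload is one event body, which a client would typically decode as JSON before acting on it.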

Real-time Event Reception

During task execution, the tool listens for and parses the event messages sent back by browser-use:
  • The thoughts for each execution step (thoughts)
  • The current page URL
  • Whether errors occurred (errors)
  • Whether the task has completed (messages containing done)

Parsed intermediate events are forwarded in real time to the frontend or debugging interface to display the execution status.
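
As a rough illustration of this parsing step, the sketch below turns one event payload into a small status record. The JSON shape is an assumption: the keys thoughts, errors, and done follow the names mentioned above, while url, StepStatus, and parse_step are hypothetical names introduced only for this example.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepStatus:
    thoughts: Optional[str]  # reasoning reported for the current step
    url: Optional[str]       # current page URL (assumed key name: "url")
    has_error: bool          # True if the event reports any errors
    done: bool               # True if the event marks task completion


def parse_step(payload: str) -> StepStatus:
    """Decode one JSON event payload into a StepStatus (assumed schema)."""
    event = json.loads(payload)
    return StepStatus(
        thoughts=event.get("thoughts"),
        url=event.get("url"),
        has_error=bool(event.get("errors")),
        done=bool(event.get("done")),
    )


# Hypothetical event for illustration.
status = parse_step(
    '{"thoughts": "Click the login button", "url": "https://example.com", "errors": []}'
)
```

A frontend or debugging view could render each StepStatus as one row of a live execution log.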

Final Result Return

When the task completes (a done event is detected), the tool extracts the final_result field from that event and uses it as the execution result.
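
The completion check can be sketched as follows. This is a simplified illustration under assumptions: events are taken to be JSON objects in which done is a truthy flag and final_result is a sibling key, and extract_final_result is a hypothetical helper, not part of the tool's API.

```python
import json
from typing import Iterable, Optional


def extract_final_result(payloads: Iterable[str]) -> Optional[str]:
    """Return final_result from the first event marked done, else None."""
    for payload in payloads:
        event = json.loads(payload)
        if event.get("done"):
            return event.get("final_result")
    # The stream ended without a done event.
    return None


# Hypothetical event sequence for illustration.
events = [
    '{"thoughts": "Search for the report"}',
    '{"done": true, "final_result": "The report was found and summarized."}',
]
result = extract_final_result(events)
```

Returning None when no done event arrives lets the caller distinguish an interrupted stream from a completed task.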

Return Value

Returns a string: a summary or description of the outcome produced after the large model Agent executes the task.

Advanced Configuration

Configuration Item    Description
copilotModel          The LLM model currently in use and its Provider information
llm_temperature       Sampling temperature of the large model (default 0.5)
enable_recording      Whether to enable browser recording (enabled by default)
max_steps             Maximum number of steps for browser task execution (default 100)
use_vision            Whether to enable vision recognition capabilities (such as page screenshot understanding)
timeout               Task timeout
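
For illustration, the runtime options above can be modeled as a small settings object. This is a sketch, not the tool's real configuration schema: BrowserUseOptions and to_dict are hypothetical names, only the three documented defaults are filled in, and use_vision and timeout are left unset because their defaults (and the timeout unit) are not stated here. copilotModel is omitted since its structure is not described.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class BrowserUseOptions:
    llm_temperature: float = 0.5        # documented default
    enable_recording: bool = True       # documented default (enabled)
    max_steps: int = 100                # documented default
    use_vision: Optional[bool] = None   # default not stated in this document
    timeout: Optional[float] = None     # default and unit not stated in this document

    def to_dict(self) -> Dict[str, Any]:
        """Return only the options that have a value set."""
        return {k: v for k, v in asdict(self).items() if v is not None}


defaults = BrowserUseOptions().to_dict()
custom = BrowserUseOptions(max_steps=20, use_vision=True).to_dict()
```

Dropping unset values keeps the sketch from implying defaults that this page does not document.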

Application Scenario Examples

  • Search for specified content on web pages and summarize (such as news, stock prices, reports)
  • Complete multi-step interactions on complex websites (such as querying information and exporting)
  • Simulate real human web page operation processes in large Agent systems

Notes

  • This tool depends on the backend browser-use runtime environment (the Sandbox). Make sure it is started and accessible.
  • Tool return results depend on the correctness of browser task execution and model understanding accuracy.
  • Browsers currently run in headless mode by default.
  • The tool is designed for large models, so task descriptions should state the intent clearly in natural language.