Browser-use is a browser automation tool for Xpert agents. It wraps the browser-use open-source framework so that large model Agents can drive a real browser in a sandbox environment and complete specified web page tasks automatically. It suits tasks that require visiting websites, extracting information, filling in forms, or clicking buttons through a browser, and it gives general-purpose large models the ability to operate a browser.
Features
- ✅ Supports automatic browser operations through `task` descriptions written in natural language
- ✅ Supports custom large model and API integration configuration
- ✅ Seamless integration with the `browser-use` open-source framework
- ✅ Automatically tracks and records the execution process (history, recording, Trace)
- ✅ Supports multi-step tasks combined with Agent reasoning flow
- ✅ Supports runtime parameter configuration, such as whether to enable recording, whether to enable vision models, etc.
- ✅ Supports use in Agentic Workflow
Usage Instructions
Tool Parameter Configuration
- Specify browser task (provided by LLM)
- Configure LLM model
- Browser execution parameters (whether to record, whether to use vision model, timeout, etc.)
Communication with browser-use in Sandbox
The tool opens an SSE stream through `EventSource` to communicate with the Sandbox service and starts streaming execution of the browser task at the `/operator/stream` endpoint.
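To make the streaming step more concrete, here is a minimal sketch of how an SSE text stream can be split into events. It follows the standard SSE wire format (`data:` lines, blank line as event terminator) and does not reproduce the tool's actual implementation. The function name `iter_sse_events`, the assumption that each payload is JSON, and the sample payloads are all illustrative.

```python
import json
from typing import Dict, Iterable, Iterator, List


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one decoded JSON payload per SSE event.

    In the SSE wire format an event is a run of lines ended by a blank
    line, and each `data:` line contributes one line of the payload.
    Assumption: the Sandbox sends JSON in the data field.
    """
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the buffered event, if there is one.
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if field == "data":
            # A single leading space after the colon is not part of the value.
            data_lines.append(value[1:] if value.startswith(" ") else value)
    # An event that is never terminated by a blank line is dropped.


# Hypothetical stream content, for illustration only.
sample = [
    'data: {"thoughts": "Open the search page"}\n',
    "\n",
    ": keep-alive\n",
    'data: {"done": true, "final_result": "Found 3 articles"}\n',
    "\n",
]
events = list(iter_sse_events(sample))
```

In a real client the `lines` argument would come from the HTTP response body rather than a list, but the parsing logic stays the same.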
Real-time Event Reception
During task execution, the tool listens for and parses the event messages sent back by browser-use:
- Thoughts for each execution step (`thoughts`)
- Current page URL
- Whether errors occurred (`errors`)
- Whether the task completed (messages containing `done`)
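The event fields listed above could be inspected roughly as follows. This is a sketch under assumptions: the keys `thoughts`, `errors`, and `done` come from the list above, while the `url` key, the dictionary shape of an event, and the function name `summarize_event` are hypothetical.

```python
from typing import Any, Dict, List


def summarize_event(event: Dict[str, Any]) -> List[str]:
    """Turn one decoded event into human-readable log lines."""
    notes: List[str] = []
    if event.get("thoughts"):
        notes.append(f"thought: {event['thoughts']}")
    if event.get("url"):  # hypothetical key for the current page URL
        notes.append(f"url: {event['url']}")
    if event.get("errors"):
        notes.append(f"error: {event['errors']}")
    if event.get("done"):
        notes.append("task finished")
    return notes


# Illustrative events; the real payloads may carry more fields.
step = {"thoughts": "Click the login button", "url": "https://example.com/login"}
failed = {"errors": "Element not found", "done": False}
```

Using `dict.get` keeps the handler tolerant of events that carry only some of the fields, which is likely since not every step produces an error or a completion flag.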
Final Result Return
When the task completes (a `done` message is detected), the tool extracts the `final_result` field from the event and uses it as the execution result.
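A minimal sketch of this completion step, assuming events arrive as dictionaries and that `done` and `final_result` sit on the same event. The function name `wait_for_final_result` and the behavior when the stream ends early are assumptions, not documented behavior of the tool.

```python
from typing import Any, Dict, Iterable, Optional


def wait_for_final_result(events: Iterable[Dict[str, Any]]) -> Optional[str]:
    """Consume events until one is marked done, then return its final_result.

    Returns None if the stream ends before a done event arrives, or if
    the done event carries no final_result.
    """
    for event in events:
        if event.get("done"):
            result = event.get("final_result")
            return None if result is None else str(result)
    return None


# Illustrative stream; anything after the done event is never read.
stream = iter([
    {"thoughts": "Search for the report"},
    {"done": True, "final_result": "Report exported as PDF"},
    {"thoughts": "unreachable"},
])
result = wait_for_final_result(stream)
```

Returning as soon as the done event is seen means the caller can close the connection immediately instead of waiting for the server to end the stream.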
Return Value
Returns a string: a summary or description of the operation result produced after the large model Agent executes the task.
Advanced Configuration
| Configuration Item | Description |
|---|---|
| `copilotModel` | The LLM model currently in use and its Provider information |
| `llm_temperature` | Sampling temperature of the large model (default 0.5) |
| `enable_recording` | Whether to enable browser recording (enabled by default) |
| `max_steps` | Maximum number of steps for browser task execution (default 100) |
| `use_vision` | Whether to enable vision capabilities (such as understanding page screenshots) |
| `timeout` | Task timeout |
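The options in the table can be pictured as a small settings object. The sketch below uses the defaults stated in the table where they exist. The class name `BrowserUseOptions`, the `to_payload` method, the defaults for `use_vision` and `timeout`, and the validation rules are assumptions for illustration. `copilotModel` is left out because its structure is not described here.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class BrowserUseOptions:
    llm_temperature: float = 0.5     # default stated in the table
    enable_recording: bool = True    # enabled by default per the table
    max_steps: int = 100             # default stated in the table
    use_vision: bool = False         # assumed default; the table gives none
    timeout: Optional[float] = None  # assumed default and unit; the table gives none

    def to_payload(self) -> Dict[str, Any]:
        """Validate the options and return them as a dict, omitting unset values."""
        if self.llm_temperature < 0:
            raise ValueError("llm_temperature must be non-negative")
        if self.max_steps < 1:
            raise ValueError("max_steps must be at least 1")
        if self.timeout is not None and self.timeout <= 0:
            raise ValueError("timeout must be positive when set")
        return {k: v for k, v in asdict(self).items() if v is not None}


# Override only what differs from the defaults.
payload = BrowserUseOptions(max_steps=20, use_vision=True).to_payload()
```

Lowering `max_steps` for short tasks is a cheap way to bound how long a misdirected run can continue before it is stopped.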
Application Scenario Examples
- Search for specified content on web pages and summarize (such as news, stock prices, reports)
- Complete multi-step interactions on complex websites (such as querying information and exporting)
- Simulate real human web page operation processes in large Agent systems
Notes
- This tool depends on the backend `browser-use` runtime environment (i.e., the Sandbox); ensure it is started and accessible.
- Tool return results depend on the correctness of browser task execution and the accuracy of the model's understanding.
- Browsers currently run in `headless` mode by default.
- The tool is designed for large models, so `task` expressions should describe the intent clearly in natural language.