Commit 39ee4a8

feat(gui-agent): add example for 2.0 version (#1732)
1 parent e71fce1 commit 39ee4a8

6 files changed

+490
-0
lines changed

examples/gui-agent-2.0/README.md

Whitespace-only changes.
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
{
  "name": "gui-agent-example",
  "version": "0.0.1",
  "description": "Example for GUI Agent SDK",
  "main": "dist/index.js",
  "scripts": {
    "build": "tsc",
    "start": "node dist/index.js",
    "dev": "tsx src/index.ts",
    "clean": "rm -rf dist"
  },
  "keywords": [
    "gui-agent",
    "example",
    "cli",
    "typescript"
  ],
  "author": "UI-TARS Team",
  "license": "Apache-2.0",
  "dependencies": {
    "dotenv": "^16.4.7",
    "@gui-agent/action-parser": "0.3.0-beta.12-canary-41702d44a-20250928105656",
    "@gui-agent/agent-sdk": "0.3.0-beta.12-canary-41702d44a-20250928105656",
    "@gui-agent/operator-aio": "0.3.0-beta.12-canary-41702d44a-20250928105656"
  },
  "devDependencies": {
    "@types/inquirer": "^9.0.7",
    "@types/node": "^20.8.0",
    "tsx": "^4.0.0",
    "typescript": "^5.2.2"
  },
  "pnpm": {
    "overrides": {
      "ini": "^5.0.0"
    }
  }
}
Lines changed: 357 additions & 0 deletions
@@ -0,0 +1,357 @@
# GUI Agent Standalone Project - Quick Start for LLM Agents

## PROJECT OVERVIEW
- **Project Type**: TypeScript Node.js application using GUI Agent SDK
- **Module System**: CommonJS (NOT ES Modules)
- **Build Tool**: TypeScript Compiler (tsc)
- **Package Manager**: npm/pnpm
- **Runtime**: Node.js v22.17.1+

## CRITICAL TECHNICAL CONSTRAINTS

### 1. MODULE SYSTEM REQUIREMENTS
- **MUST USE CommonJS**: `package.json` does NOT contain `"type": "module"`
- **TypeScript Config**: `"module": "CommonJS"` in `tsconfig.json`
- **Import Syntax**: Use ES6 `import` syntax in TypeScript; `tsc` compiles it to CommonJS `require()` calls
- **File Extensions**: NO `.js` extensions needed in TypeScript imports
- **`__dirname`**: Available in CommonJS (NOT available in ES modules)

### 2. TYPE SYSTEM CONSTRAINTS
- **Avoid Type Imports**: Do NOT import `ModelProviderName` or `AgentModel` types
- **Use `as const` Assertion**: For string literals that need specific types
- **Example**: `provider: 'volcengine' as const` (NOT `provider: 'volcengine'`)
- **Reason**: Prevents string type widening, ensures literal type compatibility

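
The widening behaviour behind the `as const` rule can be seen in isolation. The following is a minimal sketch, not SDK code: `ProviderName`, `ModelConfig`, and `describeModel` are hypothetical stand-ins for whatever types the SDK actually declares.

```typescript
// Hypothetical stand-in for the SDK's provider type (an assumption for illustration).
type ProviderName = 'volcengine' | 'openai';

interface ModelConfig {
  id: string;
  provider: ProviderName;
}

function describeModel(model: ModelConfig): string {
  return `${model.provider}:${model.id}`;
}

// Without `as const`, `provider` is inferred as `string`, and passing the object
// to describeModel is rejected because `string` is not assignable to ProviderName:
// const widened = { id: 'ep-example', provider: 'volcengine' };
// describeModel(widened);

// With `as const`, `provider` keeps the literal type 'volcengine'.
const narrowed = { id: 'ep-example', provider: 'volcengine' as const };

console.log(describeModel(narrowed)); // prints "volcengine:ep-example"
```

The assertion is only needed because the object is built in its own `const` before being passed on; an object literal written directly at a typed call site would not widen.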
### 3. ENVIRONMENT CONFIGURATION
- **File**: `.env.local` (copy from `.env.local.example`)
- **Loading**: Uses `dotenv` with `path.join(__dirname, '..', '.env.local')`

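
The `path.join(__dirname, '..', '.env.local')` expression resolves to the project root whether the code runs from `dist/` (after a build) or from `src/` (via `tsx`), since both directories sit one level below the root. A small sketch of that resolution follows; `resolveEnvFile` and the `/app` paths are hypothetical, and `baseDir` stands in for `__dirname` so the snippet runs anywhere.

```typescript
import * as path from 'node:path';

// Resolve `.env.local` one level above the directory of the running file.
// `baseDir` plays the role of `__dirname`; POSIX joins keep the output stable.
function resolveEnvFile(baseDir: string): string {
  return path.posix.join(baseDir, '..', '.env.local');
}

console.log(resolveEnvFile('/app/dist')); // prints "/app/.env.local"
console.log(resolveEnvFile('/app/src')); // prints "/app/.env.local"
```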
#### Required Environment Variables
```bash
# Model Service Configuration
ARK_BASE_URL=https://your-model-service-url # Model Service API endpoint
ARK_API_KEY=your-actual-model-service-api-key # Your Model Service API key

# Doubao Models Configuration
DOUBAO_1_5_VP=your-model-key-abcdef # Doubao 1.5 VP model endpoint ID
DOUBAO_SEED_1_6=your-model-key-fedcba # Doubao Seed 1.6 model endpoint ID

# AIO Sandbox Configuration
SANDBOX_URL=http://your-sandbox-url:port # AIO operator sandbox URL
```

#### Environment Variable Details
- **ARK_BASE_URL**: Model service base URL
- **ARK_API_KEY**: Your Volcengine account API key for authentication
- **DOUBAO_SEED_1_6**: Model endpoint ID for Doubao Seed 1.6, format: `ep-{timestamp}-{hash}`
- **DOUBAO_1_5_VP**: Model endpoint ID for Doubao 1.5 VP (optional, not used in current code)
- **SANDBOX_URL**: URL of the AIO operator sandbox environment for GUI operations

#### Environment Setup Process
1. Copy `.env.local.example` to `.env.local`
2. Replace all `your-*` placeholders with actual values
3. Ensure ARK_API_KEY has proper permissions for model access
4. Verify SANDBOX_URL is accessible and running
5. Model endpoint IDs must be valid and active in your Volcengine account

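
Steps 1 and 2 of the setup process can be checked at startup before any request is made. The sketch below is one possible approach and not part of the example project: `findUnsetVars` and `REQUIRED_VARS` are hypothetical names, and treating any value that still contains `your-` as unfilled is an assumption based on the template above.

```typescript
type Env = Record<string, string | undefined>;

// Variables the example reads at startup (DOUBAO_1_5_VP is documented as optional).
const REQUIRED_VARS = ['ARK_BASE_URL', 'ARK_API_KEY', 'DOUBAO_SEED_1_6', 'SANDBOX_URL'];

// Return the names that are unset, empty, or still hold a `your-*` placeholder.
function findUnsetVars(env: Env, required: string[] = REQUIRED_VARS): string[] {
  return required.filter((name) => {
    const value = (env[name] || '').trim();
    return value.length === 0 || value.includes('your-');
  });
}

const problems = findUnsetVars({
  ARK_BASE_URL: 'https://ark.example.invalid/api',
  ARK_API_KEY: 'your-actual-model-service-api-key',
  SANDBOX_URL: 'http://localhost:8080',
});
console.log(problems); // prints [ 'ARK_API_KEY', 'DOUBAO_SEED_1_6' ]
```

In the entry point, the same function could be called with `process.env` right after `dotenv.config(...)`, failing fast with the list of names instead of a later API error.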
## PROJECT STRUCTURE
```
gui-agent-standalone/
├── src/
│   ├── index.ts           # Main entry point
│   └── constants.ts       # System prompt definition
├── dist/                  # Compiled JavaScript output
├── package.json           # Dependencies and scripts
├── tsconfig.json          # TypeScript configuration
├── .env.local.example     # Environment template
└── .env.local             # Actual environment (create manually)
```

## DEPENDENCIES
### Runtime Dependencies
- `dotenv`: Environment variable loading
- `@gui-agent/agent-sdk`: Core GUI agent functionality
- `@gui-agent/operator-aio`: AIO hybrid operator
- `@gui-agent/action-parser`: Action parsing utilities

### Development Dependencies
- `typescript`: TypeScript compiler
- `tsx`: TypeScript execution for development
- `@types/node`: Node.js type definitions

## BUILD AND RUN PROCESS

### 1. MANDATORY BUILD STEP
```bash
npm run build # Compiles TypeScript to dist/
```
- **NEVER skip this step** - project requires compilation
- **Output**: `dist/index.js` and `dist/constants.js`

### 2. EXECUTION
```bash
npm start # Runs compiled JavaScript
# OR for development
npm run dev # Direct TypeScript execution
```

## CODE STRUCTURE PATTERNS

### 1. Model Configuration
```typescript
const doubao = {
  id: process.env.DOUBAO_SEED_1_6!,
  provider: 'volcengine' as const, // CRITICAL: as const assertion
  baseURL: process.env.ARK_BASE_URL!,
  apiKey: process.env.ARK_API_KEY!,
};
```

### 2. Operator Setup
```typescript
const operator = new AIOHybridOperator({
  baseURL: process.env.SANDBOX_URL!,
  timeout: 10000,
});
```

### 3. Agent Initialization
```typescript
const guiAgent = new GUIAgent({
  operator,
  model: doubao, // No type assertion needed with as const
  systemPrompt: SYSTEM_PROMPT,
});
```

### 4. Execution Pattern
```typescript
async function main() {
  const response = await guiAgent.run({
    input: [{ type: 'text', text: 'your-task-here' }],
  });
  console.log(response.content);
}

// Start the run; report failures instead of leaving an unhandled rejection
main().catch(console.error);
```

## COMMON ISSUES AND SOLUTIONS

### Issue 1: ERR_MODULE_NOT_FOUND
- **Cause**: Missing build step or ES module configuration
- **Solution**: Run `npm run build` first, ensure CommonJS config

### Issue 2: Type Assignment Error
- **Error**: Type `string` is not assignable to type `ModelProviderName`
- **Solution**: Use `as const` assertion on provider field
- **Wrong**: `provider: 'volcengine'`
- **Correct**: `provider: 'volcengine' as const`

### Issue 3: __dirname Not Defined
- **Cause**: ES module configuration
- **Solution**: Remove `"type": "module"` from package.json

### Issue 4: Environment Variables Not Loaded
- **Cause**: Missing .env.local file
- **Solution**: Copy .env.local.example to .env.local and fill values

## TYPESCRIPT CONFIGURATION DETAILS

### tsconfig.json Requirements
```json
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "CommonJS", // CRITICAL: Must be CommonJS
    "moduleResolution": "node", // CRITICAL: Must be node
    "outDir": "./dist",
    "rootDir": "./src",
    "esModuleInterop": true,
    "strict": true
  }
}
```

### package.json Requirements
```json
{
  "main": "dist/index.js",
  "scripts": {
    "build": "tsc",
    "start": "node dist/index.js",
    "dev": "tsx src/index.ts",
    "clean": "rm -rf dist"
  }
}
```
- **No `"type": "module"` field**: Its absence keeps the package in CommonJS mode (unlike `tsconfig.json`, `package.json` cannot contain comments, so the requirement is stated here rather than inline)

## SYSTEM PROMPT CONFIGURATION
- **Location**: `src/constants.ts`
- **Export**: `SYSTEM_PROMPT` constant
- **Content**: GUI agent instructions with action space definitions

### Complete SYSTEM_PROMPT Content
```typescript
export const SYSTEM_PROMPT = `
You are a GUI agent. You are given a task and your action history, with screenshots. You need to perform the next action to complete the task.

## Output Format
\`\`\`
Thought: ...
Action: ...
\`\`\`

## Action Space

navigate(url='xxx') # The url to navigate to
navigate_back() # Navigate back to the previous page.
click(point='<point>x1 y1</point>')
left_double(point='<point>x1 y1</point>')
right_single(point='<point>x1 y1</point>')
drag(start_point='<point>x1 y1</point>', end_point='<point>x2 y2</point>')
hotkey(key='ctrl c') # Split keys with a space and use lowercase. Also, do not use more than 3 keys in one hotkey action.
type(content='xxx') # Use escape characters \\', \\", and \\n in content part to ensure we can parse the content in normal python string format. If you want to submit your input, use \\n at the end of content.
scroll(point='<point>x1 y1</point>', direction='down or up or right or left') # Show more information on the \`direction\` side.
wait() #Sleep for 5s and take a screenshot to check for any changes.
finished(content='xxx') # Use escape characters \\', \\", and \\n in content part to ensure we can parse the content in normal python string format.

## Note
- Use Chinese in \`Thought\` part.
- Write a small plan and finally summarize your next action (with its target element) in one sentence in \`Thought\` part.

## User Instruction
`;
```

### Action Space Details
- **navigate(url)**: Navigate to specified URL
- **navigate_back()**: Go back to previous page
- **click(point)**: Single left click at coordinates
- **left_double(point)**: Double left click at coordinates
- **right_single(point)**: Single right click at coordinates
- **drag(start_point, end_point)**: Drag from start to end coordinates
- **hotkey(key)**: Execute keyboard shortcuts (max 3 keys, space-separated, lowercase)
- **type(content)**: Type text content (use escape characters for special chars)
- **scroll(point, direction)**: Scroll in specified direction at coordinates
- **wait()**: Wait 5 seconds and take screenshot
- **finished(content)**: Mark task completion with result content

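
To make the action syntax above concrete, here is a rough sketch of how one action string could be taken apart. It is an illustration only and does not reflect the API of `@gui-agent/action-parser`; `parseAction`, `parsePoint`, and `ParsedAction` are hypothetical names, and the regular expressions assume single-quoted arguments exactly as written in the action space.

```typescript
interface ParsedAction {
  name: string;
  args: Record<string, string>;
}

// Parse `name(key='value', ...)` into a name and a map of raw argument strings.
// Escaped characters inside a value (such as \') are kept as written.
function parseAction(text: string): ParsedAction | null {
  const call = /^\s*([a-z_]+)\(([\s\S]*)\)\s*$/.exec(text);
  if (!call) return null;
  const args: Record<string, string> = {};
  const argPattern = /([a-z_]+)='((?:\\.|[^'\\])*)'/g;
  let m: RegExpExecArray | null;
  while ((m = argPattern.exec(call[2])) !== null) {
    args[m[1]] = m[2];
  }
  return { name: call[1], args };
}

// Extract the two numbers from `<point>x y</point>`; anything else yields null.
function parsePoint(raw: string): { x: number; y: number } | null {
  const m = /^<point>(-?\d+(?:\.\d+)?) (-?\d+(?:\.\d+)?)<\/point>$/.exec(raw);
  return m ? { x: Number(m[1]), y: Number(m[2]) } : null;
}

const action = parseAction("drag(start_point='<point>1 2</point>', end_point='<point>3 4</point>')");
if (action) {
  console.log(action.name); // prints "drag"
  console.log(parsePoint(action.args.end_point)); // prints { x: 3, y: 4 }
}
```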
### Content Formatting Rules
- **Escape Characters**: Use `\'`, `\"`, `\n` for special characters
- **Line Breaks**: Use `\n` for new lines
- **Submission**: End with `\n` to submit input
- **Action Summary**: One sentence summary of next action in Thought section

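
The escaping rules can be expressed as a small helper. This is a sketch under assumptions rather than SDK code: `escapeContent` and `typeAction` are hypothetical names, and escaping backslashes as well is an addition not stated in the rules, made so that the result stays parseable as a Python-style string.

```typescript
// Escape text for use inside a single-quoted `content='...'` argument.
// Backslashes are escaped first so later replacements are not doubled.
function escapeContent(text: string): string {
  return text
    .replace(/\\/g, '\\\\')
    .replace(/'/g, "\\'")
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n');
}

// Build a `type(...)` action; `submit` appends the trailing \n that submits input.
function typeAction(text: string, submit = false): string {
  const body = escapeContent(text) + (submit ? '\\n' : '');
  return `type(content='${body}')`;
}

console.log(typeAction("it's done", true)); // prints: type(content='it\'s done\n')
```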
## DEVELOPMENT WORKFLOW
1. **Setup**: Copy `.env.local.example` to `.env.local`
2. **Install**: `npm install`
3. **Build**: `npm run build` (MANDATORY)
4. **Run**: `npm start`
5. **Development**: `npm run dev` (skips build)
6. **Clean**: `npm run clean` (removes dist/)

## AGENT EXECUTION FLOW
1. **Environment Loading**: Load variables from `.env.local` using dotenv
2. **Model Configuration**: Initialize with proper type assertions (`as const`)
3. **Operator Setup**: Create AIO operator with sandbox URL and timeout
4. **Agent Initialization**: Combine operator, model, and system prompt
5. **Task Execution**: Run agent with structured input format
6. **Response Processing**: Handle and display agent output

### Detailed Execution Steps
```typescript
// Imports (export locations assumed from the package descriptions above)
import path from 'path';
import dotenv from 'dotenv';
import { GUIAgent } from '@gui-agent/agent-sdk';
import { AIOHybridOperator } from '@gui-agent/operator-aio';
import { SYSTEM_PROMPT } from './constants';

// 1. Environment Loading
dotenv.config({ path: path.join(__dirname, '..', '.env.local') });

// 2. Model Configuration with Type Safety
const doubao = {
  id: process.env.DOUBAO_SEED_1_6!,
  provider: 'volcengine' as const, // Critical: literal type
  baseURL: process.env.ARK_BASE_URL!,
  apiKey: process.env.ARK_API_KEY!,
};

// 3. Operator Setup
const operator = new AIOHybridOperator({
  baseURL: process.env.SANDBOX_URL!,
  timeout: 10000, // 10 second timeout
});

// 4. Agent Initialization
const guiAgent = new GUIAgent({
  operator,
  model: doubao,
  systemPrompt: SYSTEM_PROMPT,
});

async function main() {
  // 5. Task Execution (inside a function: CommonJS has no top-level await)
  const response = await guiAgent.run({
    input: [{ type: 'text', text: 'your task description' }],
  });

  // 6. Response Processing
  console.log('Agent Response:', response.content);
}

main().catch(console.error);
```

### Input Format Requirements
- **Structure**: Array of input objects
- **Type**: Must be `'text'` for text inputs
- **Content**: Task description in natural language
- **Example**: `[{ type: 'text', text: 'Open the Baidu search page and search for TypeScript tutorials' }]`

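
The input structure described above can be wrapped in a tiny helper so callers do not hand-write the array each time. This is a hypothetical convenience, not part of the SDK: `TextInput` mirrors the `{ type: 'text', text }` shape shown in the example, and rejecting empty tasks is an added assumption.

```typescript
// Hypothetical local type mirroring the `{ type: 'text', text }` input shape.
interface TextInput {
  type: 'text';
  text: string;
}

// Wrap a natural-language task in the array form passed as `input`.
function textInput(task: string): TextInput[] {
  const text = task.trim();
  if (text.length === 0) {
    throw new Error('Task description must not be empty');
  }
  return [{ type: 'text', text }];
}

console.log(JSON.stringify(textInput('  open the settings page  ')));
// prints [{"type":"text","text":"open the settings page"}]
```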
### Response Format
- **Type**: Object with content property
- **Content**: Agent's response including thoughts and actions
- **Format**: Follows SYSTEM_PROMPT output format (Thought + Action)
- **Language**: Chinese thoughts, English actions

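
If the response content needs to be separated into its two parts, a split along the `Thought:` and `Action:` labels is one option. The sketch below assumes `response.content` is a plain string in exactly that format, which the source does not guarantee; `splitThoughtAndAction` and `AgentStep` are hypothetical names.

```typescript
interface AgentStep {
  thought: string;
  action: string;
}

// Split "Thought: ... Action: ..." text into its two labelled parts.
// Returns null when either label is missing.
function splitThoughtAndAction(content: string): AgentStep | null {
  const m = /Thought:\s*([\s\S]*?)\s*Action:\s*([\s\S]*)/.exec(content);
  if (!m) return null;
  return { thought: m[1].trim(), action: m[2].trim() };
}

const step = splitThoughtAndAction(
  "Thought: 点击搜索框\nAction: click(point='<point>10 20</point>')",
);
if (step) {
  console.log(step.action); // prints click(point='<point>10 20</point>')
}
```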
## CRITICAL SUCCESS FACTORS
- **Always build before running in production**: `npm run build` is mandatory
- **Use `as const` for string literals requiring specific types**: Prevents type widening
- **Maintain CommonJS module system**: Never add `"type": "module"` to package.json
- **Ensure all environment variables are set**: Missing vars cause runtime failures
- **Never import type-only dependencies in runtime code**: Causes compilation errors
- **Verify sandbox connectivity**: SANDBOX_URL must be accessible and responsive
- **Use correct model endpoint IDs**: Invalid IDs cause model API requests to fail
- **Follow point coordinate format**: `<point>x y</point>` format is strictly required
- **Escape special characters in content**: Use `\'`, `\"`, `\n` for proper parsing
- **Chinese thoughts, English actions**: Language requirements are enforced by system prompt

## TROUBLESHOOTING GUIDE

### Build Issues
- **Error**: `Cannot find module './constants'`
  - **Cause**: Missing build step or incorrect module resolution
  - **Solution**: Run `npm run build` and verify tsconfig.json settings

- **Error**: `Type 'string' is not assignable to type 'ModelProviderName'`
  - **Cause**: Missing `as const` assertion on provider field
  - **Solution**: Add `as const` to provider: `provider: 'volcengine' as const`

### Runtime Issues
- **Error**: `ARK_API_KEY is not defined`
  - **Cause**: Missing or incorrect .env.local file
  - **Solution**: Copy .env.local.example and fill actual values

- **Error**: `Connection refused to SANDBOX_URL`
  - **Cause**: AIO sandbox not running or incorrect URL
  - **Solution**: Verify sandbox is accessible at specified URL

- **Error**: `Model endpoint not found`
  - **Cause**: Invalid DOUBAO_SEED_1_6 endpoint ID
  - **Solution**: Check Volcengine console for correct endpoint ID

### Agent Execution Issues
- **Error**: `Invalid action format`
  - **Cause**: Incorrect point coordinates or action syntax
  - **Solution**: Follow `<point>x y</point>` format and action space definitions

- **Error**: `Timeout waiting for response`
  - **Cause**: Network issues or model overload
  - **Solution**: Increase timeout in operator configuration or retry

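
For the "retry" half of the timeout advice, a caller-side wrapper is one way to bound and repeat an attempt. This sketch is independent of the operator's own `timeout` option and of any retry logic the SDK may already have; `withTimeout` and `retryWithTimeout` are hypothetical helpers.

```typescript
// Reject if `work` does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Run `task` up to `attempts` times, applying the timeout to each attempt.
// The last error is rethrown when every attempt fails.
async function retryWithTimeout<T>(
  task: () => Promise<T>,
  attempts: number,
  ms: number,
): Promise<T> {
  let lastError: unknown = new Error('No attempts were made');
  for (let i = 0; i < attempts; i++) {
    try {
      return await withTimeout(task(), ms);
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}
```

A call such as `retryWithTimeout(() => guiAgent.run({ input }), 3, 60000)` would then give each run one minute and up to three tries; the numbers are placeholders to tune per task. Note that a timed-out attempt is abandoned rather than cancelled, so the underlying request may still complete in the background.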
## PERFORMANCE OPTIMIZATION
- **Build Time**: Use `npm run dev` for development (skips build)
- **Model Response**: Adjust timeout based on task complexity
- **Disk Usage**: Clean dist/ folder regularly with `npm run clean`
- **Network**: Ensure stable connection to ARK and sandbox services
