# GUI Agent Standalone Project - Quick Start for LLM Agents

## PROJECT OVERVIEW
- **Project Type**: TypeScript Node.js application using the GUI Agent SDK
- **Module System**: CommonJS (NOT ES Modules)
- **Build Tool**: TypeScript Compiler (tsc)
- **Package Manager**: npm/pnpm
- **Runtime**: Node.js v22.17.1+

## CRITICAL TECHNICAL CONSTRAINTS

### 1. MODULE SYSTEM REQUIREMENTS
- **MUST USE CommonJS**: `package.json` does NOT contain `"type": "module"`
- **TypeScript Config**: `"module": "CommonJS"` in `tsconfig.json`
- **Import Syntax**: Write ES6 `import` statements in TypeScript; `tsc` compiles them to CommonJS `require()` calls
- **File Extensions**: NO `.js` extensions needed in TypeScript imports
- **__dirname**: Available in CommonJS (NOT available in ES modules)
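
You can confirm the CommonJS requirement by inspecting `package.json` the way Node does. A missing `"type"` field, or `"type": "commonjs"`, makes `.js` files CommonJS; `"type": "module"` makes them ES modules. The sketch below encodes only that rule. `usesCommonJs` and `checkPackageFile` are hypothetical helpers, not part of the project.

```typescript
import * as fs from 'node:fs';

// Minimal shape of the package.json fields this check looks at.
interface PackageJson {
  type?: string;
  main?: string;
}

// Node treats .js files as CommonJS when "type" is absent or set to "commonjs".
function usesCommonJs(pkg: PackageJson): boolean {
  return pkg.type === undefined || pkg.type === 'commonjs';
}

// Hypothetical helper: read a package.json from disk and apply the same rule.
function checkPackageFile(file: string): boolean {
  const pkg = JSON.parse(fs.readFileSync(file, 'utf8')) as PackageJson;
  return usesCommonJs(pkg);
}
```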

### 2. TYPE SYSTEM CONSTRAINTS
- **Avoid Type Imports**: Do NOT import the SDK's `ModelProviderName` or `AgentModel` types; let TypeScript infer the model config type instead
- **Use `as const` Assertion**: Apply it to string literals that must keep a specific literal type
- **Example**: `provider: 'volcengine' as const` (NOT `provider: 'volcengine'`)
- **Reason**: Without it, TypeScript widens the property to `string`, which is not assignable to the SDK's literal provider type
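
The widening problem behind the `as const` rule can be reproduced without the SDK. In this sketch, `ProviderName` and `ModelConfig` are hypothetical stand-ins for the SDK's `ModelProviderName` and model config types. The real definitions may differ, but TypeScript infers the literal the same way.

```typescript
// Hypothetical stand-in for the SDK's ModelProviderName union; the real union may differ.
type ProviderName = 'volcengine' | 'openai';

interface ModelConfig {
  id: string;
  provider: ProviderName;
  baseURL: string;
  apiKey: string;
}

function describeModel(model: ModelConfig): string {
  return `${model.provider}:${model.id}`;
}

// Without `as const`, `provider` is inferred as `string`, so this would not compile:
// const widened = { id: 'ep-demo', provider: 'volcengine', baseURL: 'http://x', apiKey: 'k' };
// describeModel(widened); // error: type 'string' is not assignable to 'ProviderName'

// With `as const`, `provider` keeps the literal type 'volcengine'.
const narrowed = {
  id: 'ep-demo',
  provider: 'volcengine' as const,
  baseURL: 'http://localhost',
  apiKey: 'demo-key',
};

const label = describeModel(narrowed);
```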

### 3. ENVIRONMENT CONFIGURATION
- **File**: `.env.local` (copy from `.env.local.example`)
- **Loading**: Uses `dotenv` with `path.join(__dirname, '..', '.env.local')`
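
The `'..'` in the loading path matters because the code never runs from the project root. This assumes two things: `npm start` executes `dist/index.js`, and `npm run dev` executes `src/index.ts` with `__dirname` pointing at `src/` (tsx in CommonJS mode). Both entry points then sit one level below the root, so one relative path serves both. `envFilePath` is a hypothetical helper used only to illustrate this.

```typescript
import * as path from 'node:path';

// Resolve .env.local relative to the directory of the running file.
// dist/index.js and src/index.ts both sit one level below the project root,
// so the same '..' hop reaches the root in either mode.
function envFilePath(moduleDir: string): string {
  return path.join(moduleDir, '..', '.env.local');
}

// In the real entry point this would be called with __dirname.
const fromDist = envFilePath(path.join('/project', 'dist'));
const fromSrc = envFilePath(path.join('/project', 'src'));
```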

#### Required Environment Variables
```bash
# Model Service Configuration
ARK_BASE_URL=https://your-model-service-url # Model Service API endpoint
ARK_API_KEY=your-actual-model-service-api-key # Your Model Service API key

# Doubao Models Configuration
DOUBAO_1_5_VP=your-model-key-abcdef # Doubao 1.5 VP model endpoint ID
DOUBAO_SEED_1_6=your-model-key-fedcba # Doubao Seed 1.6 model endpoint ID

# AIO Sandbox Configuration
SANDBOX_URL=http://your-sandbox-url:port # AIO operator sandbox URL
```

#### Environment Variable Details
- **ARK_BASE_URL**: Model service base URL
- **ARK_API_KEY**: Your Volcengine account API key, used for authentication
- **DOUBAO_SEED_1_6**: Model endpoint ID for Doubao Seed 1.6, format: `ep-{timestamp}-{hash}`
- **DOUBAO_1_5_VP**: Model endpoint ID for Doubao 1.5 VP (optional; not used by the current code)
- **SANDBOX_URL**: URL of the AIO operator sandbox environment in which GUI operations run

#### Environment Setup Process
1. Copy `.env.local.example` to `.env.local`
2. Replace every `your-*` placeholder with an actual value
3. Ensure `ARK_API_KEY` has permission to access the configured models
4. Verify that the sandbox at `SANDBOX_URL` is running and reachable
5. Confirm that the model endpoint IDs are valid and active in your Volcengine account
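
The code reads variables with non-null assertions (`process.env.X!`), so a missing value is not caught at compile time. A minimal fail-fast check, run before creating the operator and agent, might look like the following. The helper names are hypothetical. Treating any value that contains `your-` as an unfilled placeholder is an assumption based on `.env.local.example`.

```typescript
// Variables the entry point reads with non-null assertions (`!`).
const REQUIRED_ENV = ['ARK_BASE_URL', 'ARK_API_KEY', 'DOUBAO_SEED_1_6', 'SANDBOX_URL'] as const;

// Return one message per variable that is missing, empty, or still a `your-*` placeholder.
function findEnvProblems(env: Record<string, string | undefined>): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_ENV) {
    const value = env[name]?.trim();
    if (!value) {
      problems.push(`${name} is not set`);
    } else if (value.includes('your-')) {
      problems.push(`${name} still contains a placeholder value`);
    }
  }
  return problems;
}

// Fail fast with a readable list instead of failing later inside the SDK.
function assertEnv(env: Record<string, string | undefined> = process.env): void {
  const problems = findEnvProblems(env);
  if (problems.length > 0) {
    throw new Error(`Invalid .env.local:\n- ${problems.join('\n- ')}`);
  }
}
```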

## PROJECT STRUCTURE
```
gui-agent-standalone/
├── src/
│   ├── index.ts          # Main entry point
│   └── constants.ts      # System prompt definition
├── dist/                 # Compiled JavaScript output
├── package.json          # Dependencies and scripts
├── tsconfig.json         # TypeScript configuration
├── .env.local.example    # Environment template
└── .env.local            # Actual environment (create manually)
```

## DEPENDENCIES
### Runtime Dependencies
- `dotenv`: Environment variable loading
- `@gui-agent/agent-sdk`: Core GUI agent functionality
- `@gui-agent/operator-aio`: AIO hybrid operator
- `@gui-agent/action-parser`: Action parsing utilities

### Development Dependencies
- `typescript`: TypeScript compiler
- `tsx`: TypeScript execution for development
- `@types/node`: Node.js type definitions

## BUILD AND RUN PROCESS

### 1. MANDATORY BUILD STEP
```bash
npm run build  # Compiles TypeScript to dist/
```
- **Required before `npm start`**: `npm start` runs the compiled `dist/index.js`, so never skip the build on that path (`npm run dev` runs the TypeScript sources directly and does not need it)
- **Output**: `dist/index.js` and `dist/constants.js`
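
`npm start` only runs what is already in `dist/`. A pre-flight check for the expected build outputs can therefore turn a confusing `Cannot find module` error into a clear message. This sketch assumes the project root is known and checks only the two files listed above. `missingBuildOutputs` is a hypothetical helper.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Files `npm run build` is expected to emit into dist/.
const BUILD_OUTPUTS = ['index.js', 'constants.js'];

// Return the absolute paths of expected dist/ files that do not exist yet.
function missingBuildOutputs(projectRoot: string): string[] {
  return BUILD_OUTPUTS
    .map((file) => path.join(projectRoot, 'dist', file))
    .filter((file) => !fs.existsSync(file));
}
```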

### 2. EXECUTION
```bash
npm start    # Runs the compiled JavaScript
# OR, for development
npm run dev  # Runs the TypeScript directly via tsx
```

## CODE STRUCTURE PATTERNS

### 1. Model Configuration
```typescript
const doubao = {
  id: process.env.DOUBAO_SEED_1_6!,
  provider: 'volcengine' as const, // CRITICAL: as const assertion
  baseURL: process.env.ARK_BASE_URL!,
  apiKey: process.env.ARK_API_KEY!,
};
```

### 2. Operator Setup
```typescript
const operator = new AIOHybridOperator({
  baseURL: process.env.SANDBOX_URL!,
  timeout: 10000, // milliseconds
});
```

### 3. Agent Initialization
```typescript
const guiAgent = new GUIAgent({
  operator,
  model: doubao, // No type assertion needed thanks to `as const`
  systemPrompt: SYSTEM_PROMPT,
});
```

### 4. Execution Pattern
```typescript
async function main() {
  const response = await guiAgent.run({
    input: [{ type: 'text', text: 'your-task-here' }],
  });
  console.log(response.content);
}

// CommonJS has no top-level await, so call main() explicitly and report failures.
main().catch((error) => {
  console.error(error);
  process.exit(1);
});
```

## COMMON ISSUES AND SOLUTIONS

### Issue 1: ERR_MODULE_NOT_FOUND
- **Cause**: Missing build step or ES module configuration (under CommonJS, a missing `dist/` file surfaces as `Error: Cannot find module ...` instead)
- **Solution**: Run `npm run build` first and make sure the CommonJS configuration is in place

### Issue 2: Type Assignment Error
- **Error**: `Type 'string' is not assignable to type 'ModelProviderName'`
- **Solution**: Use an `as const` assertion on the provider field
- **Wrong**: `provider: 'volcengine'`
- **Correct**: `provider: 'volcengine' as const`

### Issue 3: __dirname Not Defined
- **Error**: `ReferenceError: __dirname is not defined in ES module scope`
- **Cause**: ES module configuration
- **Solution**: Remove `"type": "module"` from package.json

### Issue 4: Environment Variables Not Loaded
- **Cause**: Missing `.env.local` file
- **Solution**: Copy `.env.local.example` to `.env.local` and fill in the values

## TYPESCRIPT CONFIGURATION DETAILS

### tsconfig.json Requirements
```json
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "CommonJS",       // CRITICAL: Must be CommonJS
    "moduleResolution": "node", // CRITICAL: Must be node
    "outDir": "./dist",
    "rootDir": "./src",
    "esModuleInterop": true,
    "strict": true
  }
}
```
`tsc` accepts comments in `tsconfig.json`, so the annotations above are valid there.

### package.json Requirements
`package.json` must NOT contain a `"type": "module"` field. Unlike `tsconfig.json`, it must be strict JSON, so this rule cannot be written as a comment inside the file:
```json
{
  "main": "dist/index.js",
  "scripts": {
    "build": "tsc",
    "start": "node dist/index.js",
    "dev": "tsx src/index.ts",
    "clean": "rm -rf dist"
  }
}
```

## SYSTEM PROMPT CONFIGURATION
- **Location**: `src/constants.ts`
- **Export**: `SYSTEM_PROMPT` constant
- **Content**: GUI agent instructions with action space definitions

### Complete SYSTEM_PROMPT Content
```typescript
export const SYSTEM_PROMPT = `
You are a GUI agent. You are given a task and your action history, with screenshots. You need to perform the next action to complete the task.

## Output Format
\`\`\`
Thought: ...
Action: ...
\`\`\`

## Action Space

navigate(url='xxx') # The url to navigate to
navigate_back() # Navigate back to the previous page.
click(point='<point>x1 y1</point>')
left_double(point='<point>x1 y1</point>')
right_single(point='<point>x1 y1</point>')
drag(start_point='<point>x1 y1</point>', end_point='<point>x2 y2</point>')
hotkey(key='ctrl c') # Split keys with a space and use lowercase. Also, do not use more than 3 keys in one hotkey action.
type(content='xxx') # Use escape characters \\', \\", and \\n in content part to ensure we can parse the content in normal python string format. If you want to submit your input, use \\n at the end of content.
scroll(point='<point>x1 y1</point>', direction='down or up or right or left') # Show more information on the \`direction\` side.
wait() #Sleep for 5s and take a screenshot to check for any changes.
finished(content='xxx') # Use escape characters \\', \\", and \\n in content part to ensure we can parse the content in normal python string format.

## Note
- Use Chinese in \`Thought\` part.
- Write a small plan and finally summarize your next action (with its target element) in one sentence in \`Thought\` part.

## User Instruction
`;
```

### Action Space Details
- **navigate(url)**: Navigate to the specified URL
- **navigate_back()**: Go back to the previous page
- **click(point)**: Single left click at the coordinates
- **left_double(point)**: Double left click at the coordinates
- **right_single(point)**: Single right click at the coordinates
- **drag(start_point, end_point)**: Drag from the start coordinates to the end coordinates
- **hotkey(key)**: Execute a keyboard shortcut (max 3 keys, space-separated, lowercase)
- **type(content)**: Type text content (use escape characters for special characters)
- **scroll(point, direction)**: Scroll in the specified direction at the coordinates
- **wait()**: Wait 5 seconds, then take a screenshot
- **finished(content)**: Mark the task as complete, with the result as content
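
Here is a minimal regex-based parser for single-line actions in the Action Space format. It is a sketch, not the API of `@gui-agent/action-parser`. It assumes integer coordinates inside `<point>x y</point>` and single-quoted argument values, and it leaves escape sequences in values untouched. `parseAction` and `parsePoint` are hypothetical names.

```typescript
// Parsed form of one action line, e.g. click(point='<point>100 200</point>').
interface ParsedAction {
  name: string;
  args: Record<string, string>;
}

// Matches `name(...)` and captures the whole argument list.
const CALL_RE = /^(\w+)\((.*)\)$/s;
// Matches key='value' pairs; values may contain backslash escapes such as \'.
const ARG_RE = /(\w+)='((?:[^'\\]|\\.)*)'/g;

function parseAction(line: string): ParsedAction | null {
  const call = CALL_RE.exec(line.trim());
  if (!call) return null;
  const args: Record<string, string> = {};
  for (const match of call[2].matchAll(ARG_RE)) {
    args[match[1]] = match[2];
  }
  return { name: call[1], args };
}

// Extract [x, y] from '<point>x y</point>'; returns null if the format differs.
function parsePoint(value: string): [number, number] | null {
  const m = /^<point>(\d+)\s+(\d+)<\/point>$/.exec(value.trim());
  return m ? [Number(m[1]), Number(m[2])] : null;
}
```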

### Content Formatting Rules
- **Escape Characters**: Use `\'`, `\"`, and `\n` for special characters
- **Line Breaks**: Use `\n` for new lines
- **Submission**: End the content with `\n` to submit the input
- **Action Summary**: Summarize the next action in one sentence in the Thought section
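
The escaping rules can be applied mechanically when building a `type(...)` action. This sketch also escapes backslashes first. That extra step is an assumption, based on the prompt's goal of parsing content as a normal Python string. `escapeContent` and `toTypeAction` are hypothetical helpers.

```typescript
// Escape text for use inside content='...'.
// Backslashes go first so the escapes added afterwards are not doubled.
function escapeContent(text: string): string {
  return text
    .replace(/\\/g, '\\\\')
    .replace(/'/g, "\\'")
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n');
}

// Build a type(...) action; `submit` appends an escaped \n so the input is submitted.
function toTypeAction(text: string, submit = false): string {
  return `type(content='${escapeContent(text)}${submit ? '\\n' : ''}')`;
}
```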

## DEVELOPMENT WORKFLOW
1. **Setup**: Copy `.env.local.example` to `.env.local`
2. **Install**: `npm install`
3. **Build**: `npm run build` (MANDATORY before `npm start`)
4. **Run**: `npm start`
5. **Development**: `npm run dev` (skips the build)
6. **Clean**: `npm run clean` (removes `dist/`)

## AGENT EXECUTION FLOW
1. **Environment Loading**: Load variables from `.env.local` using dotenv
2. **Model Configuration**: Initialize with the required type assertion (`as const`)
3. **Operator Setup**: Create the AIO operator with the sandbox URL and a timeout
4. **Agent Initialization**: Combine operator, model, and system prompt
5. **Task Execution**: Run the agent with the structured input format
6. **Response Processing**: Handle and display the agent output

### Detailed Execution Steps
```typescript
import * as path from 'path';
import * as dotenv from 'dotenv';
// Export names assumed to match the identifiers used throughout this guide.
import { GUIAgent } from '@gui-agent/agent-sdk';
import { AIOHybridOperator } from '@gui-agent/operator-aio';
import { SYSTEM_PROMPT } from './constants';

// 1. Environment Loading (.env.local sits one level above src/ and dist/)
dotenv.config({ path: path.join(__dirname, '..', '.env.local') });

// 2. Model Configuration with Type Safety
const doubao = {
  id: process.env.DOUBAO_SEED_1_6!,
  provider: 'volcengine' as const, // Critical: literal type
  baseURL: process.env.ARK_BASE_URL!,
  apiKey: process.env.ARK_API_KEY!,
};

// 3. Operator Setup
const operator = new AIOHybridOperator({
  baseURL: process.env.SANDBOX_URL!,
  timeout: 10000, // 10 second timeout (milliseconds)
});

// 4. Agent Initialization
const guiAgent = new GUIAgent({
  operator,
  model: doubao,
  systemPrompt: SYSTEM_PROMPT,
});

async function main() {
  // 5. Task Execution (CommonJS has no top-level await, so run inside an async function)
  const response = await guiAgent.run({
    input: [{ type: 'text', text: 'your task description' }],
  });

  // 6. Response Processing
  console.log('Agent Response:', response.content);
}

main().catch((error) => {
  console.error(error);
  process.exit(1);
});
```

### Input Format Requirements
- **Structure**: Array of input objects
- **Type**: Must be `'text'` for text inputs
- **Content**: Task description in natural language
- **Example**: `[{ type: 'text', text: '打开百度搜索页面并搜索TypeScript教程' }]` ("Open the Baidu search page and search for TypeScript tutorials")
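
A small helper keeps the input array in the required shape. The `TextInput` interface below mirrors only the fields documented in this section and is an assumption; the SDK's own input type may accept additional item types. `textInput` is a hypothetical helper.

```typescript
// Shape of one text item in the `input` array passed to guiAgent.run().
interface TextInput {
  type: 'text';
  text: string;
}

// Wrap a natural-language task in the required array format, rejecting empty tasks.
function textInput(task: string): TextInput[] {
  const text = task.trim();
  if (text.length === 0) {
    throw new Error('Task description must not be empty');
  }
  return [{ type: 'text', text }];
}

const input = textInput('打开百度搜索页面并搜索TypeScript教程');
```

Because the return type is annotated, the `type: 'text'` literal is checked without needing `as const` at the call site.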

### Response Format
- **Type**: Object with a `content` property
- **Content**: The agent's response, including thoughts and actions
- **Format**: Follows the SYSTEM_PROMPT output format (Thought + Action)
- **Language**: Chinese thoughts, English actions
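
If `response.content` holds text in the SYSTEM_PROMPT output format, the two parts can be separated as sketched below. This guide does not confirm that `content` is always a string in exactly this shape, so treat that as an assumption. `parseThoughtAction` is a hypothetical helper.

```typescript
interface ThoughtAction {
  thought: string;
  action: string;
}

// Split a "Thought: ...\nAction: ..." reply into its two parts.
// The thought may span several lines; the action is everything after "Action:".
function parseThoughtAction(content: string): ThoughtAction | null {
  const match = /Thought:\s*([\s\S]*?)\s*Action:\s*([\s\S]*)$/.exec(content);
  if (!match) return null;
  return { thought: match[1].trim(), action: match[2].trim() };
}
```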

## CRITICAL SUCCESS FACTORS
- **Always build before running in production**: `npm run build` is mandatory before `npm start`
- **Use `as const` for string literals that require specific types**: Prevents type widening
- **Keep the CommonJS module system**: Never add `"type": "module"` to package.json
- **Ensure all environment variables are set**: The `!` assertions only silence the compiler, so missing variables surface later as runtime failures
- **Do not import type-only SDK symbols** (`ModelProviderName`, `AgentModel`): In this project they caused compilation errors; rely on `as const` inference instead
- **Verify sandbox connectivity**: SANDBOX_URL must be accessible and responsive
- **Use correct model endpoint IDs**: Invalid IDs cause endpoint-not-found errors from the model service
- **Follow the point coordinate format**: The `<point>x y</point>` format is strictly required
- **Escape special characters in content**: Use `\'`, `\"`, `\n` for proper parsing
- **Chinese thoughts, English actions**: The system prompt instructs the model to use this language split

## TROUBLESHOOTING GUIDE

### Build Issues
- **Error**: `Cannot find module './constants'`
  - **Cause**: Missing build step or incorrect module resolution
  - **Solution**: Run `npm run build` and verify the tsconfig.json settings

- **Error**: `Type 'string' is not assignable to type 'ModelProviderName'`
  - **Cause**: Missing `as const` assertion on the provider field
  - **Solution**: Add `as const` to the provider: `provider: 'volcengine' as const`

### Runtime Issues
- **Error**: `ARK_API_KEY is not defined` (the value is `undefined` at runtime)
  - **Cause**: Missing or incomplete `.env.local` file
  - **Solution**: Copy `.env.local.example` to `.env.local` and fill in the actual values

- **Error**: Connection refused (`ECONNREFUSED`) when reaching SANDBOX_URL
  - **Cause**: AIO sandbox not running, or incorrect URL
  - **Solution**: Verify that the sandbox is reachable at the specified URL

- **Error**: `Model endpoint not found`
  - **Cause**: Invalid DOUBAO_SEED_1_6 endpoint ID
  - **Solution**: Check the Volcengine console for the correct endpoint ID
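
Before starting the agent, a reachability probe can separate "sandbox down" from other failures. This sketch uses Node's built-in `fetch` (available in Node.js v22) with `AbortSignal.timeout`. No health-check endpoint for the AIO sandbox is documented here, so it only checks that the base URL answers HTTP at all. `isSandboxReachable` is a hypothetical helper.

```typescript
// Probe a URL with a plain HTTP GET and a hard timeout.
// Any HTTP response (even a 404) shows the host is reachable; network errors
// such as ECONNREFUSED, invalid URLs, or the timeout firing yield false.
async function isSandboxReachable(url: string, timeoutMs = 5000): Promise<boolean> {
  try {
    const response = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    // Discard the body so the connection is not held open.
    await response.body?.cancel();
    return true;
  } catch {
    return false;
  }
}
```

Inside `main()`, it could guard the run, e.g. `if (!(await isSandboxReachable(process.env.SANDBOX_URL!))) throw new Error('Sandbox unreachable');`.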

### Agent Execution Issues
- **Error**: `Invalid action format`
  - **Cause**: Incorrect point coordinates or action syntax
  - **Solution**: Follow the `<point>x y</point>` format and the action space definitions

- **Error**: `Timeout waiting for response`
  - **Cause**: Network issues or model overload
  - **Solution**: Increase the `timeout` value (in milliseconds) in the operator configuration, or retry
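
To retry, a generic wrapper with linear backoff is sketched below. `withRetry` is a hypothetical helper, and the attempt count and delays are arbitrary defaults.

```typescript
// Retry an async operation a fixed number of times with a growing delay.
// Intended for transient failures such as timeouts; the last error is rethrown.
async function withRetry<T>(
  operation: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        // Linear backoff: wait 1x, 2x, 3x ... the base delay between attempts.
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * attempt));
      }
    }
  }
  throw lastError;
}
```

It could wrap the agent call, e.g. `await withRetry(() => guiAgent.run({ input }))`. A retry restarts the whole task, and GUI actions already performed in the sandbox are not rolled back. Retries therefore suit tasks that are safe to repeat.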

## PERFORMANCE OPTIMIZATION
- **Build Time**: Use `npm run dev` during development (skips the build)
- **Model Response**: Adjust the timeout based on task complexity
- **Stale Output**: Run `npm run clean` to remove outdated compiled files from `dist/`
- **Network**: Ensure a stable connection to the ARK and sandbox services