@ZhaoHeh ZhaoHeh commented Sep 22, 2025

Summary

This PR introduces significant enhancements to the GUI Agent SDK, focusing on:

  • action parsing capabilities;
  • coordinate normalization;
  • comprehensive utility functions.

The changes improve the robustness, flexibility, and maintainability of the GUI agent system.

Key Features

1. JSON Function Call Parser Support

  • Added JSON function call parsing capabilities in ActionParserHelper.ts
  • Introduced parseFunctionCallString() and parseRoughFromFunctionCall() methods
  • Enhanced action parsing with better error handling and validation
  • Improved compatibility with different action input formats
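
As an illustration of the kind of parsing described above, here is a minimal sketch. It is not the code in ActionParserHelper.ts: the `RoughAction` shape, the `parseFunctionCallSketch` name, and the `{"name": ..., "arguments": ...}` input layout are all assumptions made for this example.

```typescript
// Hypothetical result shape for illustration; the SDK's real types may differ.
interface RoughAction {
  type: string;
  inputs: Record<string, unknown>;
}

// Parses a string such as '{"name":"click","arguments":{"point":[10,20]}}'.
// Returns null instead of throwing when the input is malformed.
function parseFunctionCallSketch(raw: string): RoughAction | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    return null; // valid JSON, but not an object
  }

  const call = parsed as { name?: unknown; arguments?: unknown };
  if (typeof call.name !== 'string' || call.name.length === 0) {
    return null; // a function call needs a non-empty name
  }

  // Assumption: `arguments` may arrive as a JSON-encoded string, so decode it once more.
  let args: unknown = call.arguments ?? {};
  if (typeof args === 'string') {
    try {
      args = JSON.parse(args);
    } catch {
      return null;
    }
  }
  if (typeof args !== 'object' || args === null || Array.isArray(args)) {
    return null; // arguments must decode to a plain object
  }

  return { type: call.name, inputs: args as Record<string, unknown> };
}
```

Returning `null` rather than throwing keeps the caller free to fall back to another input format, which is one way to get the format compatibility mentioned above.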

2. Coordinate Normalization System

  • New utility: coordinateNormalizer.ts with defaultNormalizeCoords() function
  • Core function: normalizeActionCoords() for processing BaseAction coordinates
  • Supports normalization of point, start, and end coordinate fields
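
The following sketch shows one way the point, start, and end fields could be normalized. The types, the function names, and the choice of dividing by a fixed factor are assumptions for illustration; the PR description does not state what defaultNormalizeCoords() actually computes.

```typescript
// Hypothetical types for illustration; the SDK's BaseAction may look different.
interface Point {
  x: number;
  y: number;
}

interface ActionSketch {
  type: string;
  inputs: { point?: Point; start?: Point; end?: Point; [key: string]: unknown };
}

type Normalizer = (p: Point) => Point;

// Assumed normalization: divide both axes by a fixed factor (e.g. 1000).
const scaleBy = (factor: number): Normalizer => (p) => ({
  x: p.x / factor,
  y: p.y / factor,
});

const COORD_FIELDS = ['point', 'start', 'end'] as const;

// Returns a copy of the action with every coordinate field that is present normalized.
// Non-coordinate inputs are copied through unchanged; the original is not mutated.
function normalizeCoordsSketch(action: ActionSketch, normalize: Normalizer): ActionSketch {
  const inputs: ActionSketch['inputs'] = { ...action.inputs };
  for (const field of COORD_FIELDS) {
    const value = inputs[field];
    if (value !== undefined) {
      inputs[field] = normalize(value);
    }
  }
  return { ...action, inputs };
}
```

Passing the normalizer as a parameter lets callers swap in a different scheme without touching the field-walking logic.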

3. Action Standardization Framework

  • New utility: standardizeNames.ts with comprehensive action type mapping
  • Standardizes 50+ action type variations (click, double_click, scroll, etc.)
  • Normalizes action input parameter names across different formats
  • Action-specific input mappings for better consistency
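
A rough sketch of alias-based standardization follows. The alias tables here are invented for illustration and are much smaller than the 50+ variations mentioned above; the real mappings live in standardizeNames.ts and may be structured differently.

```typescript
// Hypothetical alias table: raw action type -> canonical action type.
const ACTION_TYPE_ALIASES = new Map<string, string>([
  ['left_click', 'click'],
  ['left_double', 'double_click'],
  ['doubleclick', 'double_click'],
  ['mouse_scroll', 'scroll'],
]);

// Hypothetical per-action input renames: canonical type -> (raw name -> canonical name).
const INPUT_NAME_ALIASES = new Map<string, Map<string, string>>([
  ['click', new Map([['start_box', 'point'], ['position', 'point']])],
  ['scroll', new Map([['start_box', 'point'], ['dir', 'direction']])],
]);

// Lowercases the type, turns spaces and hyphens into underscores, then resolves aliases.
// Unknown types are returned in their cleaned-up form rather than rejected.
function standardizeTypeSketch(rawType: string): string {
  const key = rawType.trim().toLowerCase().replace(/[\s-]+/g, '_');
  return ACTION_TYPE_ALIASES.get(key) ?? key;
}

// Renames input parameters using the mapping for the given canonical action type.
// Names without a mapping are kept as they are.
function standardizeInputsSketch(
  type: string,
  inputs: Record<string, unknown>,
): Record<string, unknown> {
  const renames = INPUT_NAME_ALIASES.get(type);
  const out: Record<string, unknown> = {};
  for (const [name, value] of Object.entries(inputs)) {
    out[renames?.get(name) ?? name] = value;
  }
  return out;
}
```

Keeping the input renames keyed by action type is what makes the mappings action-specific: the same raw name can map differently, or not at all, depending on the action.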

4. Enhanced Utility Functions

  • serializeActions.ts: Action serialization utilities
  • systemPromptProcessor.ts: System prompt processing capabilities
  • sleep.ts: Async sleep utility for timing operations
  • Improved base agent and operator classes with better initialization

📝 Related Issues

This PR consolidates multiple feature enhancements and refactoring efforts:

Checklist

  • Added or updated necessary tests (Optional).
  • Updated documentation to align with changes (Optional).
  • Verified no breaking changes, or prepared solutions for any occurring breaking changes (Optional).
  • My change does not involve the above items.

netlify bot commented Sep 22, 2025

Deploy Preview for agent-tars-docs ready!

Name Link
🔨 Latest commit 035293a
🔍 Latest deploy log https://app.netlify.com/projects/agent-tars-docs/deploys/68d11b9f3220040008c2f2c3
😎 Deploy Preview https://deploy-preview-1617--agent-tars-docs.netlify.app

netlify bot commented Sep 22, 2025

Deploy Preview for tarko ready!

Name Link
🔨 Latest commit 035293a
🔍 Latest deploy log https://app.netlify.com/projects/tarko/deploys/68d11b9dc5fb4f00088cd3bd
😎 Deploy Preview https://deploy-preview-1617--tarko.netlify.app

@ZhaoHeh ZhaoHeh merged commit 9f6930f into main Sep 23, 2025
10 checks passed
@ZhaoHeh ZhaoHeh deleted the refact/gui_agent_sdk_917_v2 branch September 23, 2025 06:52
3 participants