# GUI-Agent Architecture and Workflow

## System Overview

### Core Components

- Controller: Central controller responsible for state management and decision triggering
- Manager: Task planner responsible for task decomposition and re-planning
- Worker: Executor with three specialized roles:
  - Technician: Uses the system terminal to complete tasks
  - Operator: Executes GUI operations
  - Analyst: Provides analytical support
- Evaluator: Quality inspector responsible for evaluating execution effectiveness
- Hardware: Hardware interface responsible for actually executing operations

### Global State Definitions

```python
{
    "TaskStatus": ["created", "pending", "on_hold", "fulfilled", "rejected"],
    "SubtaskStatus": ["ready", "pending", "fulfilled", "rejected"],
    "ExecStatus": ["executed", "timeout", "error", "pending"],
    "GateDecision": ["gate_done", "gate_fail", "gate_supplement", "gate_continue"],
    "GateTrigger": ["PERIODIC_CHECK", "WORKER_STALE", "WORKER_SUCCESS", "FINAL_CHECK"],
    "controller_situation": ["INIT", "GET_ACTION", "EXECUTE_ACTION", "QUALITY_CHECK", "PLAN", "SUPPLEMENT", "FINAL_CHECK", "DONE"],
}
```

#### State Descriptions:

- TaskStatus: Overall task status
- SubtaskStatus: Subtask status
- ExecStatus: Command execution status
- GateDecision: Quality check decision result
- GateTrigger: Quality check trigger condition
- controller_situation: Current situation of the Controller

## System Startup and Initialization

### Startup Check

```
Initialize system state
TaskStatus = pending
Check task status:
    If TaskStatus = fulfilled or TaskStatus = rejected
        Enter end state
    Otherwise
        Enter core scheduling loop
```

## Core Scheduling Loop

### State Flow Description

- GET_ACTION: Generate specific operation instructions

```
Executing Component: Worker (Technician/Operator/Analyst)
GET_ACTION → Worker execution → Result judgment
├── success → current_situation = QUALITY_CHECK
├── CANNOT_EXECUTE → current_situation = PLAN
├── STALE_PROGRESS → current_situation = QUALITY_CHECK
├── generate_action → current_situation = EXECUTE_ACTION
└── supplement → current_situation = SUPPLEMENT
```

- EXECUTE_ACTION: Execute specific operations

```
Executing Component: Hardware
SEND_ACTION → Hardware execution → Get screenshot → Update history → current_situation = GET_ACTION
```

- QUALITY_CHECK: Quality assessment of execution effectiveness

```
Executing Component: Evaluator
Core Functions: Visual comparison, progress analysis, efficiency evaluation
QUALITY_CHECK → Evaluator assessment → GateDecision judgment
├── gate_done → Check subtask status
│   ├── More subtasks exist → Switch to next subtask → current_situation = GET_ACTION
│   └── No more subtasks → current_situation = FINAL_CHECK
├── gate_fail → current_situation = PLAN
├── gate_continue → current_situation = EXECUTE_ACTION
└── gate_supplement → current_situation = SUPPLEMENT
```

- PLAN: Re-plan tasks

```
Executing Component: Manager
PLAN → Manager re-planning → Generate new subtasks → Assign Workers → current_situation = GET_ACTION
```

- SUPPLEMENT: Supplement external materials

```
Executing Component: Manager
SUPPLEMENT → Manager calls external tools → Generate supplementary materials → Record materials → current_situation = PLAN
External Tools: web search, RAG, etc.
```

- FINAL_CHECK: Final verification of task completion status

```
Executing Component: Evaluator
Trigger Condition: Final verification after all subtasks are marked as complete
FINAL_CHECK → Evaluator final assessment → Result judgment
├── Verification passed → TaskStatus = fulfilled → System ends
└── Issues found → current_situation = PLAN → Continue execution
Verification Content:
    Whether overall objectives are achieved
    Whether all necessary steps are completed
    Whether final state meets expectations
    Whether there are omissions or errors
```

## Worker Professional Division

### Technician

- Applicable Scenarios: Tasks requiring system-level operations
- Working Method: Completes tasks through terminal commands executed by a backend service; code can be written in ```bash...``` code blocks for bash scripts and ```python...``` code blocks for Python code.
- Typical Tasks:
  - File system operations
  - System configuration modifications
  - Program installation and deployment
  - Script execution

### Operator

- Applicable Scenarios: Tasks requiring GUI interaction or internal operations such as memorization
- Working Method: Simulates user interface operations
- Typical Tasks:
  - Clicking buttons and menus
  - Filling in forms
  - Drag-and-drop operations
  - Window management

### Analyst

- Applicable Scenarios: Tasks requiring data analysis and decision support
- Working Method: Analyzes the memory stored inside the system and provides recommendations
- Typical Tasks:
  - Question analysis

## Monitoring and Trigger Mechanisms

### Quality Check Trigger Mechanism

GateTrigger Types:

```
PERIODIC_CHECK: Periodic check
    Regular verification of execution progress
WORKER_STALE: Worker stagnation check
    Worker reports that the task cannot proceed
WORKER_SUCCESS: Worker successful completion
    Worker reports task completion
    Need to verify completion quality
FINAL_CHECK: Final check
    Final verification after all subtasks are marked as complete
```

### Task Termination Conditions

```
TaskStatus = rejected conditions:
    Manager planning attempts > 10 times
    current_step > N steps (timeout termination)

TaskStatus = fulfilled conditions:
    All subtask status = fulfilled
    FINAL_CHECK verification passed
    Expected target state achieved
```

### ExecStatus Handling

```
executed: Normal execution completion → Continue process
timeout: Execution timeout → Retry or re-plan
error: Execution error → Error handling, may need re-planning
pending: Currently executing
```

## State Monitoring Mechanism

### SubtaskStatus Management

```
ready: Ready for execution, waiting
pending: Currently executing
fulfilled: Successfully completed
rejected: Execution failed
```

### State Transition Monitoring

```
System continuously monitors state changes at all levels:
    TaskStatus changes trigger global process adjustments
    SubtaskStatus changes affect current execution strategy
    ExecStatus changes determine immediate response measures
    All state changes are recorded in execution history
```
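
The transitions in the Core Scheduling Loop and the Task Termination Conditions can be sketched as a small lookup-based state machine. This is only an illustrative sketch, not the system's actual Controller: the names `Situation`, `next_situation`, `resolve_task_status`, the `"passed"` outcome label, and the `max_steps` parameter (standing in for N) are all assumptions made for the example. It also assumes that `CANNOT_EXECUTE` leads to `PLAN` (the state list defines no separate re-plan state) and that either rejection condition alone is enough to reject the task. Transitions out of `INIT` and `DONE` are not described above, so the sketch leaves them undefined.

```python
from enum import Enum


class Situation(Enum):
    """Controller situations, mirroring the controller_situation list."""
    INIT = "INIT"
    GET_ACTION = "GET_ACTION"
    EXECUTE_ACTION = "EXECUTE_ACTION"
    QUALITY_CHECK = "QUALITY_CHECK"
    PLAN = "PLAN"
    SUPPLEMENT = "SUPPLEMENT"
    FINAL_CHECK = "FINAL_CHECK"
    DONE = "DONE"


# Worker result in GET_ACTION -> next situation.
# Assumption: CANNOT_EXECUTE hands control to PLAN for re-planning.
WORKER_TRANSITIONS = {
    "success": Situation.QUALITY_CHECK,
    "CANNOT_EXECUTE": Situation.PLAN,
    "STALE_PROGRESS": Situation.QUALITY_CHECK,
    "generate_action": Situation.EXECUTE_ACTION,
    "supplement": Situation.SUPPLEMENT,
}

# Evaluator GateDecision in QUALITY_CHECK -> next situation.
# gate_done is handled separately because it depends on remaining subtasks.
GATE_TRANSITIONS = {
    "gate_fail": Situation.PLAN,
    "gate_continue": Situation.EXECUTE_ACTION,
    "gate_supplement": Situation.SUPPLEMENT,
}


def next_situation(current, outcome=None, subtasks_remaining=0):
    """Return the situation that follows `current` for a given outcome."""
    if current is Situation.GET_ACTION:
        return WORKER_TRANSITIONS[outcome]
    if current is Situation.EXECUTE_ACTION:
        # Hardware executes, a screenshot is taken, history is updated.
        return Situation.GET_ACTION
    if current is Situation.QUALITY_CHECK:
        if outcome == "gate_done":
            if subtasks_remaining > 0:
                return Situation.GET_ACTION  # switch to the next subtask
            return Situation.FINAL_CHECK
        return GATE_TRANSITIONS[outcome]
    if current is Situation.PLAN:
        return Situation.GET_ACTION
    if current is Situation.SUPPLEMENT:
        return Situation.PLAN
    if current is Situation.FINAL_CHECK:
        # "passed" is a hypothetical label for a successful verification.
        return Situation.DONE if outcome == "passed" else Situation.PLAN
    raise ValueError(f"no transition defined for {current}")


def resolve_task_status(plan_attempts, current_step, max_steps, final_check_passed):
    """Apply the termination conditions; max_steps stands in for N."""
    if plan_attempts > 10 or current_step > max_steps:
        return "rejected"
    if final_check_passed:
        return "fulfilled"
    return "pending"


if __name__ == "__main__":
    situation = Situation.GET_ACTION
    # One pass: generate an action, execute it, report success, pass the gate.
    for outcome in ["generate_action", None, "success", "gate_done"]:
        situation = next_situation(situation, outcome, subtasks_remaining=0)
        print(situation.name)
```

Keeping the transitions in plain dictionaries makes each branch of the flow diagrams a single table entry, so a change to the workflow only touches the table rather than the control logic.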