Agent System

Overview

To support the development and evaluation of LLM/VLM-driven agents, SimWorld offers a robust and scalable Agent System. Specifically, SimWorld defines two categories of agents:

Rule-based agents: Serve as background entities (e.g., vehicles and pedestrians) to create a more realistic environment. See the Traffic System for details.
LLM/VLM-driven agents: Designed for research on AI agent behavior, multi-agent interaction, NLP tasks, and embodied AI.

Base Agent

SimWorld provides a foundational class BaseAgent for all types of agents. This class encapsulates the agent’s physical state in Unreal Engine, including its position and direction (yaw). Users can create custom agents by inheriting from BaseAgent.

class BaseAgent:
    def __init__(self, position: Vector, direction: Vector):
        """Initialize the base agent.

        Args:
            position: Initial position vector.
            direction: Initial direction vector.
        """
        self._position = position
        self._direction = direction
        self._yaw = 0

Related files: base_agent.py.
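
As a rough illustration of inheriting from BaseAgent, the sketch below defines a hypothetical `PatrolAgent` that cycles through waypoints. The `Vector` dataclass and the `BaseAgent` class here are local stand-ins that mirror the constructor shown above so the snippet runs on its own; `PatrolAgent`, `waypoints`, and `next_waypoint` are illustrative names and are not part of SimWorld.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Vector:
    """Stand-in 2D vector (assumption); SimWorld provides its own Vector type."""
    x: float
    y: float


class BaseAgent:
    """Stand-in that mirrors the BaseAgent constructor shown above."""

    def __init__(self, position: Vector, direction: Vector):
        self._position = position
        self._direction = direction
        self._yaw = 0


class PatrolAgent(BaseAgent):
    """Hypothetical custom agent that cycles through a list of waypoints."""

    def __init__(self, position: Vector, direction: Vector, waypoints: List[Vector]):
        super().__init__(position, direction)
        self._waypoints = list(waypoints)
        self._next_index = 0

    def next_waypoint(self) -> Vector:
        """Return the next waypoint and advance, wrapping around at the end."""
        waypoint = self._waypoints[self._next_index]
        self._next_index = (self._next_index + 1) % len(self._waypoints)
        return waypoint


agent = PatrolAgent(Vector(0, 0), Vector(1, 0), [Vector(100, 0), Vector(100, 100)])
first = agent.next_waypoint()
second = agent.next_waypoint()
third = agent.next_waypoint()  # wraps back to the first waypoint
```

Calling `super().__init__` keeps the physical state handled by the base class, so the subclass only adds its own behavior on top.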

Local Planner

To accommodate diverse research focuses, from text-based LLM agents to vision-based VLM agents, SimWorld provides a flexible, modular Local Planner that bridges high-level reasoning and low-level execution. Its core function is to decompose abstract plans into concrete, executable actions, connecting language, vision, and simulation.

The Local Planner consists of two main components: the parser and the executor.

The parser takes a high-level plan described in natural language and breaks it down into a sequence of low-level actions. This plan can originate either from human input or from upstream LLM/VLM modules.
The executor then interprets and performs these low-level actions within the simulated environment. For non-atomic tasks such as navigation, the executor supports two operational modes:
- Rule-based mode: Agents follow a predefined route generated by the A* algorithm.
- Vision-based mode: Agents rely solely on visual inputs and the goal destination, making decisions using a VLM-driven policy.
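
To make the rule-based mode more concrete, here is a minimal sketch of A* route generation on a small occupancy grid. The grid representation, the `astar` function, and the 4-connected unit-cost moves are assumptions made for illustration; SimWorld's own route generation may work on a different map structure.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def astar(grid: List[List[int]], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Return a shortest 4-connected path on a grid (0 = free, 1 = blocked)."""

    def heuristic(cell: Cell) -> int:
        # Manhattan distance is admissible for 4-connected unit-cost moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    # Heap entries are (estimated total cost, cost so far, cell).
    open_heap = [(heuristic(start), 0, start)]
    came_from: Dict[Cell, Cell] = {}
    best_cost: Dict[Cell, int] = {start: 0}

    while open_heap:
        _, cost, current = heapq.heappop(open_heap)
        if current == goal:
            # Walk the parent links back to the start to rebuild the route.
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]
        if cost > best_cost[current]:
            continue  # stale heap entry; a cheaper route was found already
        r, c = current
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get((nr, nc), float("inf")):
                    best_cost[(nr, nc)] = new_cost
                    came_from[(nr, nc)] = current
                    heapq.heappush(
                        open_heap,
                        (new_cost + heuristic((nr, nc)), new_cost, (nr, nc)),
                    )
    return None  # goal unreachable


grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
route = astar(grid, (0, 0), (2, 0))
```

An agent in rule-based mode would then follow the cells of `route` in order, whereas the vision-based mode replaces this precomputed route with step-by-step decisions from a VLM.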

The Local Planner is designed with extensibility and modularity in mind. Users can plug in their own LLMs/VLMs via API calls, and customize either component independently. This decoupling allows researchers to focus on specific layers of cognition or perception without needing to manage the full control pipeline.

In summary, the Local Planner serves as a crucial abstraction layer that decouples high-level social reasoning from low-level physical execution, empowering users to design and experiment with custom agent architectures (e.g., observation models, memory systems, or reasoning engines) tailored to their research needs.

Related files: local_planner.py.
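
The parser/executor split can be pictured with the toy sketch below. It is not SimWorld's implementation: `keyword_parser` uses a regular expression where the real parser would call an LLM, `LoggingExecutor` records actions instead of driving the simulator, and `Step` and `ToyLocalPlanner` are hypothetical names. The point is only that either component can be swapped without touching the other.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class HighLevelAction(Enum):
    """Stand-in with two of the high-level actions named in this document."""
    NAVIGATE = 1
    PICK_UP = 2


@dataclass
class Step:
    action: HighLevelAction
    target: object  # coordinates for NAVIGATE, object name for PICK_UP


_PATTERN = re.compile(r"go to \((-?\d+),\s*(-?\d+)\)|pick up (\w+)", re.IGNORECASE)


def keyword_parser(plan: str) -> List[Step]:
    """Toy parser: pattern matching stands in for an LLM call."""
    steps = []
    for match in _PATTERN.finditer(plan):
        if match.group(3) is not None:
            steps.append(Step(HighLevelAction.PICK_UP, match.group(3)))
        else:
            coords = (int(match.group(1)), int(match.group(2)))
            steps.append(Step(HighLevelAction.NAVIGATE, coords))
    return steps


class LoggingExecutor:
    """Toy executor: records what it would do instead of acting in the simulator."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def execute(self, steps: List[Step]) -> None:
        for step in steps:
            self.log.append(f"{step.action.name} -> {step.target}")


class ToyLocalPlanner:
    """Composes any parser callable with any executor object."""

    def __init__(self, parser: Callable[[str], List[Step]], executor: LoggingExecutor):
        self.parser = parser
        self.executor = executor

    def run(self, plan: str) -> None:
        self.executor.execute(self.parser(plan))


executor = LoggingExecutor()
planner = ToyLocalPlanner(keyword_parser, executor)
planner.run("Go to (1700, -1700) and pick up GEN_BP_Bottle_1_C.")
```

Replacing `keyword_parser` with a function that queries an LLM, or `LoggingExecutor` with one that issues simulator commands, would leave `ToyLocalPlanner` unchanged.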

Base LLM

SimWorld provides a BaseLLM class as a foundational interface for integrating LLMs into the framework. It is designed to be extensible and robust: automatic retry mechanisms are built into all public methods to improve reliability when interacting with external APIs.

from typing import Optional

class BaseLLM(metaclass=LLMMetaclass):
    def __init__(
        self,
        model_name: str,
        url: Optional[str] = None,
        provider: Optional[str] = 'openai'
    ):
        ...

This class serves as the base class for custom LLM implementations, supporting common providers such as OpenAI by default. Developers can extend it to integrate models hosted locally or through other third-party services.

Note

Currently, SimWorld supports only OpenAI and OpenRouter API calls.

Related files: base_llm.py.
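
One way a metaclass could add retries to every public method is sketched below. This is an assumption about the general technique, not a description of how `LLMMetaclass` is actually written; `retry`, `RetryMeta`, and `FlakyLLM` are hypothetical names, and the attempt count and delay are arbitrary.

```python
import functools
import inspect
import time


def retry(max_attempts: int = 3, delay: float = 0.0):
    """Decorator that re-invokes a function until it succeeds or attempts run out."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_error = None
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception as error:
                    last_error = error
                    if attempt < max_attempts - 1:
                        time.sleep(delay)
            raise last_error

        return wrapper

    return decorator


class RetryMeta(type):
    """Wrap every public plain method defined on the class with retry()."""

    def __new__(mcls, name, bases, namespace):
        for attr, value in list(namespace.items()):
            # Names starting with "_" (including __init__) are left untouched.
            if inspect.isfunction(value) and not attr.startswith("_"):
                namespace[attr] = retry()(value)
        return super().__new__(mcls, name, bases, namespace)


class FlakyLLM(metaclass=RetryMeta):
    """Hypothetical client whose first two calls fail, simulating a flaky API."""

    def __init__(self) -> None:
        self.calls = 0

    def generate(self, prompt: str) -> str:
        self.calls += 1
        if self.calls < 3:
            raise ConnectionError("simulated transient failure")
        return f"echo: {prompt}"


llm = FlakyLLM()
reply = llm.generate("hello")  # succeeds on the third attempt
```

Because the wrapping happens when the class is created, subclasses get the same behavior for the public methods they define without decorating each one by hand.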

Action Space

SimWorld defines a two-tiered action space to support hierarchical planning and execution: HighLevelAction and LowLevelAction.

High-level actions represent abstract tasks and are intended to be parsed from natural language inputs by the parser module in the Local Planner.
Low-level actions are primitive steps, mainly navigation-related (e.g., stepping forward or turning around).

See Actions for the full list of supported actions.

class HighLevelAction(Enum):
    """High-level actions that an agent can perform."""
    DO_NOTHING = 0
    NAVIGATE = 1
    PICK_UP = 2
    ...

class LowLevelAction(Enum):
    """Low-level actions that an agent can perform."""
    DO_NOTHING = 0
    STEP_FORWARD = 1
    TURN_AROUND = 2

This modular design encourages extensibility: users can define custom actions to suit task-specific needs.

Related files: action_space.py.
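
As one possible way to add task-specific actions, the sketch below defines a separate enum and routes both built-in and custom actions through a handler table. Python does not allow subclassing an Enum that already has members, so a second enum is used here. `CustomAction`, `HANDLERS`, and `dispatch` are hypothetical names, and `LowLevelAction` is a local stand-in mirroring the members listed above; how SimWorld itself registers custom actions is not shown in this document.

```python
from enum import Enum
from typing import Callable, Dict, List


class LowLevelAction(Enum):
    """Stand-in mirroring the low-level actions listed above."""
    DO_NOTHING = 0
    STEP_FORWARD = 1
    TURN_AROUND = 2


class CustomAction(Enum):
    """Hypothetical task-specific actions; not part of SimWorld."""
    WAVE = 100
    SIT_DOWN = 101


# Each handler appends a description of what it would do to a log.
HANDLERS: Dict[Enum, Callable[[List[str]], None]] = {
    LowLevelAction.STEP_FORWARD: lambda log: log.append("step forward"),
    LowLevelAction.TURN_AROUND: lambda log: log.append("turn around"),
    CustomAction.WAVE: lambda log: log.append("wave"),
}


def dispatch(action: Enum, log: List[str]) -> None:
    """Run the handler for an action; actions without a handler are skipped."""
    handler = HANDLERS.get(action)
    if handler is not None:
        handler(log)


log: List[str] = []
for action in (LowLevelAction.STEP_FORWARD, CustomAction.WAVE, LowLevelAction.DO_NOTHING):
    dispatch(action, log)
```

Keeping the handlers in a table means a new action only needs an enum member and one table entry.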

Using Local Planner

The Local Planner operates on an agent: pass the agent and an LLM instance when constructing it.

# Initialize the local planner
# (`humanoid` is an agent and `llm` is an LLM instance, both created beforehand)
local_planner = LocalPlanner(agent=humanoid, model=llm, rule_based=False)

# Parse the high-level plan into a sequence of actions
plan = 'Go to (1700, -1700) and pick up GEN_BP_Bottle_1_C.'
action_space = local_planner.parse(plan)

# Execute the parsed actions
local_planner.execute(action_space)

A complete example can be found in scripts/local_planner_test.ipynb.