Reading OpenHands Source Code

Why OpenHands

OpenHands is one of the most mature open-source software engineering agent frameworks. Unlike most agent demos, it actually runs code, browses the web, and handles multi-step tasks. Reading its source is a good way to understand how production-grade agents are structured.

This is a live notes document — I’ll update it as I dig deeper.

Top-level structure

openhands/
  core/          # Agent loop, schema, config
  controller/    # State machine for agent execution
  runtime/       # Sandboxed execution environments
  llm/           # LLM provider abstraction
  events/        # Event stream (the core communication bus)
  agenthub/      # Concrete agent implementations

The key architectural insight: everything is an event. Agents don’t directly call tools. Instead, they emit action events (like CmdRunAction or BrowseURLAction). A runtime consumes those events, executes them, and emits observation events back. The agent loop reads observations and decides the next action.
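To make the action/observation split concrete, here is a minimal sketch. The classes are simplified, hypothetical stand-ins that borrow the `CmdRunAction` name; the real definitions in openhands/events/ carry more fields and behavior. The toy `execute` function plays the runtime's role: it consumes an action event and emits an observation event that points back at the action that caused it.

```python
from dataclasses import dataclass


# Hypothetical, simplified event types. The real classes live in
# openhands/events/action/ and openhands/events/observation/.
@dataclass
class Event:
    id: int = -1  # assigned when the event enters a stream


@dataclass
class CmdRunAction(Event):
    command: str = ""  # emitted by the agent


@dataclass
class CmdOutputObservation(Event):
    content: str = ""
    exit_code: int = 0
    cause: int = -1  # id of the action that produced this observation


def execute(action: CmdRunAction) -> CmdOutputObservation:
    """Toy 'runtime': turns an action event into an observation event.

    It only understands `echo`; a real runtime would run the command
    inside a sandbox.
    """
    if action.command.startswith("echo "):
        return CmdOutputObservation(
            content=action.command[len("echo "):], exit_code=0, cause=action.id
        )
    return CmdOutputObservation(
        content=f"unsupported: {action.command}", exit_code=127, cause=action.id
    )
```

The agent never calls `execute` itself; it only produces the action object, and whatever consumes the stream decides how to run it.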

The event stream

openhands/events/stream.py defines EventStream, the central bus. Every action and observation flows through here. This makes replay, logging, and debugging straightforward — the full history of any agent session is just an ordered list of events.
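The "central bus" idea can be sketched as an append-only log plus a list of subscribers. This is a hypothetical minimal version: the `add_event`, `subscribe`, and `replay` names here are illustrative and do not match the real `EventStream` signatures. It shows why replay is cheap: the session history is just the ordered list.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    source: str  # e.g. "agent" or "environment"
    payload: str
    id: int = -1


@dataclass
class EventStream:
    """Hypothetical minimal event bus: an append-only log plus subscribers."""

    events: list[Event] = field(default_factory=list)
    subscribers: list[Callable[[Event], None]] = field(default_factory=list)

    def subscribe(self, callback: Callable[[Event], None]) -> None:
        self.subscribers.append(callback)

    def add_event(self, event: Event) -> None:
        event.id = len(self.events)  # ids reflect insertion order
        self.events.append(event)
        for callback in self.subscribers:  # push to every consumer
            callback(event)

    def replay(self) -> list[Event]:
        # The full session history is just the ordered log.
        return list(self.events)
```

In this shape the agent, the runtime, and a logger can all be plain subscribers, which is what makes logging and debugging fall out of the design.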

The event types are defined in openhands/events/action/ and openhands/events/observation/. Both directories are worth reading: the actions define what an agent can do, and the observations define what information it can receive.

The agent loop

openhands/controller/agent_controller.py manages the core loop:

1. Get the current state (including event history)
2. Call agent.step(state) to get the next action
3. Execute the action via the runtime
4. Receive an observation
5. Add both to the event stream
6. Repeat until done or max iterations reached

The simplicity here is deliberate. Most of the intelligence lives in the agent implementations (agenthub/), not in the controller.
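The loop described above can be sketched in a few lines. This is a hypothetical, synchronous simplification: `State`, `run_loop`, and the string-based actions are illustrative names, and the real AgentController is, as far as I can tell, asynchronous and driven by events on the stream rather than a plain while loop. The agent and runtime are passed in as callables, which mirrors the point that the controller itself holds little logic.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class State:
    history: list[tuple[str, str]] = field(default_factory=list)  # (kind, text)
    iteration: int = 0


def run_loop(
    step: Callable[[State], str],  # stands in for agent.step(state)
    execute: Callable[[str], str],  # stands in for the runtime
    max_iterations: int = 10,
) -> State:
    """Hypothetical sketch of the controller loop."""
    state = State()
    while state.iteration < max_iterations:
        action = step(state)  # agent decides the next action from state
        state.history.append(("action", action))
        if action == "finish":  # agent signals it is done
            break
        observation = execute(action)  # runtime executes, returns observation
        state.history.append(("observation", observation))
        state.iteration += 1
    return state
```

For example, an agent that echoes twice and then stops:

```python
state = run_loop(
    step=lambda s: "finish" if s.iteration >= 2 else f"echo {s.iteration}",
    execute=lambda action: action[len("echo "):],
)
# state.history alternates action/observation entries and ends with ("action", "finish")
```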

CodeActAgent

The main production agent is CodeActAgent in openhands/agenthub/codeact_agent/. Its core idea: make executable code the agent's main way of acting. Rather than relying on a long menu of specialized tools, the LLM mostly acts by running bash commands and Python code in a sandboxed IPython kernel, using code as a general-purpose effector.

This is a notable architectural choice. Many agent frameworks give the LLM a fixed menu of tools. OpenHands instead gives it a code interpreter and lets it compose whatever behavior it needs. The tradeoff: this is more expressive, but it depends on the LLM writing correct code at every step, since a bug in that code becomes a failed action.
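A rough way to see why one code-execution action can replace many tools: the "tool" is whatever code the model writes. The sketch below is a hypothetical stand-in for an IPython kernel (`ToyCodeExecutor` and `run_cell` are illustrative names, not OpenHands APIs). Like a notebook kernel, it keeps one namespace across calls, and it returns errors as text so the model can read the traceback and retry. It is not a sandbox: `exec()` runs with full interpreter privileges, which is exactly why OpenHands runs this kind of execution inside an isolated runtime.

```python
import contextlib
import io
import traceback


class ToyCodeExecutor:
    """Hypothetical stand-in for a sandboxed IPython kernel (NOT a sandbox)."""

    def __init__(self) -> None:
        # One namespace shared across calls, so state persists between cells.
        self.namespace: dict = {}

    def run_cell(self, code: str) -> str:
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, self.namespace)
        except Exception:
            # Return the traceback as the observation so the LLM can react to it.
            buffer.write(traceback.format_exc())
        return buffer.getvalue()
```

A variable defined in one "action" is visible in the next, so the model can build up multi-step behavior without any dedicated tool for it.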

Runtime isolation

openhands/runtime/ handles sandboxed execution. Docker containers are the production path; there’s also a local runtime for development. The interface is clean: the runtime receives action events, executes them in isolation, and returns observation events.
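The "receive action, return observation" interface can be expressed as a small protocol. This is a minimal sketch under assumptions: `Runtime`, `LocalRuntime`, and the two event classes here are hypothetical simplifications, not the real OpenHands classes. The toy runtime runs commands directly on the host with no isolation; a Docker-backed runtime would keep the same `run` method but execute inside a container.

```python
import subprocess
from dataclasses import dataclass
from typing import Protocol


@dataclass
class CmdRunAction:
    command: str


@dataclass
class CmdOutputObservation:
    content: str
    exit_code: int


class Runtime(Protocol):
    """Hypothetical interface: consume an action, return an observation."""

    def run(self, action: CmdRunAction) -> CmdOutputObservation: ...


class LocalRuntime:
    """Toy development runtime: runs shell commands on the host, unisolated."""

    def __init__(self, workspace: str, timeout: float = 30.0) -> None:
        self.workspace = workspace
        self.timeout = timeout

    def run(self, action: CmdRunAction) -> CmdOutputObservation:
        try:
            proc = subprocess.run(
                action.command,
                shell=True,
                cwd=self.workspace,  # commands start in the workspace dir
                capture_output=True,
                text=True,
                timeout=self.timeout,
            )
        except subprocess.TimeoutExpired:
            return CmdOutputObservation(content="command timed out", exit_code=-1)
        return CmdOutputObservation(
            content=proc.stdout + proc.stderr, exit_code=proc.returncode
        )
```

Because the controller only depends on the protocol, swapping the local runtime for a containerized one does not change the agent loop.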

The sandbox design is security-conscious. File access is restricted to a workspace directory, and network access can be controlled. Running untrusted code safely is a hard problem; this is their current approach.
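One piece of "file access is restricted to a workspace directory" can be illustrated with a path check. This is a hypothetical helper, not OpenHands code: it resolves the requested path (collapsing `..` and following symlinks that exist) and rejects anything that lands outside the workspace root. On its own this is only one layer; it does not handle races between checking and opening a file, and the container boundary is what actually enforces isolation.

```python
from pathlib import Path


def resolve_in_workspace(workspace: str, requested: str) -> Path:
    """Hypothetical helper: map `requested` to a path inside `workspace`.

    Rejects paths that escape the workspace via `..`, absolute paths,
    or symlinks pointing outside it.
    """
    root = Path(workspace).resolve()
    candidate = (root / requested).resolve()  # absolute `requested` replaces root
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"{requested!r} is outside the workspace")
    return candidate
```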

Open questions

How does the memory/context management work across very long tasks?
How does it handle tool failures and recovery?
What’s the eval setup — how do they measure agent success rates?

I’ll update this as I continue reading.