Reading OpenHands Source Code
A walkthrough of the OpenHands (formerly OpenDevin) codebase — how it structures agents, orchestrates tool use, and manages multi-step task execution.
Why OpenHands
OpenHands is one of the most mature open-source software engineering agent frameworks. Unlike most agent demos, it actually runs code, browses the web, and handles multi-step tasks. Reading its source is a good way to understand how production-grade agents are structured.
These are living notes; I’ll update them as I dig deeper.
Top-level structure
openhands/
    core/         # Agent loop, schema, config
    controller/   # State machine for agent execution
    runtime/      # Sandboxed execution environments
    llm/          # LLM provider abstraction
    events/       # Event stream (the core communication bus)
    agenthub/     # Concrete agent implementations
The key architectural insight: everything is an event. Agents don’t directly call tools. Instead, they emit action events (like CmdRunAction or BrowseURLAction). A runtime consumes those events, executes them, and emits observation events back. The agent loop reads observations and decides the next action.
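To make the action/observation split concrete, here is a minimal sketch in Python. The names RunCommand, CommandOutput, and execute are my own stand-ins, not the OpenHands classes; the real types live under openhands/events/ and carry more fields. The point is only that the agent describes what it wants, and a separate component does the work and reports back.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Tuple


# Hypothetical stand-ins for illustration; not the real OpenHands classes.
@dataclass(frozen=True)
class RunCommand:
    """An action: a description of what the agent wants done."""
    argv: Tuple[str, ...]


@dataclass(frozen=True)
class CommandOutput:
    """An observation: a record of what actually happened."""
    exit_code: int
    stdout: str


def execute(action: RunCommand) -> CommandOutput:
    """Toy runtime: run the command and wrap the result as an observation."""
    proc = subprocess.run(list(action.argv), capture_output=True, text=True)
    return CommandOutput(exit_code=proc.returncode, stdout=proc.stdout)


# The agent never calls subprocess itself; it only emits the action.
observation = execute(RunCommand((sys.executable, "-c", "print('hello')")))
```

Because the agent only produces data, the same action could be sent to a container, a remote machine, or a mock in tests without touching the agent's code.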
The event stream
openhands/events/stream.py defines EventStream, the central bus. Every action and observation flows through here. This makes replay, logging, and debugging straightforward — the full history of any agent session is just an ordered list of events.
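A rough sketch of why an append-only bus makes replay and logging cheap. ToyEventStream and its methods are hypothetical and much simpler than the real EventStream; I am assuming nothing about its actual API beyond "ordered events, with listeners".

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Event:
    id: int        # position in the stream, assigned on append
    source: str    # "agent" or "runtime" in this sketch
    kind: str      # e.g. "action" or "observation"
    payload: str


@dataclass
class ToyEventStream:
    """Hypothetical append-only bus; not the real EventStream API."""
    _events: List[Event] = field(default_factory=list)
    _subscribers: List[Callable[[Event], None]] = field(default_factory=list)

    def subscribe(self, callback: Callable[[Event], None]) -> None:
        self._subscribers.append(callback)

    def add(self, source: str, kind: str, payload: str) -> Event:
        event = Event(len(self._events), source, kind, payload)
        self._events.append(event)
        # Every subscriber sees every event, in order.
        for callback in list(self._subscribers):
            callback(event)
        return event

    def replay(self, start_id: int = 0) -> List[Event]:
        """The session history is just a slice of the ordered list."""
        return self._events[start_id:]


stream = ToyEventStream()
log: List[str] = []
stream.subscribe(lambda e: log.append(f"{e.source}:{e.kind}"))

stream.add("agent", "action", "ls")
stream.add("runtime", "observation", "README.md")
```

Logging here is just another subscriber, and debugging a session amounts to iterating over stream.replay().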
The event types are defined in openhands/events/action/ and openhands/events/observation/. Worth reading both directories to understand what agents can do and what information they can receive.
The agent loop
openhands/controller/agent_controller.py manages the core loop:
- Get current state (including event history)
- Call agent.step(state) to get the next action
- Execute the action via the runtime
- Receive an observation
- Add both to the event stream
- Repeat until done or max iterations reached
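The steps above can be sketched in a few lines. This is a toy under my own assumptions, not the controller's code: State, CountdownAgent, CounterRuntime, and run are hypothetical names, and a plain list of (action, observation) pairs stands in for the event stream.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class State:
    history: List[Tuple[str, str]] = field(default_factory=list)
    iteration: int = 0


class CountdownAgent:
    """Hypothetical agent: keeps asking to decrement until it observes "0"."""

    def step(self, state: State) -> Optional[str]:
        last_observation = state.history[-1][1] if state.history else None
        if last_observation == "0":
            return None  # signal that the task is finished
        return "decrement"


class CounterRuntime:
    """Hypothetical runtime: executes actions against a counter."""

    def __init__(self, start: int) -> None:
        self.value = start

    def execute(self, action: str) -> str:
        if action == "decrement":
            self.value -= 1
        return str(self.value)


def run(agent: CountdownAgent, runtime: CounterRuntime,
        max_iterations: int = 10) -> State:
    state = State()
    while state.iteration < max_iterations:
        action = agent.step(state)               # ask the agent what to do
        if action is None:                       # agent says it is done
            break
        observation = runtime.execute(action)    # runtime does the work
        state.history.append((action, observation))  # record both
        state.iteration += 1
    return state


final = run(CountdownAgent(), CounterRuntime(start=3))
capped = run(CountdownAgent(), CounterRuntime(start=100), max_iterations=5)
```

Note that the loop knows nothing about counters. Swapping in a different agent or runtime changes the behavior without changing run, which is the same separation the controller relies on.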
The simplicity here is deliberate. Most of the intelligence lives in the agent implementations (agenthub/), not in the controller.
CodeActAgent
The main production agent is CodeActAgent in agenthub/codeact_agent/. Its core idea is to make executable code the agent's action: the LLM writes Python that runs in a sandboxed IPython kernel (and, as far as I can tell from the agent's README, bash commands as well). Instead of many specialized tools, it uses code as a general-purpose effector.
This is a notable architectural choice. Many agent frameworks give the LLM a fixed menu of tools. OpenHands gives it a code interpreter and lets the LLM compose arbitrary tool behavior. The tradeoff: more expressive, but requires the LLM to know how to write correct tool-calling code.
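A small sketch of what "code as the action" looks like in practice. ToyKernel is hypothetical and deliberately NOT sandboxed; it uses plain exec in the current process, whereas OpenHands runs code inside an isolated runtime. It only illustrates two properties that matter for the agent: state persists between cells, and errors come back as text the LLM can read.

```python
import contextlib
import io
import traceback


class ToyKernel:
    """Hypothetical in-process kernel. No isolation; illustration only."""

    def __init__(self) -> None:
        # One shared namespace, so variables and imports persist across cells.
        self.namespace: dict = {}

    def run_cell(self, code: str) -> str:
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, self.namespace)
        except Exception:
            # Failures become part of the observation instead of crashing.
            buffer.write(traceback.format_exc())
        return buffer.getvalue()


kernel = ToyKernel()
kernel.run_cell("import math\nradius = 2")
area_output = kernel.run_cell("print(round(math.pi * radius ** 2, 2))")
error_output = kernel.run_cell("1 / 0")
```

The second cell reuses math and radius from the first, which is how an LLM can compose its own "tools" across steps. The third cell shows the cost side of the tradeoff: when the generated code is wrong, the agent gets a traceback back and has to recover from it.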
Runtime isolation
openhands/runtime/ handles sandboxed execution. Docker containers are the production path; there’s also a local runtime for development. The interface is clean: the runtime receives action events, executes them in isolation, and returns observation events.
The sandbox design is security-conscious. File access is restricted to a workspace directory, and network access can be controlled. Running untrusted code safely is a hard problem; this is their current approach.
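One piece of this, confining file access to a workspace directory, can be illustrated with a path check. This is a generic technique and my own sketch, not code from the OpenHands runtime; resolve_in_workspace and WorkspaceEscapeError are hypothetical names. In the Docker path, the container boundary does the real isolation, and a check like this would be a second line of defense at most.

```python
from pathlib import Path


class WorkspaceEscapeError(Exception):
    """Raised when a requested path resolves outside the workspace."""


def resolve_in_workspace(workspace: Path, requested: str) -> Path:
    """Resolve `requested` relative to `workspace`, rejecting escapes.

    Hypothetical helper for illustration only.
    """
    root = workspace.resolve()
    # resolve() collapses ".." segments and follows symlinks; an absolute
    # `requested` replaces the root entirely, which the check below catches.
    candidate = (root / requested).resolve()
    if candidate != root and root not in candidate.parents:
        raise WorkspaceEscapeError(f"{requested!r} is outside {root}")
    return candidate


workspace = Path("workspace-demo")
inside = resolve_in_workspace(workspace, "src/main.py")

try:
    resolve_in_workspace(workspace, "../secrets.txt")
    escaped = True
except WorkspaceEscapeError:
    escaped = False
```

Resolving before comparing matters: a naive string-prefix check on the unresolved path would accept something like "src/../../secrets.txt".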
Open questions
- How does the memory/context management work across very long tasks?
- How does it handle tool failures and recovery?
- What’s the eval setup — how do they measure agent success rates?
I’ll update this as I continue reading.