“Isn’t an agent just an LLM workflow?” For a lot of companies building “agents” recently, that has effectively been true. I like Anthropic’s definition: an agent is a program that can use tools and make its own decisions.
OpenAI has finally released its take on agents, which equips LLMs with an elegant set of primitives: WebSearch, ComputerUse, FileSearch, etc. An agent run ends when the agent produces a final output matching your output_type, or when the agent produces a message that doesn’t involve any tool calls or hand-offs. I’m personally impressed by how clean and flexible the design is. There are other nice patterns, like callbacks/hooks (which can, for example, pre-fetch data or log on agent lifecycle events) and asyncio.gather for parallelizing many agent calls.
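To make that last point concrete, here is a minimal sketch of running two independent agents in parallel with asyncio.gather; the agent names and prompts are my own illustrations, assuming the SDK's Agent and Runner classes:

import asyncio

from agents import Agent, Runner

researcher = Agent(
    name="Researcher",
    instructions="Answer research questions concisely.",
)
summarizer = Agent(
    name="Summarizer",
    instructions="Summarize the given text in one sentence.",
)

async def main() -> None:
    # Each Runner.run drives a full agent loop to a final output, so the two
    # runs are independent and can be awaited concurrently.
    research, summary = await asyncio.gather(
        Runner.run(researcher, "What are the main uses of asyncio.gather?"),
        Runner.run(summarizer, "Summarize: asyncio.gather runs awaitables concurrently."),
    )
    print(research.final_output)
    print(summary.final_output)

asyncio.run(main())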
Type-checking with retries: Using Pydantic, if the LLM response does not adhere to your output_type, you know it has hallucinated and can reject the response. The agent will also keep retrying until the output conforms to the type. This eliminates the most common category of issues when using LLMs.
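A minimal sketch of that pattern, assuming a made-up WeatherReport model as the output_type:

from pydantic import BaseModel

from agents import Agent, Runner

class WeatherReport(BaseModel):
    location: str
    temperature_c: float
    summary: str

weather_agent = Agent(
    name="Weather reporter",
    instructions="Report the weather as structured data.",
    output_type=WeatherReport,  # responses that don't parse into this model are rejected
)

async def get_report() -> WeatherReport:
    result = await Runner.run(weather_agent, "What's the weather in Seattle?")
    return result.final_output  # a validated WeatherReport instance, not a raw string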
Enabling more model specialization: The extensibility of handoffs allows a main triage agent to hand off to sub-agents backed by different models (such as a smaller, faster model for a simpler but latency-sensitive task).
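A rough sketch of that triage pattern; the agent names, instructions, and model choices below are illustrative assumptions:

from agents import Agent, Runner

fast_agent = Agent(
    name="Quick answers",
    instructions="Answer simple questions in one or two sentences.",
    model="gpt-4o-mini",  # smaller, faster model for easy requests
)
deep_agent = Agent(
    name="Deep analysis",
    instructions="Work through complex, multi-step questions carefully.",
    model="gpt-4o",
)
triage_agent = Agent(
    name="Triage",
    instructions="Route each request to the most appropriate specialist.",
    handoffs=[fast_agent, deep_agent],  # the triage agent can hand the conversation off
)

result = Runner.run_sync(triage_agent, "What's the fastest way to reverse a list in Python?")
print(result.final_output)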
@function_tool decorator: A quality-of-life improvement. It auto-generates the JSON schema and tool description from the function signature and docstring, rather than requiring you to construct them manually as with Anthropic tool use.
For example, let’s create a search_weather function and use the @function_tool decorator:
from typing import Any

from agents import RunContextWrapper, function_tool


@function_tool
async def search_weather(context: RunContextWrapper[Any], location: str, days: int = 3) -> str:
    """
    Fetches weather forecast for the specified location.

    Args:
        context: The execution context for this tool run.
        location: The city or address to get weather for.
        days: Number of days to include in the forecast (1-7).
    """
    # Implementation...
Under the hood, the decorator turns this into a JSON schema like the following (but you no longer need to write this by hand!):
{
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city or address to get weather for."
},
"days": {
"type": "integer",
"description": "Number of days to include in the forecast (1-7).",
"default": 3
}
},
"required": ["location"],
"additionalProperties": false
}
Naturally, the actual function call would be:
search_weather(context_instance, location="Seattle", days=5)
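To round out the example, the decorated tool is attached to an agent via tools; the agent name and instructions below are my own illustration:

from agents import Agent, Runner

weather_agent = Agent(
    name="Weather assistant",
    instructions="Use the search_weather tool to answer forecast questions.",
    tools=[search_weather],  # the decorated function registers itself as a tool
)

result = Runner.run_sync(weather_agent, "What's the weather in Seattle for the next 5 days?")
print(result.final_output)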
OpenAI Agents’ guardrail and tracing features are relatively minimalist, but the API seems to play well with existing third-party solutions. Below are two that I know of (not a sponsored plug).
Guardrails AI offers validators such as sensitive_data_check, which halt agent execution immediately (source: https://www.guardrailsai.com/docs).
Looking at the code, it seems trivial to wrap existing validators from Guardrails AI Hub to use with Agents.
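Here is a sketch of what such a wrapper might look like, using the SDK's input_guardrail decorator and GuardrailFunctionOutput; the regex check is only a stand-in for a real Guardrails AI Hub validator, and the agent is illustrative:

import re

from agents import Agent, GuardrailFunctionOutput, RunContextWrapper, input_guardrail

@input_guardrail
async def sensitive_data_check(
    ctx: RunContextWrapper, agent: Agent, input: str
) -> GuardrailFunctionOutput:
    # Stand-in check; in practice this is where a Guardrails AI Hub validator
    # (e.g. a PII detector) would be called.
    contains_ssn = bool(re.search(r"\b\d{3}-\d{2}-\d{4}\b", str(input)))
    return GuardrailFunctionOutput(
        output_info={"contains_ssn": contains_ssn},
        tripwire_triggered=contains_ssn,  # tripping the wire halts the run
    )

protected_agent = Agent(
    name="Support agent",
    instructions="Help the user without exposing sensitive data.",
    input_guardrails=[sensitive_data_check],
)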
One of the biggest challenges in building agents is getting visibility into each step of the process. OpenAI Agents has a tracing provider that logs essential events—LLM completions, tool calls, guardrails triggered, and handoffs. They also released a dashboard where you can visualize your traces.
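For example, several related runs can be grouped into one trace with the SDK's trace context manager (the workflow name and prompts here are illustrative):

from agents import Agent, Runner, trace

agent = Agent(name="Assistant", instructions="Be concise.")

async def traced_workflow() -> None:
    # Everything inside the context manager is recorded as a single trace:
    # LLM completions, tool calls, guardrails, and handoffs.
    with trace("Weather workflow"):
        first = await Runner.run(agent, "Is it raining in Seattle?")
        second = await Runner.run(agent, f"One-line summary: {first.final_output}")
        print(second.final_output)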
OpenAI had a few suggestions for third-party tracers. I tried them all and found AgentOps to be far above the rest. It offers a robust way to track cost, latency, prompts, and completions, and you can even pinpoint which tools get activated and why. This level of observability made debugging traces much easier than the usual “copious use of print statements,” and it made tracking guardrail activations and handoffs substantially simpler.
AgentOps Session Replay