“Isn’t an agent just an LLM workflow?” For a lot of companies building “agents” recently, that has effectively been true. I like Anthropic’s definition: an agent is a program that can use tools and make its own decisions.
OpenAI has finally released its take on agents, which equips LLMs with an elegant set of primitives: WebSearch, ComputerUse, FileSearch, etc. An agent run ends when the agent produces a final output matching your output_type, or when the agent produces a message that doesn’t involve any tool calls or hand-offs. I’m personally impressed by how clean and flexible the design is. There are other nice patterns, like callbacks/hooks (which can, for example, pre-fetch data or log on agent lifecycle events) and asyncio.gather for parallelizing many agent calls.
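To make that last point concrete, here is a minimal sketch of running two independent agents in parallel with asyncio.gather; the agent names and prompts are my own illustrations, assuming the SDK's Agent and Runner classes:

import asyncio

from agents import Agent, Runner

researcher = Agent(
    name="Researcher",
    instructions="Answer research questions concisely.",
)
summarizer = Agent(
    name="Summarizer",
    instructions="Summarize the given text in one sentence.",
)

async def main() -> None:
    # Each Runner.run drives a full agent loop to a final output, so the two
    # runs are independent and can be awaited concurrently.
    research, summary = await asyncio.gather(
        Runner.run(researcher, "What are the main uses of asyncio.gather?"),
        Runner.run(summarizer, "Summarize: asyncio.gather runs awaitables concurrently."),
    )
    print(research.final_output)
    print(summary.final_output)

asyncio.run(main())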
Type-checking with retries: Using Pydantic, if the LLM response does not adhere to your output_type, you know it has hallucinated and can reject the response. The agent will also keep retrying until the output conforms to the type. This eliminates the most common category of issues when using LLMs.
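A minimal sketch of that pattern, assuming a made-up WeatherReport model as the output_type:

from pydantic import BaseModel

from agents import Agent, Runner

class WeatherReport(BaseModel):
    location: str
    temperature_c: float
    summary: str

weather_agent = Agent(
    name="Weather reporter",
    instructions="Report the weather as structured data.",
    output_type=WeatherReport,  # responses that don't parse into this model are rejected
)

async def get_report() -> WeatherReport:
    result = await Runner.run(weather_agent, "What's the weather in Seattle?")
    return result.final_output  # a validated WeatherReport instance, not a raw string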
Enabling more model specialization: The extensibility of handoffs allows a main triage agent to hand off to sub-agents backed by different models (such as a smaller, faster model for a simpler but latency-sensitive task).
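A rough sketch of that triage pattern; the agent names, instructions, and model choices below are illustrative assumptions:

from agents import Agent, Runner

fast_agent = Agent(
    name="Quick answers",
    instructions="Answer simple questions in one or two sentences.",
    model="gpt-4o-mini",  # smaller, faster model for easy requests
)
deep_agent = Agent(
    name="Deep analysis",
    instructions="Work through complex, multi-step questions carefully.",
    model="gpt-4o",
)
triage_agent = Agent(
    name="Triage",
    instructions="Route each request to the most appropriate specialist.",
    handoffs=[fast_agent, deep_agent],  # the triage agent can hand the conversation off
)

result = Runner.run_sync(triage_agent, "What's the fastest way to reverse a list in Python?")
print(result.final_output)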
@function_tool decorator: A quality-of-life improvement. It auto-generates the JSON schema and tool description from the function signature and docstring, rather than requiring you to construct them manually as with Anthropic tool use.
For example, let’s create a search_weather function and use the @function_tool decorator:
from typing import Any

from agents import RunContextWrapper, function_tool


@function_tool
async def search_weather(context: RunContextWrapper[Any], location: str, days: int = 3) -> str:
    """
    Fetches weather forecast for the specified location.

    Args:
        context: The execution context for this tool run.
        location: The city or address to get weather for.
        days: Number of days to include in the forecast (1-7).
    """
    # Implementation...
Under the hood, the decorator turns this into a JSON schema like the following (but you no longer need to write this by hand!):
{
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city or address to get weather for."
},
"days": {
"type": "integer",
"description": "Number of days to include in the forecast (1-7).",
"default": 3
}
},
"required": ["location"],
"additionalProperties": false
}
Naturally, the actual function call would be:
search_weather(context_instance, location="Seattle", days=5)
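To round out the example, the decorated tool is attached to an agent via tools; the agent name and instructions below are my own illustration:

from agents import Agent, Runner

weather_agent = Agent(
    name="Weather assistant",
    instructions="Use the search_weather tool to answer forecast questions.",
    tools=[search_weather],  # the decorated function registers itself as a tool
)

result = Runner.run_sync(weather_agent, "What's the weather in Seattle for the next 5 days?")
print(result.final_output)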
OpenAI Agents’ guardrail and tracing features are relatively minimalist, but the API seems to play well with existing third-party solutions. Below are two that I know of (not a sponsored plug).
Guardrails AI offers validators such as sensitive_data_check, which halt agent execution immediately (source: https://www.guardrailsai.com/docs).
Looking at the code, it seems trivial to wrap existing validators from Guardrails AI Hub to use with Agents.
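Here is a sketch of what such a wrapper might look like, using the SDK's input_guardrail decorator and GuardrailFunctionOutput; the regex check is only a stand-in for a real Guardrails AI Hub validator, and the agent is illustrative:

import re

from agents import Agent, GuardrailFunctionOutput, RunContextWrapper, input_guardrail

@input_guardrail
async def sensitive_data_check(
    ctx: RunContextWrapper, agent: Agent, input: str
) -> GuardrailFunctionOutput:
    # Stand-in check; in practice this is where a Guardrails AI Hub validator
    # (e.g. a PII detector) would be called.
    contains_ssn = bool(re.search(r"\b\d{3}-\d{2}-\d{4}\b", str(input)))
    return GuardrailFunctionOutput(
        output_info={"contains_ssn": contains_ssn},
        tripwire_triggered=contains_ssn,  # tripping the wire halts the run
    )

protected_agent = Agent(
    name="Support agent",
    instructions="Help the user without exposing sensitive data.",
    input_guardrails=[sensitive_data_check],
)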
One of the biggest challenges in building agents is getting visibility into each step of the process. OpenAI Agents has a tracing provider that logs essential events—LLM completions, tool calls, guardrails triggered, and handoffs. They also released a dashboard where you can visualize your traces.
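For example, several related runs can be grouped into one trace with the SDK's trace context manager (the workflow name and prompts here are illustrative):

from agents import Agent, Runner, trace

agent = Agent(name="Assistant", instructions="Be concise.")

async def traced_workflow() -> None:
    # Everything inside the context manager is recorded as a single trace:
    # LLM completions, tool calls, guardrails, and handoffs.
    with trace("Weather workflow"):
        first = await Runner.run(agent, "Is it raining in Seattle?")
        second = await Runner.run(agent, f"One-line summary: {first.final_output}")
        print(second.final_output)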
OpenAI had a few suggestions for third-party tracers. I tried them all and found AgentOps to be far above the rest. It offers a robust way to track cost, latency, prompts, and completions, and you can even pinpoint which tools get activated and why. This level of observability made debugging traces much easier than the usual “copious use of print statements,” and it made tracking guardrail activations and handoffs substantially simpler.
AgentOps Session Replay