AI Agents: The Transition from Chatbots to Actors
Last week one of our engineers at DreamFlare asked me a question that stuck: “When does a chatbot become an agent?” My first instinct was to say something about autonomy, but that felt too clean. The real answer is messier.
A chatbot reads. An agent reads and writes.
The distinction sounds simple, but software architecture reshapes entirely around it. And we’re right in the middle of that shift, watching it happen in real time, building pieces of it ourselves.
The chatbot era was read-only #
For most of their existence, chatbots were glorified FAQ databases. A user typed a question; the system pattern-matched against a decision tree and spat back a canned response. Sometimes it worked. Mostly it frustrated people.
Even when NLP got better — BERT, GPT-2, the whole transformer wave — the fundamental interaction stayed the same. User asks, bot answers. The bot never performed any action. No flight bookings, no database updates, no triggered deployments. Just talk.
Read-only. And honestly, read-only AI served a lot of use cases fine. Customer support deflection. Simple Q&A. Surface-level information retrieval. But the ceiling arrived fast; people wanted AI that could actually act on their behalf.
Function calling broke the wall #
The real inflection point came in June 2023 when OpenAI released function calling for GPT-4. Before that, you could hack together tool use by stuffing instructions into the system prompt (“when the user asks to check the weather, output JSON with these fields…”). Brittle, unpredictable, failing in creative ways.
Function calling made everything structured. The model could say “I need to call get_weather(location='SF') and here are the arguments” — and application code could actually execute that call against a real API. Suddenly the language model had hands.
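The shape of that handoff can be sketched in a few lines: the model emits a tool name plus JSON-encoded arguments, and application code parses the structured call and executes it. The `get_weather` stub and the exact output format below are illustrative, not OpenAI's actual API surface:

```python
import json

# Hypothetical tool standing in for a real weather API call.
def get_weather(location: str) -> dict:
    return {"location": location, "forecast": "sunny"}

# Registry mapping tool names to local functions.
TOOLS = {"get_weather": get_weather}

# Roughly what a function-calling model emits: a tool name
# plus arguments as a JSON string.
model_output = {"name": "get_weather", "arguments": '{"location": "SF"}'}

# Application code dispatches the structured call against real code.
fn = TOOLS[model_output["name"]]
result = fn(**json.loads(model_output["arguments"]))
print(result)  # {'location': 'SF', 'forecast': 'sunny'}
```

The important property is that parsing `model_output` is deterministic; nothing depends on regexing free-form text out of a prompt.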
This wasn’t the first attempt at giving models tools. Meta’s Toolformer paper (February 2023) showed that language models could teach themselves to call APIs. Google’s ReAct framework from 2022 already formalized the loop: reason about what to do, act on it, observe the result, repeat. But OpenAI’s function calling turned that into something accessible to every developer with an API key.
The agent loop: reason, act, observe #
ReAct deserves more attention than it gets. Yao et al. published the paper in 2022, and the core idea is elegant: instead of separating reasoning and action into different systems, let the model interleave them. Think, do, look at what happened, think again.
That loop — reason, act, observe — forms the skeleton of every agent framework emerging this year. AutoGPT, BabyAGI, LangChain’s AgentExecutor; all implement some version of it.
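Stripped of framework machinery, that skeleton fits in a dozen lines. In this sketch, `model` and `run_tool` are stand-ins for a real LLM call and a real tool executor; the step format and the toy demo below are invented for illustration:

```python
# Minimal ReAct-style loop: reason, act, observe, repeat.
def agent_loop(goal, model, run_tool, max_steps=10):
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        step = model(history)                  # reason: pick the next action
        if step["type"] == "finish":
            return step["answer"]
        observation = run_tool(step["tool"], step["args"])  # act
        history.append(f"Observed: {observation}")          # observe
    return None  # gave up after max_steps

# Toy stand-ins so the loop is runnable without an LLM.
def demo_model(history):
    # Fake policy: call a tool once, then finish with what it saw.
    if any(h.startswith("Observed:") for h in history):
        return {"type": "finish", "answer": history[-1]}
    return {"type": "tool", "tool": "echo", "args": {"text": "hi"}}

def demo_run_tool(name, args):
    return args["text"] if name == "echo" else None

print(agent_loop("say hi", demo_model, demo_run_tool))  # Observed: hi
```

Every framework mentioned above is, at its core, this loop plus opinions about memory, planning, and error handling.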
Here’s what makes this powerful: failure becomes part of the process. A chatbot either has the answer or it doesn’t. An agent tries something, fails, adjusts approach, and tries again. Not a minor upgrade. A fundamentally different relationship between AI and the systems it operates in.
At DreamFlare, our team ran experiments with agent loops for content generation workflows. The difference between “generate this” (one-shot chatbot behavior) and “keep working on this until it meets these criteria” (agent behavior) is night and day. The agent version produces better output because self-correction kicks in. API call costs climb too — a real tradeoff still being worked out.
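The “keep working on this until it meets these criteria” pattern, with its cost tradeoff made explicit, can be sketched like this. `generate` and `meets_criteria` are hypothetical hooks, and the attempt counter is what shows up on the API bill:

```python
# Generate-check-revise loop: retry with feedback until the draft
# passes, counting attempts as a proxy for API cost.
def generate_until(prompt, generate, meets_criteria, max_attempts=5):
    draft, cost, feedback = None, 0, ""
    for _ in range(max_attempts):
        draft = generate(prompt + feedback)
        cost += 1  # every retry is another model call -- the tradeoff
        if meets_criteria(draft):
            break
        feedback = f"\nPrevious draft failed checks: {draft!r}. Revise."
    return draft, cost

# Toy generator that "improves" once it sees revision feedback.
def toy_generate(prompt):
    return "good draft" if "Revise" in prompt else "bad draft"

draft, cost = generate_until("write a tagline", toy_generate,
                             lambda d: d == "good draft")
print(draft, cost)  # good draft 2
```

The one-shot chatbot version is just `max_attempts=1`; everything past that is where both the quality gains and the extra spend come from.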
2023: the Cambrian explosion of agent frameworks #
This year has been wild for agent development. AutoGPT dropped in March and went viral immediately — a fully autonomous GPT-4 agent that could browse the web, write files, and chain tasks together. Janky. Token-hungry. But proof of concept: give a language model tools, a goal, and a loop, and it’ll figure stuff out.
BabyAGI followed almost immediately. Simpler architecture; just a task queue with GPT-4 deciding what to do next and creating new tasks as needed. Then LangChain built AgentExecutor, giving developers a more structured way to wire up tools with reasoning loops.
The open-source energy around agents stayed intense all year, which surprised even people who’d been tracking the space closely. Every week brought a new framework, a new wrapper, a new approach to memory and planning. Most of these frameworks won’t survive (the churn is exhausting), but the underlying pattern is real and sticky.
Nearly all of these frameworks converge on the same basic architecture: a model that calls tools, a loop handling the tool results, and some form of memory or context management. Differentiation lives in the details — how each handles long-running tasks, how context window management works, how error recovery gets implemented.
Read-write AI changes the security model #
Here’s the part that keeps me up at night as a CTO.
Read-only AI made security straightforward. The chatbot accessed only information it had authorization for, and the worst case was information leakage. Bad, but bounded.
Read-write AI operates as a different beast — one that demands rethinking access control from the ground up. An agent that modifies databases, triggers API calls, and executes workflows needs proper identity and access control. Not just at the “does this API key have permission” level; at the “should this particular reasoning chain be allowed to delete production data” level.
Systems where the AI acts autonomously within defined boundaries need three things:
What can the agent access? Not just which APIs, but which operations on those APIs. Read-only endpoints vs. write endpoints. Scoped permissions matching the task, not blanket access.
How do you audit agent actions? Every tool call needs logging. Every decision in the reasoning chain must trace back to something inspectable. When something goes wrong (and something will), the full reasoning chain leading to the action needs to be reconstructable.
Blast radius constraints. If an agent goes off the rails, how much damage accrues before something catches it? Rate limits, approval gates for high-impact actions, rollback capabilities.
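A toy sketch of those three guardrails together: scoped permissions per tool, an append-only audit log, and a crude write budget standing in for blast-radius constraints. Everything here is illustrative, not a production design:

```python
import time

class GuardedTools:
    """Wraps a tool registry with permission checks, auditing, and limits."""

    def __init__(self, tools, allowed, max_writes=3):
        self.tools = tools            # name -> callable
        self.allowed = allowed        # name -> set of modes, e.g. {"read"}
        self.max_writes = max_writes  # crude blast-radius cap
        self.audit_log = []
        self.writes = 0

    def call(self, name, mode, **kwargs):
        # Scoped permissions: the operation, not just the endpoint.
        if mode not in self.allowed.get(name, set()):
            self.audit_log.append((time.time(), name, mode, "DENIED"))
            raise PermissionError(f"{name} not allowed in {mode} mode")
        # Blast radius: cap writes before a human has to approve more.
        if mode == "write":
            self.writes += 1
            if self.writes > self.max_writes:
                raise RuntimeError("write budget exhausted; needs approval")
        result = self.tools[name](**kwargs)
        # Audit: every tool call leaves a trace.
        self.audit_log.append((time.time(), name, mode, "OK"))
        return result

# Toy usage with fake tools.
tools = {"db_read": lambda q: f"rows for {q}", "db_write": lambda q: "ok"}
guard = GuardedTools(
    tools,
    {"db_read": {"read"}, "db_write": {"write"}},
    max_writes=1,
)
print(guard.call("db_read", "read", q="users"))  # rows for users
```

A real deployment would hang these checks off actual identity and approval systems, but the layering is the point: the agent never touches a tool except through a policy boundary it cannot reason its way around.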
Genuinely uncharted territory. Traditional RBAC and OAuth weren’t designed for entities that autonomously decide what actions to take. New patterns are necessary — and frankly, the industry hasn’t worked them out yet.
The uncomfortable middle #
Right now we’re in an uncomfortable middle period. The agent frameworks exist. The tool-calling capabilities exist. The reasoning patterns (ReAct, chain-of-thought, tree-of-thought) exist. But production reliability doesn’t match demo performance yet.
Agents hallucinate tool calls. Loops become traps. Sometimes the agent “achieves” goals by taking shortcuts that technically satisfy the success criteria but miss the intent entirely. If you’ve spent any time with AutoGPT, the pattern is immediately recognizable.
The gap between demo and production is wide. An impressive agent demo takes an afternoon; making that agent reliable enough to run without supervision in production takes months of work and careful guardrails.
Still bullish, though. The trajectory is clear. Models improve at tool use. Frameworks mature (slowly). Use cases are obvious — every repetitive workflow involving reading information, making decisions, and taking actions qualifies as a candidate for agentic automation.
What this means for architects #
Architects building software today need to think about agent-readiness even before building agents. Agent-readiness means a few concrete things:
APIs need to be well-documented and structured. Agents consume APIs; poorly documented endpoints become invisible to them. Good schemas, clear parameter descriptions, predictable error responses.
Systems need proper authorization boundaries. The agent pushes on every permission it holds. Design access control assuming the caller might be an autonomous loop, not a human clicking buttons.
Monitoring needs to handle non-human traffic patterns. Agent loops generate bursts of API calls that look nothing like human usage. Rate limiting and anomaly detection should account for that.
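On the first point, agent-readable documentation mostly means machine-readable schemas. Here is what a tool description might look like in the JSON-Schema style most function-calling APIs accept; the `create_ticket` tool and its fields are invented for illustration:

```python
# Hypothetical tool spec in the JSON-Schema style that
# function-calling APIs consume. Every description doubles as
# documentation the model actually reads.
tool_spec = {
    "name": "create_ticket",
    "description": "Open a support ticket. Write operation; gated by approval.",
    "parameters": {
        "type": "object",
        "properties": {
            "title": {
                "type": "string",
                "description": "One-line summary of the issue",
            },
            "priority": {
                "type": "string",
                "enum": ["low", "medium", "high"],
                "description": "Defaults to 'low' if omitted",
            },
        },
        "required": ["title"],
    },
}
```

An endpoint without a spec like this is, from the agent’s point of view, an endpoint that doesn’t exist.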
Early days. The current generation of agents sits roughly where chatbots sat in 2016: promising but clunky. The difference is that the underlying models already far exceed anything available back then, and iteration is much faster.
The transition from chatbots to agents isn’t just a feature upgrade. It’s a paradigm shift in how AI interacts with our systems. The companies that figure out the architecture (the identity model, the permission model, the observability model) will have a massive head start.
No one has all the answers. At DreamFlare, we’re figuring this out in real time, making mistakes, iterating. But the direction is right. Read-write AI is coming, ready or not.