The Agent Experiment · Part 1 of 5

I’ve been building with AI for a while now — long enough to have strong opinions about where most operators are getting stuck. They’re getting stuck in “prompt mode”. This means asking a question, getting an answer, then copying and pasting the good parts. That’s useful, but it’s not transformative. The conversation about what AI can truly do for a business is still happening and it’s mostly happening at a different layer.

The “prompt era” was AI as a tool. The “agent era” is AI as a teammate. The difference isn’t philosophical. It’s operational. Agents don’t need to wait for questions to act. They can proactively tackle goals, break them into subtasks, use tools, delegate to other agents, check their own work, and report back to you with output. You give them direction and they figure out the route. This is the transformative layer.


But agents aren’t foolproof. In fact, poorly designed agentic AI (AAI) systems can lead to nightmarish results (there are many public, embarrassing examples from which to choose). So, you’d better be using industry-proven best practices when it comes to problem definition, system design, identifying and documenting functional requirements, and yes, even prompt building. AAI will act as a magnifier, exposing and amplifying every weakness in your operational procedures. Being methodical and deliberate will go a long way in getting AAI right.

And “getting it right” is what this series is all about. This isn’t a hype series. This is a practitioner series. In prep, I built three agent systems across three different agent frameworks. I then ran a genuine experiment, watched things break in real-time (and in instructive ways), fixed those breaks, and now I’m reporting exactly what happened so you can avoid the same pitfalls during your first AAI build.

This installment acts as the kick-off and sets the frame. We’ll discuss what agents are, why frameworks matter, and why SMBs, specifically, should be paying attention RIGHT NOW.

The Shift to Agentic AI

The numbers around AAI are hard to ignore at this point. Gartner projects that by the end of 2026, 40% of enterprise applications will include task-specific AI agents (up from less than 5% in 2025). The AI agents market hit roughly $7.8B in 2025 and is tracking toward $10.9B in 2026. PwC found that 88% of senior executives plan to increase AI budgets in the next twelve months specifically because of AAI.

None of that is the shift I’m talking about. Those are just the headlines and headlines can often be misleading.

The shift I’m talking about is structural. For the last two years, AI made people more capable: a better draft, a faster research pass, a smarter first cut. That’s all still valuable. But agents make processes more capable. A well-designed agent system doesn’t just help you work faster. It works while you’re working on something else. That’s a completely different category of leverage and productivity.

Unsurprisingly, the enterprise world has figured this out and is doing all it can to capitalize on this epiphany by locking you into their tech, their solutions. But what they haven’t figured out yet is the importance of governance, procurement, and risk management (which means they’re also moving slowly). Their AI pilots are sitting in legal review, their vendor contracts have long cycles, and their security teams have questions that can’t be answered easily.

Most SMBs have none of this overhead. The same tools enterprises are paying consultants to evaluate are also open source and free to download by anyone. The window where SMB operators can move faster than enterprise giants at AAI is open, but it won’t stay open for long.

What is an AI Agent?

When you get down to it, an agent is just an AI system with a goal, a set of tools, and the ability to take multi-step actions to accomplish that goal without step-by-step human prompting. That’s it — it’s not magic, it’s not autonomous general intelligence. It’s simply a system that can plan, act, check its work, and loop until complete.
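That plan, act, check, loop cycle is simple enough to sketch. Everything below is illustrative (the `plan`, `check`, and tool functions are toy stand-ins I made up, not any framework's API), but it is the skeleton every agent framework elaborates on:

```python
# Illustrative sketch only: in a real agent, plan() and check() would be
# LLM calls and the tools would hit real systems. The shape is what matters.

def run_agent(goal, tools, plan, check, max_steps=10):
    """Plan a step, act with a tool, check the work, loop until done."""
    log = []
    for _ in range(max_steps):
        tool_name, arg = plan(goal, log)                 # decide next subtask
        log.append((tool_name, tools[tool_name](arg)))   # act, record result
        if check(goal, log):                             # self-check vs goal
            break
    return log                                           # report back

# Toy run: "count up to 3" using a single increment tool.
tools = {"inc": lambda x: x + 1}
plan = lambda goal, log: ("inc", log[-1][1] if log else 0)
check = lambda goal, log: log[-1][1] >= goal
history = run_agent(3, tools, plan, check)   # three tool calls, then done
```

Note the loop budget (`max_steps`). It's doing quiet but important work: it's the difference between an agent that stops and an agent that runs away.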

The practical distinction that matters for operators is the difference between Human in the Loop (HITL) and Human in Control (HIC). HITL is a passive posture where the human sits at a checkpoint approving AI output before it moves forward. The system still runs, but the human gets a veto. That’s better than nothing. But it’s not a governing philosophy.

HIC is an active posture. In such a framework, the human sets the goal, defines the constraints, designs the workflow, and owns the outcome. The agent is the “squad.” And the squad is capable, fast, and tireless. But the squad is also completely dependent on human direction to produce anything worth having. In this scenario, we don’t just approve the output — we shape the conditions that produce it.


This passive vs. active distinction will matter a lot more in installments 2, 3, and 4 of this series, because one of the first things I learned while building these systems was that HITL “by design” and HITL “by accident” look identical until something goes wrong. More on that later!

Now let’s discuss what an agent is NOT. It’s not a chatbot. It’s not autocomplete. It’s not a search engine with a personality. An agent can use tools. It can browse the web, read files, call APIs, write code, and pass instructions to other agents. It’s the use of tools that flips the typical AI interaction on its head. Tools mean the AI can stop telling you what to do and instead do it for you! That’s HUGE (trust me).
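Mechanically, “using a tool” usually means the model emits a structured request and your code executes it. Here’s a minimal sketch; the registry and the JSON shape are assumptions for illustration, not any specific framework’s protocol:

```python
import json

# Hypothetical tool registry: the names the model is allowed to invoke.
TOOLS = {
    "add": lambda a, b: a + b,
    "shout": lambda text: text.upper(),
}

def dispatch(model_output: str):
    """Parse the model's tool request and run the matching function."""
    request = json.loads(model_output)     # e.g. '{"tool": "add", ...}'
    fn = TOOLS[request["tool"]]            # unknown tools raise KeyError
    return fn(**request["args"])

# The model asking for a calculation instead of describing one:
answer = dispatch('{"tool": "add", "args": {"a": 2, "b": 3}}')   # → 5
```

The registry is also where control lives: an agent can only do what you’ve put in that dictionary.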

Three Frameworks, Three Jobs

There are dozens of agentic frameworks right now, and most of the conversation about them is too technical to be useful for SMB operators. My version of the conversation is aimed at helping you decide which tool fits which job. I’ve whittled it down to three main topologies (so you don’t have to).

In this series, we will focus on the following systems: AutoGen, LangGraph, and CrewAI. Each has its own native metaphor to best describe it.

AutoGen — The Boardroom

Agents that talk to each other. We’re talking real, productive conversations here. They disagree, pressure-test, defend positions, and converge on an answer through structured debate. Best when you want adversarial stress-testing or need multiple perspectives colliding on a single problem. I used it to run the “investment committee” for my experiment. Accordingly, I created four specialist agents to debate which business to start in today’s insane macro environment.
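Stripped of the LLM machinery, the debate-and-convergence shape looks like this. A sketch only: the toy agents below stand in for LLM-backed specialists, and AutoGen’s real implementation coordinates them through a group-chat manager rather than a bare loop:

```python
# Topology sketch, not AutoGen's API. Each "agent" is a function that
# revises a shared proposal; the debate ends when a full round passes
# with no changes (convergence) or the round budget runs out.

def boardroom(agents, proposal, max_rounds=20):
    for _ in range(max_rounds):
        changed = False
        for agent in agents:
            revised = agent(proposal)        # critique / defend / amend
            if revised != proposal:
                proposal, changed = revised, True
        if not changed:                      # everyone accepted it
            return proposal
    return proposal                          # hit the round budget

# Toy specialists: a scout that raises the bid, a risk agent that caps it.
scout = lambda p: p + 1 if p < 3 else p
risk = lambda p: min(p, 3)
final = boardroom([scout, risk], 0)          # converges at 3
```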

LangGraph — The Process Flowchart

Agents with memory, built on a state-machine architecture: conditional branching, quality gates, and loop-back logic. This system can check its own work, reject output that doesn’t meet criteria, and retry before moving forward. Best for workflows where sequence and quality control matter, like build planning, document review, and multi-stage analysis with defined acceptance criteria. For my experiment, LangGraph handles product specification and technical architecture design based on output from the AutoGen team.
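The gate-and-loop-back control flow is worth seeing on its own. This is not LangGraph’s API (which expresses the same idea as graph nodes joined by conditional edges over shared state); it’s just the pattern the framework formalizes:

```python
# Topology sketch: draft -> quality gate -> (loop back on failure) -> accept.
# The draft, gate, and revise functions are illustrative stand-ins.

def run_pipeline(draft_fn, gate, revise_fn, max_retries=3):
    state = {"draft": draft_fn(), "attempts": 1}
    while not gate(state["draft"]):
        if state["attempts"] > max_retries:
            raise RuntimeError("quality gate never passed")
        state["draft"] = revise_fn(state["draft"])   # the loop-back edge
        state["attempts"] += 1
    return state

# Toy run: the gate demands at least 20 characters; revisions add detail.
out = run_pipeline(
    draft_fn=lambda: "too short",
    gate=lambda d: len(d) >= 20,
    revise_fn=lambda d: d + " with more detail",
)
```

The `max_retries` ceiling matters for the same reason the loop budget did earlier: a gate that can reject forever is just a more polite infinite loop.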

CrewAI — The Operations Org Chart

Agents that carry out role-based, sequential execution. You define specialists like a researcher, strategist, copywriter, and editor, then assign tasks. The “crew” takes it from there, executing in succession and passing context down the chain. This is very close to how a real human team works when handling sequential delivery tasks. Best when you can draw an org chart for the work before you ever start. I use it to build a go-to-market package for a product based on the outputs from the previous two agent teams.
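In plain Python, the sequential, context-passing shape is a simple chain. Another topology sketch, not CrewAI’s actual Agent/Task/Crew API; the roles and their outputs here are invented for illustration:

```python
# Topology sketch: each specialist runs once, in order, and receives the
# accumulated context of everyone who ran before it.

def run_crew(roles, brief):
    context = [brief]
    for name, work in roles:
        context.append(f"{name}: {work(context)}")   # pass context downstream
    return context

# Toy crew: researcher -> copywriter, each building on prior output.
roles = [
    ("researcher", lambda ctx: f"found 3 competitors for '{ctx[0]}'"),
    ("copywriter", lambda ctx: f"wrote tagline using {len(ctx)} inputs"),
]
output = run_crew(roles, "launch StandUply")
```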

The most common mistake operators make before they’ve touched any of this is treating framework selection as a technical decision. It is not. It’s a workflow topology decision. The question isn’t “which framework is best?” It’s “what does the work look like, and which tool is best designed to manage that work?”

  • Debate and convergence → AutoGen
  • Stateful process with quality gates → LangGraph
  • Role-based sequential execution → CrewAI

You can learn the technical details after you’ve got the topology right. Adding AI doesn’t remove common sense, proper planning, and operational rigor from the equation.

The SMB Asymmetric Window

SMBs need to wake up and pay attention. The enterprise AI advantage has always been resource-based — bigger compute budgets, dedicated ML teams, proprietary data at scale. But that advantage is narrowing fast.

AutoGen, LangGraph, and CrewAI are all open source. All three can run on API-based inference, meaning you can outsource inference to one of the BIG 3 model providers if you don’t want to maintain your own model. For example, a full AutoGen experiment session of the kind I ran costs between ten and forty cents in API fees — that’s it! As an SMB operator, you are only bound by your budget and the limits you bake into the framework. Inference spend can be capped at the call level, the daily level, etc. This means the infrastructure cost of AAI for SMBs willing to get their hands dirty is essentially ZERO. The barrier is trust, knowledge, and time — not money.
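Capping spend at the call level and the daily level takes about a dozen lines. The dollar figures below are placeholders I picked for illustration; the point is that your guard, not the provider, decides when the agents stop spending:

```python
# Illustrative spend guard. Real per-call cost estimation depends on your
# provider's token pricing; the caps here are made-up numbers.

class SpendGuard:
    def __init__(self, per_call_usd=0.05, daily_usd=2.00):
        self.per_call_usd = per_call_usd
        self.daily_usd = daily_usd
        self.spent_today = 0.0

    def authorize(self, estimated_cost_usd):
        """Refuse any inference call that would break either cap."""
        if estimated_cost_usd > self.per_call_usd:
            return False                              # call-level cap
        if self.spent_today + estimated_cost_usd > self.daily_usd:
            return False                              # daily cap
        self.spent_today += estimated_cost_usd
        return True

guard = SpendGuard()
ok = guard.authorize(0.04)       # within both caps
blocked = guard.authorize(0.10)  # over the per-call cap
```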

Salesforce’s SMB data found that 91% of small businesses using AI report revenue growth. The gap between enterprise and SMB AI adoption has compressed from 1.8x in early 2024 to roughly 1.2x by mid-2025. SMBs aren’t way behind anymore. They’re closing the gap. And on the agentic side, where enterprise is still wrestling with governance frameworks and procurement, SMBs with technical operators willing to experiment can even pull ahead!

But the asymmetric advantage isn’t permanent. Such advantages never are. Enterprise will catch up once the governance questions are answered, and the vendor landscape consolidates. The organizations building operational muscle with AAI tools now (learning what agents can and can’t do, where the failure modes live, how to design human direction into the system rather than bolting it on after) are those building something that takes real time to replicate. Speed of learning is the moat right now. Not access, cost, or technical skill.

I wanted to put my own opinions to the test. So, I built something (and I ended up having a lot of fun while doing it).

What I Built

I wanted to see what a team of AI agents would do when given a real problem. So, I assembled an agentic team and gave it a task most SMBs struggle with daily: tell me the best business to start right now, given today’s insane macroeconomic and geopolitical realities. Then I gave the team real constraints, real macro data, and sent them on their way.

This meant building an AutoGen GroupChat with four specialist agents: a MacroAnalyst to frame the investment environment, an OpportunityScout to propose business concepts, a RiskDestructor to attack them, and a CapitalAllocator to rank survivors by capital efficiency and 12-month viability. The session ran against a live macro scenario file generated from real-world March 2026 macro conditions: real inflation numbers, real credit availability, and real geopolitical context. Then I issued the human-controllable constraints: B2B only, US-only operations, max $250K launch capital, and twelve months to first revenue.

The session ran twenty rounds, during which the agents proposed concepts, killed those concepts, defended positions, and finally converged. The output was a full investment thesis with ranked recommendations and detailed financial logic: the same kind of analysis a competent human team would spend two to three days producing.

The winner?

StandUply Pro: an AI-powered asynchronous standup automation tool for distributed engineering teams. The agents identified a specific market gap, stress-tested the economics, and built a twelve-month launch plan with budget allocations and go-to-market sequencing. Real output with enough specifics to be actionable for any SMB.

However, not everything is perfect in AutoGen land. During the run, something went wrong. The issue I encountered was highly instructive and ultimately led to a better implementation, but it caught me totally off-guard. I’ll go into detail on this in installment 2 of this series.

The TL;DR version: I made a design assumption about human participation in the session that turned out to be wrong, and the agents ran without me for twelve consecutive rounds before I even caught it. The fix revealed something important about the difference between a system that nominally includes a human and one that structurally requires human direction. That’s the HIC principle in practice and such findings deserve an installment of their own.

StandUply acts as the through-line for the rest of this series. Part 2 will cover the AutoGen session in full. I’ll discuss what the agents did, what broke, and what I learned. Part 3 will see StandUply’s investment thesis handed over to a LangGraph agent team for product specification and technical analysis tasks. Part 4 will then hand those outputs to a CrewAI crew to deliver a go-to-market strategy. And finally, Part 5 will synthesize what these three AAI frameworks, a single product journey, and several failures along the way can teach SMB operators about deploying agents in the real world.

I built all this using modest hardware (MacBook Pro), open-source frameworks, and API-based inference (Claude) at near-zero cost. I’m sharing exactly what happened, the working version and the broken version, because an honest account is more useful than a polished one.

The AutoGen agent team gave me a business idea. Part 2 will cover what happened when I let them loose.

Chad Schmookler is a Fractional COO/CPO with 20 years in operations and product leadership. Creator of the HIxAI operating philosophy. He writes on the convergence of AI, operations, and organizational strategy — and the gap between boardroom vision and operational reality. Follow Chad on LinkedIn: linkedin.com/in/cschmookler

Key Takeaways

  1. Agents don't just make people faster — they make processes more capable. That's a structural shift, not a feature upgrade.
  2. Framework selection is a workflow topology decision, not a technical one: match the tool to the shape of the work before you touch a line of code.
  3. SMBs have a real window to outpace enterprise on AAI adoption — governance overhead is the enterprise's constraint right now, not theirs.
  4. Human in Control means designing human direction into the system from the start, not bolting approval gates on after something breaks.
  5. The infrastructure cost of running real agent experiments is near-zero. The barrier is knowledge and trust, not money.