Architecture Essay

Buffaly: A Real Alternative to Scaling Forever

A runtime-first path for using frontier models without making them memory, execution, truth, policy, and control all at once.

Matt Furnari • May 12, 2026

There are two lazy ways to think about LLMs.

The first is to treat them like magic beans. Just keep scaling. Throw bigger models, massive context windows, more retrieval, more agents, more tools, more synthetic data, more reinforcement learning, and more GPU clusters at every problem until the model somehow learns memory, tool routing, auditability, organizational context, deterministic execution, regulatory caution, and common sense. If something breaks, add another prompt. If the prompt fails, add another model call. If the model call fails, add another agent. If the agent fails, wait for the next frontier model.

The second lazy approach is to complain endlessly about LLMs while offering nothing useful in their place. Critics correctly point out hallucinations, prompt brittleness, shallow reasoning, weak memory, context-window dependence, tool-calling failures, and the obvious mismatch between probabilistic text generation and serious operational work. Then the proposed alternative is usually vapor. “We need symbolic reasoning.” “We need neurosymbolic systems.” “We need ontologies.” “We need explicit world models.” Fine. Where is the code? Where is the thing people can run? Where is the system that does not collapse back into a giant prompt wrapped around a few API calls?

But there is a third path, and it is called Buffaly.


Buffaly is a neurosymbolic architecture that I have been working on for a long, long time, and it is amazing. It uses the strengths of frontier models, but it does not depend on them to be the entire system. It has continuous online learning. It can extend itself, optimize its code, increase its tool surface while reducing cost, hallucinate less, and remember more.

I have moved my entire workflow to Buffaly from Codex and ChatGPT. Throughput is up 5x to 10x, depending on the task, and cost is a fraction of what it was.

This Matters Beyond Cost Savings

I do not want to be melodramatic, but I also do not want to understate the problem. The direction we choose for AI architecture matters.

If we continue down the path where every problem is solved by making neural networks larger and giving them more authority, there are two ugly endpoints.

Worst case: autonomous black-box systems act at enormous scale while remaining hard to govern, constrain, inspect, or reconstruct.

Best bad case: ordinary work becomes endless rented cognition, with cloud inference loops rediscovering structures we already understand.

The worst-case endpoint is the science-fiction version: increasingly autonomous black-box systems, trained and reinforced until they can act in the world at enormous scale, while still lacking the explicit structure humans need to govern, constrain, inspect, or reconstruct their behavior. You do not need to believe in a literal Terminator future to see the risk. Giving more and more authority to systems whose internal behavior we cannot understand is not a safety strategy. It is a bet.

The best-case endpoint is less cinematic but still bad: we turn ordinary work into endless token consumption. Every routine workflow becomes a cloud inference loop. Every company rents cognition by the token forever. We burn enormous compute asking neural networks to rediscover structures we already understand, while polluting the planet with unnecessary computation and calling it intelligence.

That future might be profitable for the companies selling inference. It is not obviously good for everyone else.

Telling people to stop using LLMs is not a serious answer. They are too useful. They are too powerful. They already solve real problems. If you want people to stop misusing a powerful technology, you have to give them something better.

Neurosymbolic Crackpots

This is not a moral lecture about why LLMs are bad. It is an attempt to show that there is another architecture available: one where models are used aggressively, but they are not asked to be memory, execution, truth, policy, and control all at once.

The neurosymbolic crowd has not helped itself. Too much of the public argument has sounded like Gary Marcus yelling from the balcony and constantly shifting goalposts. They are sometimes right about the limitations of pure neural approaches, but being right in the abstract is not the same as giving developers a better path. If the alternative to LLM maximalism is just another lecture about why LLMs are flawed, nobody should be surprised when builders ignore it.

Buffaly is my attempt to put a real alternative on the table. Not as a paper, not as a diagram, not as another complaint about hallucinations, and not as another vague promise that symbolic AI will return if everyone waits long enough. Buffaly is a working runtime-first architecture for high-trust agents. It treats the LLM as a powerful component, not the whole operating system.

The Other Side of Universal Approximation

The last few years surprised a lot of people because neural networks turned out to be far more general than expected. The lesson many people took from that is simple: keep scaling and the model will eventually learn everything.

That lesson is incomplete.

Universal approximation tells us that a sufficiently large neural network can, in principle, approximate an enormous class of functions. In practice, scaling has gone much further than most people expected. Models now display behaviors that look like reasoning, planning, translation, search, programming, and tool use, and in some cases they emulate pieces of software outright.

But there is a flip side that the industry keeps ignoring:

If a neural network can eventually learn to approximate a behavior, a designed architecture can often implement that behavior directly, sooner, cheaper, and with far more control.

We do not have to wait for a model to spend billions of dollars rediscovering software structures we already know how to write down.

Instead of waiting for a model to learn durable memory, we can build durable memory into the runtime. Instead of waiting for perfect tool routing to emerge from reinforcement learning, we can build typed tool discovery. Instead of hoping the model stops hallucinating domain identifiers, we can constrain it to valid semantic entities. Instead of asking the model to reconstruct organizational context from a long prompt, we can store and operationalize that context directly. Instead of paying for repeated reasoning forever, we can promote stable workflows into skills and eventually into code.

The fastest path is not always more scale. When the structure is known, encode it. When the operation is stable, turn it into code. When the output must be exact, constrain it. When the decision must be auditable, trace it. When the model is likely to hallucinate, change the task so hallucination is no longer the authority layer.
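To make "when the output must be exact, constrain it" concrete, here is a minimal Python sketch. It is not Buffaly's code or ProtoScript. The registry `VALID_CODES`, the function `resolve_identifier`, and the sample identifiers are all hypothetical, chosen only to show the shape of the idea: the model may propose anything, but only identifiers the runtime already knows can pass through.

```python
from typing import Optional

# Hypothetical registry of valid domain identifiers (illustrative values only).
VALID_CODES = {
    "BILLING-001": "Monthly care review",
    "BILLING-002": "Device setup",
}


def resolve_identifier(proposed: str) -> Optional[str]:
    """Return the canonical identifier if the proposal is valid, else None.

    The model may propose any string; only identifiers present in the
    registry are ever handed on to execution.
    """
    candidate = proposed.strip().upper()
    return candidate if candidate in VALID_CODES else None


# A valid proposal is normalized and accepted.
accepted = resolve_identifier(" billing-001 ")
# An invented identifier is rejected instead of being executed.
rejected = resolve_identifier("BILLING-999")
```

The point of the design is that a hallucinated identifier becomes a rejected lookup rather than a wrong action.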

The Deep Tech: Twenty Years in the Making

To understand why Buffaly works, you have to understand that it is not just another agent framework, nor is it a single clever algorithm. It is a comprehensive ecosystem of working systems, forged over two decades, that introduces entirely different primitives for software engineering.

The foundation is ontology as an active data structure.

In most systems, an ontology is a static diagram, a taxonomy, or a metadata layer used for search and classification. In Buffaly, ontology is a computational substrate. It is a live, graph-based structure where concepts, state, logic, policy, identity, and relationships actively participate in execution. That changes the fundamental role of “knowledge.” Knowledge is not merely retrieved and pasted into a prompt. It mathematically constrains what actions are valid. It defines which tools are relevant based on state. It preserves provenance and supports runtime inheritance. In Buffaly, the ontology is not documentation about the system. It is the operating environment.
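One way to picture an ontology that participates in execution is a graph whose nodes decide which actions are valid for the current state, with inheritance from parent concepts. The Python sketch below is an assumption-heavy toy, not Buffaly's data model: the `Concept` class, its fields, and the `Resource` and `Server` examples are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Concept:
    """A node in a toy ontology graph (hypothetical structure)."""
    name: str
    parent: Optional["Concept"] = None
    # Maps an action name to the set of states in which it is valid.
    actions: Dict[str, Set[str]] = field(default_factory=dict)

    def valid_actions(self, state: str) -> Set[str]:
        """Collect actions valid in `state`, inheriting from ancestors."""
        found: Set[str] = set()
        node: Optional[Concept] = self
        while node is not None:
            for action, states in node.actions.items():
                if state in states:
                    found.add(action)
            node = node.parent
        return found


# Illustrative concepts: a Server inherits from a generic Resource.
resource = Concept("Resource", actions={"describe": {"running", "stopped"}})
server = Concept(
    "Server",
    parent=resource,
    actions={"restart": {"running"}, "start": {"stopped"}},
)

# Only actions valid for the current state are offered.
running_actions = sorted(server.valid_actions("running"))
```

In this toy, `restart` is simply not available for a stopped server, so there is nothing for a model to get wrong.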

The execution layer is ProtoScript.

ProtoScript is an entirely new language designed specifically for semantic, graph-based, AI-assisted computation. It gives Buffaly a medium where language becomes operational. When the system learns something useful, it does not merely “remember” by saving a text note. It writes ProtoScript. A vague concept becomes a semantic entity, which becomes a typed object, which becomes a reusable skill, and finally deterministic code. It is the syntax that allows both humans and LLMs to collaboratively read, write, and harden logic.

But ontology and ProtoScript are just the bedrock. Buffaly is a complete, interconnected host of working systems.

It is a full runtime architecture. It is not just a prompt with tools bolted on; it is an engine that owns state, memory, dynamic tool routing, execution traces, native C# interop bridges, web modules, and provider abstraction layers. It includes a memory promotion engine that watches workflows and transitions them from probabilistic guessing to compiled code. It has deterministic policy guardrails that reject invalid actions before an LLM even sees them. It is an entire computational paradigm designed to run high-trust operations natively.

For years, this massive engine operated without modern LLMs. It had the semantic structure, the runtime logic, and the graph-based execution. What changed recently is that frontier models finally became good enough to serve as the natural-language reasoning layer. They were the missing spark for the engine I had been building for twenty years.

The result is probabilistic language reasoning seamlessly integrated with a host of deterministic, graph-based execution systems. People keep gesturing at this idea in academic papers and calling it “neurosymbolic AI.” Buffaly is what it looks like when that idea is actually built, tested, and running in production.

The Results

Buffaly works. It is the most amazing piece of software I have ever dealt with.

5x to 10x throughput increase in real internal workflows

~80% lower token cost in one optimized workflow

<$12 model cost in one 15,000-patient experiment

Over the past few months, we moved much of our company’s work into Buffaly. The results were not subtle. Development throughput increased by roughly 5x to 10x in real internal workflows. Token costs dropped dramatically. One FairPath workflow became roughly 80% cheaper as Buffaly optimized itself over three consecutive runs, while becoming faster, more repeatable, and more reliable.

That is the pattern Buffaly is designed to create. The first time a workflow runs, it may require model reasoning. The second time, the system can capture the pattern as a reusable skill. Eventually, stable pieces can become deterministic code. Once that happens, the system no longer pays a model to rediscover the same procedure again and again.

In one administrative processing experiment at the scale of 15,000 patients, the total model cost was under $12, which is less than a tenth of a cent per patient. That number is workload-specific, not a universal benchmark, but it shows the architectural point. When the runtime owns the process and the model is used only where model reasoning is actually needed, the economics change completely.

Buffaly has also become useful for high-trust operational work. It runs in our environment. It can interact with our servers. It can help with AWS and infrastructure workflows because it is grounded in our tools, our systems, our history, and our prior decisions. A Buffaly instance on my development machine can talk to a Buffaly instance on a production machine and help troubleshoot issues. That is a very different thing from pasting logs into a chatbot and hoping it understands the context.

My own ChatGPT usage collapsed. Not because the underlying models got worse, but because raw chat stopped being the right interface for serious work. ChatGPT is useful for isolated reasoning. Buffaly is useful for operating the company.

That is the difference.

What Buffaly Does Differently

Buffaly gives knowledge a path out of text. A phrase can become a semantic entity. A semantic entity can become a typed object in the ontology. Typed objects can gain relationships. Relationships can support rules. Rules can support actions. Actions can become skills. Skills can become ProtoScript or deterministic code.

That is what I mean when I say Buffaly remembers in code. It does not mean everything begins as code. It means the system has a path from language to structure to execution. Knowledge does not have to remain trapped as advice inside a prompt.

Buffaly also separates reasoning from execution. The model handles language, ambiguity, synthesis, proposal generation, code generation, and explanation. The runtime owns execution, typing, object identity, traceability, and control. The model can reason, but the runtime executes.

That boundary is crucial. A prompt says, “Do not hallucinate.” A runtime says, “Here are the valid entities. Here are the allowed actions. Here is the object. Here is the code that will execute. Here is the trace.” Those are not the same category of control.
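The boundary described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Buffaly's runtime: the `Runtime` class, its `execute` method, and the `check_status` action are hypothetical names. The model's role is reduced to proposing an (action, entity) pair; the runtime checks both against what it knows, executes, and records a trace either way.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Runtime:
    """Toy runtime: the model proposes, the runtime validates and executes."""
    entities: set
    actions: Dict[str, Callable[[str], str]]
    trace: List[Tuple[str, str, str]] = field(default_factory=list)

    def execute(self, action: str, entity: str) -> str:
        """Run a proposed (action, entity) pair only if both are known."""
        if action not in self.actions:
            self.trace.append((action, entity, "rejected: unknown action"))
            raise ValueError(f"unknown action: {action}")
        if entity not in self.entities:
            self.trace.append((action, entity, "rejected: unknown entity"))
            raise ValueError(f"unknown entity: {entity}")
        result = self.actions[action](entity)
        self.trace.append((action, entity, "ok"))
        return result


runtime = Runtime(
    entities={"web-01"},
    actions={"check_status": lambda name: f"{name}: healthy"},
)

status = runtime.execute("check_status", "web-01")
try:
    # A hallucinated host never reaches execution, but it is still traced.
    runtime.execute("check_status", "web-99")
except ValueError:
    pass
```

After this runs, `runtime.trace` holds one accepted and one rejected proposal, which is the kind of record a prompt instruction cannot give you.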

Buffaly narrows the action surface instead of dumping every possible tool into context. We have run environments with roughly 1,700 available tools or actions without stuffing all of them into the model’s context window at startup. The model does not need to see everything. It needs to see the right thing at the right time. The runtime should handle discovery and narrowing.
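As a rough Python sketch of narrowing, assume each tool is tagged with the object types it applies to, and the runtime surfaces only the tools that match what is currently in play. The `Tool` class, the `applies_to` tagging scheme, and `narrow_tools` are hypothetical; the essay does not describe how Buffaly actually performs discovery. The 1,700-tool registry below is synthetic and only mirrors the scale mentioned above.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Tool:
    name: str
    # Object types this tool applies to (hypothetical tagging scheme).
    applies_to: FrozenSet[str]


def narrow_tools(registry: List[Tool], active_types: set, limit: int = 5) -> List[str]:
    """Return names of tools relevant to the object types currently in play."""
    relevant = [t.name for t in registry if t.applies_to & active_types]
    return sorted(relevant)[:limit]


# Synthetic registry: 1,700 tools spread across 100 object types.
registry = [
    Tool(f"tool_{i:04d}", frozenset({f"type_{i % 100}"})) for i in range(1700)
]

# Only a handful of tools for the active type are surfaced to the model.
visible = narrow_tools(registry, {"type_7"})
```

A real system would presumably rank by more than a type match, but even this crude filter shows why the full registry never needs to enter the context window.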

Buffaly uses typed actions instead of JSON theater. A lot of current tool use is a loose text-to-JSON ritual: the model reads prose, invents arguments, passes serialized data, receives serialized data, and tries to keep the meaning straight. Buffaly is built around typed actions, runtime objects, and inspectable contracts. A capability can be registered, invoked, traced, reused, and eventually hardened.
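The contrast between loose JSON and a typed action can be shown with a small Python sketch. The `RestartService` action and `parse_action` helper are hypothetical examples, not part of Buffaly. The idea is that the type is the contract: missing fields, extra fields, and wrongly typed values fail at the boundary instead of flowing into execution.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RestartService:
    """A typed action: the fields and their types are the contract."""
    host: str
    service: str
    timeout_seconds: int

    def __post_init__(self) -> None:
        if not self.host or not self.service:
            raise ValueError("host and service must be non-empty")
        if not isinstance(self.timeout_seconds, int) or self.timeout_seconds <= 0:
            raise ValueError("timeout_seconds must be a positive integer")


def parse_action(raw: str) -> RestartService:
    """Convert loose model output into a typed action or fail loudly."""
    data = json.loads(raw)
    # Unknown or missing fields raise TypeError instead of passing silently.
    return RestartService(**data)


action = parse_action(
    '{"host": "web-01", "service": "nginx", "timeout_seconds": 30}'
)
```

Once the action exists as an object, it can be logged, compared, and replayed without re-parsing prose.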

Buffaly turns workflows into skills, and skills into code. This is the compounding loop that produces the token reductions. The first time a workflow runs, the system may need model reasoning. If the workflow repeats, Buffaly can capture it as a skill. If the skill stabilizes, parts of it can become deterministic code. The system gets cheaper and more reliable because the work is no longer being rediscovered from scratch.
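The compounding loop can be approximated by a toy in Python. This sketch simplifies heavily: it promotes a workflow to a reusable skill after the first successful run, whereas the text describes promotion happening as workflows repeat and stabilize. `SkillCache` and `fake_model` are hypothetical stand-ins; `fake_model` plays the role of an expensive model call that works out a procedure.

```python
from typing import Callable, Dict


class SkillCache:
    """Toy promotion loop: pay for model reasoning once, then reuse the result."""

    def __init__(self, model: Callable[[str], Callable[[int], int]]) -> None:
        self.model = model  # expensive: stands in for an LLM call
        self.skills: Dict[str, Callable[[int], int]] = {}
        self.model_calls = 0

    def run(self, workflow: str, value: int) -> int:
        if workflow not in self.skills:
            # First run: ask the model to work out the procedure.
            self.model_calls += 1
            self.skills[workflow] = self.model(workflow)
        # Later runs: execute the captured skill deterministically.
        return self.skills[workflow](value)


def fake_model(workflow: str) -> Callable[[int], int]:
    """Stand-in for a model that derives a procedure from a description."""
    if workflow == "double":
        return lambda x: x * 2
    raise ValueError(f"no procedure for {workflow}")


cache = SkillCache(fake_model)
results = [cache.run("double", n) for n in (1, 2, 3)]
```

Three runs cost one model call here. The savings in a real workflow depend on how much of it is actually stable, which is why the figures above are workload-specific.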

Buffaly also preserves organizational memory. Companies lose enormous amounts of intelligence every day in Slack threads, one-off AI chats, private debugging sessions, undocumented decisions, and tribal knowledge. Buffaly makes conversations, decisions, tools, outputs, follow-ups, and documentation part of the operating memory. Documentation becomes less of a separate chore and more of a byproduct of doing the work.

Finally, Buffaly is multi-model and extensible by design. I do not believe one model should be the center of everything. Different models are good at different things. Buffaly treats models as providers inside a larger runtime. The provider can change. The runtime persists. Tools, provider modules, and web modules are first-class extension surfaces because the goal is not to trap everything inside my private implementation. The goal is to let the system grow.
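"The provider can change. The runtime persists" maps naturally onto an interface boundary. The Python sketch below assumes a hypothetical `Provider` protocol with a single `complete` method and two fake providers; none of these names come from Buffaly's provider modules. What matters is that runtime state, here just a history list, survives a provider swap.

```python
from typing import List, Protocol, Tuple


class Provider(Protocol):
    """Anything that can turn a prompt into text (hypothetical interface)."""

    def complete(self, prompt: str) -> str: ...


class EchoProvider:
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UpperProvider:
    def complete(self, prompt: str) -> str:
        return prompt.upper()


class AgentRuntime:
    """Runtime state (here, just a history) survives provider swaps."""

    def __init__(self, provider: Provider) -> None:
        self.provider = provider
        self.history: List[Tuple[str, str]] = []

    def ask(self, prompt: str) -> str:
        answer = self.provider.complete(prompt)
        self.history.append((prompt, answer))
        return answer


agent = AgentRuntime(EchoProvider())
agent.ask("status")
agent.provider = UpperProvider()  # swap the model, keep the memory
agent.ask("status")
```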

How Buffaly Reduces Hallucination

Buffaly reduces hallucination by changing the job, not by begging the model to behave better.

A larger model might reduce hallucinations. A better architecture can remove entire categories of hallucination from the task.

If a domain identifier must be exact, constrain the model to valid identifiers. If a tool must be called safely, expose a typed action. If a workflow repeats, turn it into a skill. If a process stabilizes, turn it into code. If a decision must be auditable, trace it. If sensitive data should not enter the prompt, keep it in the runtime and expose only a controlled handle.
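The last item, keeping sensitive data in the runtime and exposing only a controlled handle, can be sketched as follows in Python. `HandleVault` and its methods are hypothetical, and the stored value is an obviously fake placeholder. The model sees an opaque token; only runtime code can turn that token back into the underlying value.

```python
import uuid
from typing import Dict


class HandleVault:
    """Keeps sensitive values in the runtime and hands out opaque handles."""

    def __init__(self) -> None:
        self._values: Dict[str, str] = {}

    def store(self, value: str) -> str:
        handle = f"handle:{uuid.uuid4().hex}"
        self._values[handle] = value
        return handle

    def resolve(self, handle: str) -> str:
        """Only runtime code calls this; the model never sees the value."""
        return self._values[handle]


vault = HandleVault()
record_handle = vault.store("123-45-6789")  # illustrative fake value

# The prompt references the handle, never the underlying data.
prompt = f"Verify the identity document for patient record {record_handle}."
```

A model cannot leak or misquote a value it was never shown, which is a stronger guarantee than an instruction not to repeat it.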

That is the difference between approximation and implementation.

The raw LLM path tries to train more reliability into the model. Buffaly externalizes reliability into the architecture around the model. Both paths can improve, but one of them is available now.

Why Public-Source Now

I am opening Buffaly because the field needs a real alternative today.

Not because Buffaly is finished. Not because every dependency is perfectly packaged. Not because the developer experience is already as polished as I want it to be. I am opening it because waiting for perfection would be another form of hiding.

Buffaly is entering a public-source developer preview. The core repositories are GPLv3 by default, with commercial licensing available for organizations that need different terms for proprietary use, redistribution, private embedding, hosted product use, supported deployment, or private domain-pack packaging.

The release is intentionally practical. The fastest way to try Buffaly is the installer. The source is available for inspection, debugging, plugin development, tool development, and partner integration. Some supporting libraries, production adapters, private domain packs, customer connectors, healthcare-specific workflows, and deployment assets are not public.

That is not open-source purity. It is an honest release from one person who would rather share the useful architecture now than spend months polishing the optics while the real ideas remain private.

Buffaly itself is my work. Intelligence Factory supports deployments and helps bring systems into the field, but Buffaly is my architecture, my implementation, and my long-running research project. The public-source preview is a way to let other serious builders inspect it, challenge it, extend it, and apply the ideas in domains I have not touched.

The Third Path

The AI debate should not be a binary choice between blindly scaling LLMs forever and complaining about LLMs from the sidelines.

There is a third path.

Build systems around the models. Give them typed tools. Give them semantic structure. Give them execution boundaries. Give them persistent memory. Give them provenance. Let them reason where reasoning is useful, but stop making them own everything.

That is the path Buffaly is exploring.

I am not asking people to believe in an abstract theory. I am putting the architecture in front of them. Inspect it. Run it. Break it. Build a tool. Tell me where it sucks. Challenge the assumptions. Fork the ideas. Apply them somewhere new.

If we want a future where AI is powerful, useful, and less insane, we need better systems around the models. Buffaly is twenty years of research and engineering aimed at proving that such a system can exist.

The AI world has had plenty of llamas. I think it is time to see what a Buffaly can do.