Origin Story

Goodnight Moon and the Long Road to Buffaly

How children’s books, language acquisition, dual-channel learning, ontology, and ProtoScript led to Buffaly.

Matt Furnari • May 14, 2026

I read Goodnight Moon to my son so many times that I probably still know every word by heart.

I also know The Very Hungry Caterpillar better than any adult should. And somewhere on my computer, I probably have one of the world’s least normal private collections of Thomas the Tank Engine material, not because I had a professional interest in children’s television, but because I spent a ridiculous amount of time trying to teach early versions of what became Buffaly to understand stories like that.

I would read to my son, watch what seemed to register, and then go write programs to mimic what I thought might be happening in his brain. He would connect words to pictures, characters, actions, routines, emotions, expectations, mistakes, and corrections. A word was never just a word. It was tied to an object, an event, a situation, a feeling, a repetition, and eventually to a more general idea.

That fascinated me.

Buffaly did not start as an LLM agent framework. It did not start as a reaction to ChatGPT, Claude, or the current agent ecosystem. The current Buffaly agent runtime is one practical piece of a much older line of work. The deeper project has been an ongoing attempt to understand language, meaning, representation, learning, tool use, and execution in a way that can actually be implemented.

I want to be careful about the tone here. Intelligence is not a solved field. I am not claiming that Buffaly solves it. I have had a lot of ideas over the years. Some worked. Some did not. Some were interesting but impractical. Some turned into real machinery. Buffaly is the accumulation of the ideas that survived enough contact with implementation to remain useful.

The public agent runtime is one of those useful pieces. It is not the whole system. It is the part that works well enough with modern LLMs to share now.

The Work Started Before the Children’s Books

The Goodnight Moon era is the most human way to explain this, but the work started earlier. I have spent decades circling around questions of representation and learning through whatever tools seemed promising at the time.

At different points, I explored evolutionary programming, rule-based systems, neural networks, reinforcement learning, support vector machines, and a lot of stranger ideas that were probably more interesting than useful. I remember going down one path inspired by a neuroscience book in the 1990s about emergent intelligence from loop harmonics. I cannot even remember the book now, but I remember trying to figure out whether feedback loops, rhythm, and recurrence could produce something useful.

That is how a lot of this went. Try an idea. Build something. See where it breaks. Keep the pieces that seem to explain something or make the system more useful. Throw away the rest, or come back to it years later from a different angle.

The reading experiments with my son gave the work a more concrete direction because they forced the problem into a form I could observe every day. Children do not learn language from text alone. They learn it from repeated contact with the world. They hear a word while seeing a picture, touching an object, watching an action, feeling a reaction, or testing a boundary. Language attaches to something outside itself.

That became one of the central ideas behind Buffaly.

Dual-Channel Learning

The first important idea is what I now think of as dual-channel learning.

A single channel is not enough. Language by itself is too unconstrained. If all the system has is text, every word can potentially relate to every other word, every phrase can imply several structures, and every pattern can create a large number of possible interpretations. The system can still learn, but the search space becomes enormous.

Modern machine learning makes this visible. Attention is powerful because every token can attend to every other token, but that same design exposes the underlying problem: when the system is working inside one channel, it has to consider a very large space of possible relationships within that channel. In standard self-attention, the number of pairwise comparisons grows with the square of the sequence length. That computational cost is not an accident. It reflects the difficulty of learning structure from language alone.

A second channel changes the problem. When language is grounded against images, actions, data, code, or environment, the system has something outside the language stream to constrain interpretation. The second channel helps decide what is worth paying attention to. A word attached to an image has perceptual anchors. A phrase attached to action has consequences. A statement attached to data has structure. An instruction attached to code has an executable interpretation.

In my own work, language plus code and language plus data became especially important. Code and data force language to interact with types, values, operations, transformations, and outcomes. They reduce the ambiguity of interpretation because the system can test language against something structured.
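A minimal sketch of that constraint, written in Python for illustration rather than in ProtoScript. The instruction "double the list", the three candidate readings, and the function name `consistent_readings` are all hypothetical. The point is only that one observed input/output pair from a data channel can eliminate readings that language alone leaves open.

```python
# Hypothetical candidate readings of the instruction "double the list".
# Language alone does not say which one is meant.
candidates = {
    "repeat the list twice": lambda xs: xs + xs,
    "multiply each item by two": lambda xs: [x * 2 for x in xs],
    "duplicate each item in place": lambda xs: [x for x in xs for _ in range(2)],
}

def consistent_readings(examples):
    """Keep only the readings that reproduce every observed (input, output) pair."""
    return [
        name
        for name, fn in candidates.items()
        if all(fn(inp) == out for inp, out in examples)
    ]

# Second channel: a single observed example taken from data.
observed = [([1, 2], [2, 4])]
surviving = consistent_readings(observed)
print(surviving)
```

With no observed examples, all three readings survive. With one example, only the reading that matches the data remains.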

This is one reason modern LLMs are so useful to Buffaly. They are unusually good at both language and code. But the deeper idea is older than LLMs: learning becomes more tractable when language is grounded against another structured channel.

Sememes and Lexemes

While watching my son learn, I also started developing a distinction between sememes and lexemes.

A lexeme is the language form: a word, phrase, expression, or surface unit. A sememe is closer to the underlying meaning: the thought, concept, or semantic unit that the language is pointing at.

That distinction helped explain something that was obvious in child language but easy to forget in software. A child can understand something before having the right word for it. A word can be used before its meaning is fully stable. A phrase can point to different meanings depending on context, action, and environment. Meaning and language are connected, but they are not identical.

That distinction shaped the way I thought about Buffaly’s representation layer. I did not want a system that stored only text. I wanted a system that could represent the thing the text was trying to point at, even if that thing was partial, provisional, or still being refined.
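One way to picture the separation is a small Python sketch in which meanings and word forms are stored separately and linked many-to-many. The class names `Sememe` and `Lexicon` and the sample entries are assumptions made for illustration. This is not how Buffaly stores them.

```python
from dataclasses import dataclass, field

@dataclass
class Sememe:
    """A unit of meaning. It can exist before any word is attached to it."""
    ident: str
    lexemes: set = field(default_factory=set)

class Lexicon:
    def __init__(self):
        self.sememes = {}     # sememe id -> Sememe
        self.by_lexeme = {}   # surface form -> set of sememe ids

    def add_sememe(self, ident):
        self.sememes[ident] = Sememe(ident)
        return self.sememes[ident]

    def bind(self, lexeme, ident):
        """Attach a surface form to an existing meaning."""
        self.sememes[ident].lexemes.add(lexeme)
        self.by_lexeme.setdefault(lexeme, set()).add(ident)

    def meanings(self, lexeme):
        return sorted(self.by_lexeme.get(lexeme, set()))

lex = Lexicon()
lex.add_sememe("light-in-the-window-at-night")  # understood, no word yet
lex.add_sememe("round-thing-that-bounces")
lex.add_sememe("formal-dance")
lex.bind("ball", "round-thing-that-bounces")
lex.bind("ball", "formal-dance")                # one form, two meanings
print(lex.meanings("ball"))
```

The first sememe has no lexeme at all, which mirrors a child understanding something before having the word for it.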

The Representation Problem

Once I started thinking about dual-channel learning and the separation between language and meaning, I ran into the representation problem.

Neural networks mostly resist direct interpretation. Their representations are distributed, high-dimensional, and powerful, but they are not easy to inspect or edit in the way a software system often needs. You can train them, probe them, benchmark them, and observe their behavior, but you cannot easily open the system and see a concept as a stable object with relationships, inherited behavior, executable operations, and provenance.

I wanted a different kind of substrate. I wanted something that could represent language, semantics, code, data, actions, memories, and workflows in a form that could be inspected and modified. It had to be usable from language and from code. It had to support repeatable operations. It had to tolerate partial knowledge. It had to let the system refine what it knew over time.

That is where the ontology came from.

I use the word ontology mostly because there is no perfect term for what this structure does. It is ontology-like because it can represent taxonomies, categories, hierarchies, concepts, instances, and relationships. But it is not only a conceptual diagram or metadata layer. It is a dynamic graph-based data structure that can represent things traditional ontologies usually do not handle well: code, abstract syntax trees, runtime actions, data structures, memories, tools, semantic bindings, transformations, and learned procedures.

Over time, I also explored how to represent causality, implications, possibilities, and other structures in graph form. Some of that came from ProtoScript I wrote directly. Some of it came from Buffaly learning representations through use. Those areas are still active and incomplete, but they matter because they point toward a richer representation of what a system knows, what follows from what it knows, and what could happen next.

That is the kind of substrate I wanted: not a black box, not a static taxonomy, and not just text storage, but a structure the system could inspect, operate on, and improve.

The Raw System Before LLMs

The raw system worked for years before modern LLMs became useful.

It was not good at fuzzy language behavior in the way neural models are. It did not absorb a giant corpus and produce fluent completions. It did not have the same kind of distributed representation that makes LLMs so flexible.

But it was good at other things. It was good at structure, transformation, cross-domain generalization, and repeatable reasoning. It could represent relationships explicitly. It could transform objects across domains. It could perform certain kinds of generalization in a non-probabilistic way. The results were inspectable.

That was the point.

I was not trying to build a system that merely produced plausible language. I was trying to build a system where knowledge could be represented, transformed, executed, and inspected.

Modern LLMs filled in a part of the system that had always been difficult: flexible language and code interaction. They are good at ambiguity, summarization, loose semantic matching, code generation, and translating messy human intent into candidate structures. They can read and write code-like syntax. They can work as an interface between human language and a structured runtime.

The LLM did not replace the older work. It made the older work more useful.

Why ProtoScript Exists

For a long time, I used the ontology mainly as a graph-based data storage structure and manipulated it from other programming languages. That worked, but it became awkward as the system grew. I needed a language designed specifically for declaring, modifying, and operating over these graph structures.

That led to ProtoScript.

ProtoScript began as a way to make ontology manipulation more efficient and natural. It became a declarative language for defining and modifying graph structures. Its syntax is based on C#, which made it familiar for me and convenient for integration with existing code. It also turned out to work well with frontier models because they have been trained heavily on code and can reason about code-like syntax.

Conceptually, ProtoScript is prototype-based. The core data structure is called a prototype rather than a class or an object because there is not a strict boundary between a class and an instance. You can create a prototype from another prototype, then create another prototype from that one.

That design reflects how the system is meant to represent partial and evolving knowledge. Ordinary object-oriented programming usually expects a clean distinction between classes and instances. You might define Animal as a class, Mammal as a subclass, Primate as a subclass of Mammal, Human as a subclass of Primate, and then represent a particular person as an instance. That is useful in software, but language and thought often work with less certainty.

If someone says, “The monkey ate a banana,” the system may not know whether “monkey” should be treated as a class, an instance, a role in an event, or a partially specified entity. If someone says, “A monkey ate a banana,” the system may only know that something monkey-like participated in a banana-eating event. If someone says, “Henry ate a banana,” the system may not initially know what Henry is at all. It only knows that Henry is something capable of eating a banana.

A prototype-based structure lets the system represent that partial knowledge without forcing an artificial type decision too early. Henry can be created as a prototype and placed more precisely in the ontology later. The representation starts with what is known and becomes more specific as evidence accumulates.
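The Henry example can be sketched in a few lines of Python. This is not ProtoScript syntax, and the `Prototype` class with its `derive`, `is_a`, and `get` methods is a hypothetical stand-in. It only illustrates a node that is neither class nor instance and can be placed more precisely later.

```python
class Prototype:
    """Hypothetical prototype node: no strict class/instance boundary."""
    def __init__(self, name, *parents):
        self.name = name
        self.parents = list(parents)
        self.props = {}

    def derive(self, name):
        """Create a new prototype from this one."""
        return Prototype(name, self)

    def is_a(self, other):
        return self is other or any(p.is_a(other) for p in self.parents)

    def get(self, key):
        """Look up a property here first, then through the parents."""
        if key in self.props:
            return self.props[key]
        for parent in self.parents:
            value = parent.get(key)
            if value is not None:
                return value
        return None

thing = Prototype("Thing")
eater = thing.derive("Eater")
eater.props["can_eat"] = True
animal = thing.derive("Animal")
monkey = animal.derive("Monkey")

# "Henry ate a banana": all we know is that Henry can eat.
henry = eater.derive("Henry")
known_monkey_at_first = henry.is_a(monkey)

# Later evidence places Henry more precisely without recreating him.
henry.parents.append(monkey)

# A prototype can also be derived from Henry himself.
henry_as_a_baby = henry.derive("HenryAsABaby")
```

Nothing had to be decided up front about whether Henry is a class or an instance. The same node simply gains a parent once more is known.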

That is one reason ProtoScript matters. It is not merely a syntax for a graph database. It is a language designed around the idea that meaning often starts incomplete, becomes structured over time, and may eventually become executable.

Memory as Runtime Material

Most agent memory is text memory. The system stores a conversation, note, summary, document, embedding, or retrieved fragment. That can be useful, but the model still has to reinterpret the memory every time it matters. The memory remains advisory.

Buffaly stores memory in a substrate that can represent language, concepts, actions, relationships, code, tool definitions, runtime behavior, and learned procedures. A memory can remain a note if that is all it needs to be, but repeated or important information can be promoted into something more structured.

The mechanism is not magic. A one-off exchange may remain text or structured context. A repeated pattern can be extracted into a reusable skill. If the skill stabilizes, parts of it can be represented in ProtoScript or native code. Future executions can use the structured version instead of forcing the model to reconstruct the same procedure from scratch.
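A rough Python sketch of that promotion path, under stated assumptions: the threshold `PROMOTE_AFTER`, the `Memory` class, and the step names are invented for illustration, and a real system would need a far better notion of "the same pattern" than exact sequence equality.

```python
from collections import Counter

PROMOTE_AFTER = 3  # hypothetical threshold for "repeated enough"

class Memory:
    def __init__(self):
        self.notes = []        # every episode stays available as a plain note
        self.seen = Counter()  # how often each step sequence has occurred
        self.skills = {}       # promoted sequences -> generated skill name

    def record(self, steps):
        """Store an episode and promote its step sequence once it repeats."""
        key = tuple(steps)
        self.notes.append(key)
        self.seen[key] += 1
        if self.seen[key] >= PROMOTE_AFTER and key not in self.skills:
            self.skills[key] = f"skill_{len(self.skills) + 1}"
        return self.skills.get(key)

mem = Memory()
report = ["fetch_report", "convert_to_csv", "email_team"]
mem.record(["check_weather"])   # one-off: remains a note
mem.record(report)
mem.record(report)
promoted = mem.record(report)   # third repetition: promoted to a skill
print(promoted, len(mem.notes))
```

After promotion, a later run can look up the stored sequence instead of asking the model to rebuild it.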

This is what I mean when I say Buffaly remembers in code. The phrase is shorthand, but the underlying idea is specific: memory should be able to move from language into an inspectable representation substrate, and stable parts of that representation should be available for execution.

Tool Use and Existing Code

I have been writing agents for a long time, and one of the key problems I ran into early was tool exposure. I already had a large body of code built over twenty to twenty-five years, and I wanted agents to use that code directly. I did not want to wrap every useful method by hand or create a custom adapter for every tool.

Because ProtoScript is based on C# and interpreted through C#, C# interop was relatively straightforward. Buffaly can expose C# methods, C# types, ProtoScript methods, and ProtoScript actions directly to agents. The LLM can reason in natural language while the runtime interfaces through structured code and typed operations.

This is one reason Buffaly fits frontier models well. ProtoScript looks enough like code that the model can inspect it, modify it, and reason over it. At the same time, ProtoScript connects to the ontology, the memory substrate, and the runtime action system, so it is not just ordinary code.

That gives Buffaly a different tool-use model from the typical flat tool list. The agent does not need every capability rewritten as a hand-authored JSON wrapper. Existing C# and ProtoScript capabilities can become part of the runtime surface, and the system can expose the relevant pieces to the model when they are needed.
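The general idea of exposing existing code without hand-written wrappers can be illustrated with reflection. The sketch below uses Python's `inspect` module as an analogy. It is not Buffaly's C# interop, and the two sample functions, `describe`, and `invoke` are hypothetical.

```python
import inspect

# Stand-ins for existing, already-written code.
def convert_currency(amount: float, rate: float) -> float:
    """Convert an amount using a fixed exchange rate."""
    return amount * rate

def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())

def describe(fn):
    """Build a tool description from the function itself."""
    sig = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "doc": inspect.getdoc(fn),
        "params": {n: p.annotation.__name__ for n, p in sig.parameters.items()},
        "returns": sig.return_annotation.__name__,
    }

registry = {fn.__name__: fn for fn in (convert_currency, word_count)}
catalog = [describe(fn) for fn in registry.values()]

def invoke(name, **kwargs):
    """Call a registered function after checking the arguments fit its signature."""
    fn = registry[name]
    inspect.signature(fn).bind(**kwargs)  # raises TypeError on a mismatch
    return fn(**kwargs)

print(catalog[1])
print(invoke("word_count", text="goodnight moon"))
```

The catalog is derived from the code, so adding a function to the registry is enough to make it describable and callable.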

LLMs as Interface, Not Control Plane

One of the central design ideas in Buffaly is that LLMs are good at language and code, but they should not own the execution environment.

The model can reason in language, inspect code-like structures, select typed actions, write or modify ProtoScript, summarize results, and help generate new procedures. Buffaly uses those strengths aggressively. Execution, however, happens inside a runtime that owns the objects, tools, state, permissions, and traces.

A normal agent often treats the model as the place where the work happens. The model receives a prompt, sees tool descriptions, gets JSON results, holds temporary state in the transcript, and decides what to do next. Buffaly gives the model a structured environment to reason over. The runtime can store objects, constrain actions, expose tools, preserve memory, execute code, and record what happened.
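The division of responsibility can be sketched as follows. This is a toy Python model, not Buffaly's runtime: `Runtime`, its two actions, and `fake_model` (a stand-in that returns fixed proposals instead of calling an LLM) are all assumptions. The model only proposes; the runtime holds the state, checks permissions, executes, and records the trace.

```python
class Runtime:
    def __init__(self, allowed):
        self.allowed = set(allowed)   # permissions live in the runtime
        self.state = {"counter": 0}   # so does state
        self.trace = []               # and the record of what happened
        self.actions = {"increment": self._increment, "reset": self._reset}

    def _increment(self, by=1):
        self.state["counter"] += by
        return self.state["counter"]

    def _reset(self):
        self.state["counter"] = 0
        return 0

    def execute(self, proposal):
        """Validate a proposed action, run it if permitted, and log the outcome."""
        name, args = proposal["action"], proposal.get("args", {})
        if name not in self.actions:
            outcome = ("rejected", "unknown action")
        elif name not in self.allowed:
            outcome = ("rejected", "not permitted")
        else:
            outcome = ("ok", self.actions[name](**args))
        self.trace.append((name, args, outcome))
        return outcome

def fake_model(_prompt):
    """Stand-in for an LLM: returns proposals, never touches state directly."""
    return [
        {"action": "increment", "args": {"by": 2}},
        {"action": "reset"},
        {"action": "delete_everything"},
    ]

rt = Runtime(allowed=["increment"])
results = [rt.execute(p) for p in fake_model("bump the counter by two")]
print(results, rt.state)
```

Rejected proposals still appear in the trace, which keeps the model's attempts inspectable even when nothing was executed.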

The LLM did not create that architecture. It gives the architecture a powerful interface.

Higher-Level Agents, Graph Learning, and the Parts Still Coming

The current open agent runtime is only a fraction of the larger Buffaly environment. Some of the more interesting pieces are just beginning to be layered in.

The System 2 Watcher is one example. It represents agents that operate on longer timescales, inspect what lower-level agents are doing, and use a more introspective toolset. A fast, task-oriented agent can solve problems and call tools, but it can also stop too early, miss a verification step, choose the wrong abstraction, or fail to use the system’s memory properly. The Watcher is meant to observe the trajectory of work and intervene when the lower-level process is drifting.

The Critic is another example. It is a learning agent involved in automatic ontology learning. Because Buffaly has always stored what it learns in an ontological substrate, automatic ontology learning has long been a major research direction for me. The Critic can examine completed work, identify reusable structures, suggest semantic entities, and help move knowledge from one-off execution into the ontology.

Beyond that, there are representational areas that are only lightly visible in the current agent implementation. I have written ProtoScript for causality, implications, possibilities, transformations, and other graph-representable ideas, and Buffaly has learned structures for them as well. I am excited to keep exploring those areas with the current approach because they point beyond ordinary tool use. They suggest ways for a system to represent not only what happened, but what could happen, what follows from what, and which transformations are available in a given state.

The ontology database can also support learning over graph structures. You can assign values to prototypes, represent possible actions or transformations as graphs, and learn over those graphs. One earlier experiment used graphs to describe how to convert from one type of C# object to another and which methods should be used for the conversion. That was useful because the learned process remained inspectable.
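The conversion experiment can be pictured with a small graph in which nodes are types and edges are labelled with the method that performs the conversion. The Python sketch below covers only the inspectable lookup, using a plain breadth-first search. It does not attempt the learning step, and the type and method names are hypothetical labels, not real C# APIs.

```python
from collections import deque

# Hypothetical conversion graph: source type -> [(method label, target type)].
conversions = {
    "String": [("ParseInt", "Int32"), ("ParseDate", "DateTime")],
    "Int32": [("ToDouble", "Double")],
    "Double": [("ToDecimal", "Decimal")],
    "DateTime": [("ToTicks", "Int64")],
}

def find_conversion(source, target):
    """Return the shortest list of method labels from source to target, or None."""
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        current, path = queue.popleft()
        if current == target:
            return path
        for method, next_type in conversions.get(current, []):
            if next_type not in visited:
                visited.add(next_type)
                queue.append((next_type, path + [method]))
    return None

print(find_conversion("String", "Decimal"))
print(find_conversion("Decimal", "String"))
```

Because the result is a list of named steps, a person can read exactly which methods the system intends to use and in what order.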

I have also experimented with activation spreading to give the localist graph representation some of the advantages of distributed representation. I do not know how far that path goes, but it remains interesting. The goal is not to copy neural networks. The goal is to see which useful properties of distributed representation can be layered onto an inspectable graph substrate.
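For readers unfamiliar with the technique, here is a generic spreading-activation sketch in Python. It shows the textbook idea only. The graph, the `decay` factor, and the fixed number of steps are assumptions and say nothing about how Buffaly's experiments were built.

```python
# Toy localist graph: each node is one concept, edges point to related concepts.
graph = {
    "moon": ["night", "sky"],
    "night": ["sleep", "moon"],
    "sky": ["moon"],
    "sleep": ["bed"],
    "bed": [],
}

def spread(seeds, decay=0.5, steps=2):
    """Push activation outward from seed nodes, weakening it at each hop."""
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(steps):
        incoming = {}
        for node, energy in frontier.items():
            for neighbor in graph.get(node, []):
                incoming[neighbor] = incoming.get(neighbor, 0.0) + energy * decay
        for node, energy in incoming.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = incoming
    return activation

result = spread({"moon": 1.0})
for node, level in sorted(result.items(), key=lambda kv: -kv[1]):
    print(node, level)
```

Nodes that were never named in the input end up partially active, which gives a graded notion of relatedness while every node and edge stays individually inspectable.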

Ontology and Vector Search

Buffaly uses both ontology and vector search because they solve different parts of the problem.

The ontology is useful when the system needs explicit structure: hierarchy, type information, known relationships, inspectable meaning, action boundaries, inherited behavior, and provenance. Vector search is useful for fuzzy discovery. It helps retrieve relevant actions, entities, or memories across language variations.

For vector embeddings, I use the Semantic Database product, which makes it easy to embed actions or entities and retrieve them. In Buffaly, vector search is not a replacement for the ontology. It is a discovery mechanism that works alongside the ontology. Vectors help find candidates; the ontology provides structure, execution boundaries, and inspectability.

The important part is assigning the right responsibility to each layer. Fuzzy matching is useful for discovery, but it should not be the sole authority for execution. Structural relationships are useful for control, but they should not be the only way to find relevant concepts across messy language.
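A toy Python sketch of that split, with every detail assumed for illustration: the three-dimensional vectors are made up, the action and category names are hypothetical, and no real embedding model or the Semantic Database product is involved. Cosine similarity proposes candidates, and a parent-link check in a tiny ontology decides which of them may be used.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings for four hypothetical actions.
embeddings = {
    "SendEmail": [0.9, 0.1, 0.0],
    "SendInvoice": [0.8, 0.3, 0.1],
    "DeleteMailbox": [0.7, 0.0, 0.6],
    "ResizeImage": [0.0, 0.1, 0.9],
}

# Tiny ontology: child -> parent.
parent = {
    "SendEmail": "Messaging",
    "SendInvoice": "Billing",
    "DeleteMailbox": "Admin",
    "ResizeImage": "Media",
    "Messaging": "UserAction",
    "Billing": "UserAction",
    "Media": "UserAction",
    "Admin": "AdminAction",
}

def is_a(node, ancestor):
    while node is not None:
        if node == ancestor:
            return True
        node = parent.get(node)
    return False

def find_actions(query_vec, permitted_root, k=3):
    """Vectors find the top-k candidates; the ontology filters what is allowed."""
    ranked = sorted(embeddings, key=lambda n: cosine(query_vec, embeddings[n]), reverse=True)
    return [name for name in ranked[:k] if is_a(name, permitted_root)]

selected = find_actions([1.0, 0.1, 0.0], "UserAction")
print(selected)
```

In this toy data, `DeleteMailbox` scores well enough to be a candidate but is dropped because it does not sit under the permitted category.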

Why the Current Release Comes First

The piece of Buffaly being opened now is not the entire system. It is a practical piece that works well with LLMs and exposes useful parts of the architecture: ProtoScript, ontology-backed memory, typed actions, semantic entities, tool discovery, C# interop, provider modules, web modules, and inspectable execution.

The larger system contains more research. Some of it depends less on LLMs. Some of it is more theoretical. Some of it has been used internally for years but needs documentation and cleanup before it can be made approachable. Some pieces are simply not ready to be public yet.

The reason to release the current agent runtime first is that it works now. It gives developers something concrete to run, inspect, and build on. It demonstrates what becomes possible when LLMs are placed inside a structured runtime rather than being asked to serve as the entire system. It also creates a path for layering in the deeper learning architecture over time.

I do not want Buffaly to be another abstract claim about neurosymbolic AI. If the idea is real, people should be able to touch it. They should be able to inspect the source, run the installer, build tools, and see where the architecture is strong or weak.

What Buffaly Really Is

Buffaly is not just an LLM agent framework. It is one practical release from a larger research program around language, learning, representation, tool use, and executable memory.

The current system uses LLMs because LLMs are useful. They are strong at language, code, summarization, ambiguity, and flexible reasoning. Buffaly benefits enormously from that. But the architecture is built around a different assumption: the runtime should own memory, structure, tools, actions, permissions, and execution, while the model helps reason over that runtime.

Most agents treat memory as text that returns to the prompt. Buffaly uses a representation substrate where memory can become structured, inspected, connected to tools, and eventually executed. That difference comes from where Buffaly started, and it is why the current public release is only the first visible part of a much larger system.