Overview of Behavior Graphs
Ben Houston • 15 Minute Read • February 19, 2026
Tags: 3d, behavior-graphs, gltf, visual-scripting, open-source, standards
There's a moment that happens on any sufficiently complex 3D project where your linear visual scripting system stops being useful. A designer stares at a tangle of rules, realizes none of the combinations they need actually exist, and calls in an engineer. The engineer looks at the rule system, decides it can't do what the client needs, and writes custom code instead. The visual tooling is bypassed entirely.
I've run into this a lot. It's what sent me down the path of building behavior graphs.
For a long time I thought Trigger-Action Lists were good enough. They weren't.
The Ceiling of Trigger-Action Lists
The platform I was working on at the time, like many 3D platforms of that era, used a rule system based on the Trigger-Action List model. You define one or more conditions, attach one or more actions, and you're done. It's the same model used by Adobe Aero and Apple Reality Composer. It's approachable for simple cases and quick to teach.
The problem emerges as soon as a project gets real. Trigger-Action Lists have no way for nodes to communicate with each other, so you can't query the scene to compute a value and then use that computed value somewhere else. You can't do conditional execution — there's no "if" that branches based on a queried value. You can't implement a flip-flop (open door / close door). Loops are out of the question. The underlying model is a flat list, and flat lists can't express flow.
The workaround the industry settled on was compound nodes: single nodes that pre-bake a specific combination of query plus action into one monolithic unit. Need to check whether a mesh is visible and then hide it? There's a node for that. Need to check the value of a custom attribute and then trigger an animation? There's a (different) node for that. Every new combination requires a new node definition, and every node definition requires someone to anticipate all the parameters it will ever need. It does not compose. It does not scale.
When I talked to the implementation team, their honest answer was that on complex projects they skipped the rule system entirely and dropped down to custom code. They had to — the rules couldn't express what the client needed. But not everyone can write code, and custom code is expensive to maintain and impossible for a designer to read.
What the Game Industry Figured Out
The games industry faced exactly this problem a decade earlier and converged on a solution: behavior graphs — the generic name for the Blueprints-like approach pioneered by Unreal Engine and adopted by Unity Visual Scripting and NVIDIA OmniGraph.
The insight is straightforward but powerful. Instead of a flat list of compound rules, you have a directed graph of small, atomic, composable nodes. Two kinds of connections wire them together: flow links, which control the sequence of execution (the white arrows in Blueprints), and data links, which pass values between nodes as immediate parameters. This combination — flow + data — gives you everything: conditional branches, loops, delays, arithmetic, custom events, variables.
Many games built in Unreal Engine keep a large share of their game logic in Blueprints rather than C++. That's not a compromise — it's a feature. Artists, designers, and programmers all read and modify the same artifact. The visual spatial layout externalizes execution flow in a way that is cognitively accessible to non-programmers. The entry cost to understanding what a graph does is genuinely lower than reading equivalent code, and that accessibility is irreplaceable at the boundary between technical and non-technical collaborators.
Behavior graphs are also a proper superset of Trigger-Action Lists. Any Trigger-Action List can be mechanically translated into a behavior graph. The converse is not true. That means adopting behavior graphs doesn't invalidate simpler systems — it subsumes them.
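To make that mechanical translation concrete, here is a small sketch using hypothetical node type names (nothing here comes from a particular engine or spec): a single trigger-action rule becomes an event node flow-linked to an action node, and once the rule lives in graph form, queries, branches, and loops can be spliced in between the two.

```typescript
// Hypothetical node type names, for illustration only.
// Trigger-Action rule: "when the object is tapped, hide it."
const asTriggerAction = {
  trigger: "onTap",
  action: { type: "setVisibility", visible: false }
};

// The same rule as a behavior graph: an event node whose outgoing flow link
// drives an action node. The flat rule cannot grow beyond this pair; the
// graph can interpose queries, branches, and loops between them.
const asBehaviorGraph = {
  nodes: [
    { id: 0, type: "event/onTap", flows: { out: { node: 1 } } },
    { id: 1, type: "scene/setVisibility", values: { visible: false } }
  ]
};
```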
What Behavior Graphs Are
A behavior graph is a directed graph of nodes connected by typed sockets and links. The key primitives:
- Events — entry points into the graph, triggered by lifecycle events, user interaction, timers, or custom signals.
- Actions — nodes that cause changes: trigger an animation, set a scene graph property, play a sound.
- Queries — nodes that read live state: scene graph values, user input, environment data.
- Logic — pure computation: arithmetic, boolean logic, comparisons, string manipulation.
- Flow — execution control: if/else, sequences, loops, delays.
- Variables — readable and writable slots for abstract state.
Execution is driven by flow sockets. When an event fires, it places the node connected to its outgoing flow link on a work queue. Each node resolves its data inputs by immediately evaluating any connected subgraph, runs its internal logic, and then follows its outgoing flow links to enqueue the next node. The host controls how many nodes are processed per time slice, which is the basis for a strong security model: even a behavior graph that contains an infinite loop cannot DoS the client, because execution is interruptible at will by the host. The set of supported node types defines a constrained sandbox — you can withhold any capability by simply not registering its node type.
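To make that execution model concrete, here is a minimal sketch of the dispatch loop in TypeScript. The names (BehaviorNode, GraphEvaluator, executeSlice) are illustrative, not taken from behave-graph or any particular engine.

```typescript
// A minimal sketch of the dispatch loop described above; names are illustrative.
interface BehaviorNode {
  // Resolve data inputs via the callback, run the node's logic, and return
  // the nodes connected to its outgoing flow links.
  execute(resolveInput: (socketName: string) => unknown): BehaviorNode[];
}

class GraphEvaluator {
  private workQueue: BehaviorNode[] = [];

  // An event fired: seed the queue with the node on its outgoing flow link.
  trigger(entryNode: BehaviorNode): void {
    this.workQueue.push(entryNode);
  }

  // The host decides how many nodes run per time slice, so a graph containing
  // an infinite loop can be paused at any slice boundary instead of hanging the client.
  executeSlice(maxNodesPerSlice: number): void {
    for (let i = 0; i < maxNodesPerSlice && this.workQueue.length > 0; i++) {
      const node = this.workQueue.shift()!;
      const nextNodes = node.execute((socketName) =>
        this.evaluateDataSubgraph(node, socketName)
      );
      this.workQueue.push(...nextNodes);
    }
  }

  // Data inputs are resolved immediately by walking the connected data links
  // upstream and evaluating the pure query/logic subgraph they form.
  private evaluateDataSubgraph(node: BehaviorNode, socketName: string): unknown {
    // Omitted for brevity in this sketch.
    return undefined;
  }
}
```

Because the host calls executeSlice with its own budget, even a graph that keeps enqueueing nodes forever only ever consumes one slice at a time.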
Building behave-graph
When I started advocating for behavior graphs at Khronos in 2022, I encountered genuine skepticism. The concern wasn't conceptual — most people understood the argument — it was practical. Would a behavior graph execution engine actually be small enough to ship in a browser context? Would it be fast enough? Was the execution model as simple to implement as I claimed?
Arguments don't answer those questions. Code does.
I built behave-graph in the summer and fall of 2022: a headless, extensible behavior graph execution engine in TypeScript with no external dependencies. The production build weighs in at around 10KB. In performance testing it achieves over 7 million node executions per second on desktop and over 2 million on low-end phones. The code is permissively licensed to encourage adoption.
A proof of concept is only convincing if it covers the hard cases. behave-graph supports control flow (if/else, sequences, flip-flops, for-loops), variables, custom events, async nodes, and 3D math types. Community contributors subsequently built a React-based visual editor on top of it. The library is behaviorally equivalent to Unity Visual Scripting and Unreal Engine Blueprints — not as a simplification but as a deliberate structural alignment.
Taking It to the Standards Process
Having a working implementation changes the conversation. I presented behavior graphs at the SIGGRAPH 2022 Birds of a Feather session on glTF interactivity, and again at the Metaverse Standards Forum in June 2023. The pitch was the same both times: the web 3D ecosystem has no open standard for interactivity. Every platform — Blueprints, Unity Visual Scripting, OmniGraph, Scratch, Adobe Aero, Apple Reality Composer — is a silo. Behavior graphs are not a new invention; they are the distillation of what the industry has independently converged on, and they deserve the same standardization treatment that glTF brought to geometry and materials.
The Khronos glTF Interactivity Working Group — chaired by Dwight Rogers from Adobe, with participants from Google, Microsoft, Meta, Amazon, and others — adopted the behavior graph model and has been working toward a KHR_interactivity glTF extension. The approach is deliberately layered: a core specification covering basic control flow, variables, custom events, logic, and scene graph manipulation, followed by use-case-specific extensions for animation, audio, physics, and AR.
The spec stores the behavior graph as JSON inside the glTF file. Node inputs, outputs, and flow connections are all represented with typed sockets. The structure is designed for load-time validation — corrupt or unsupported graphs are catchable before any execution begins.
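As a sketch of what that load-time validation can look like, with illustrative field names since the normative schema is still being finalized, a host can reject a graph that references unknown node types or dangling flow targets before running anything:

```typescript
// Sketch of load-time validation against a set of supported node types.
// Field names are illustrative, not the normative KHR_interactivity schema.
interface NodeJSON {
  type: string;
  flows?: Record<string, { node: number }>;
}

function validateGraph(nodes: NodeJSON[], supportedTypes: Set<string>): string[] {
  const errors: string[] = [];
  nodes.forEach((node, index) => {
    if (!supportedTypes.has(node.type)) {
      errors.push(`node ${index}: unsupported type "${node.type}"`);
    }
    for (const [socket, link] of Object.entries(node.flows ?? {})) {
      if (link.node < 0 || link.node >= nodes.length) {
        errors.push(`node ${index}: flow socket "${socket}" points at missing node ${link.node}`);
      }
    }
  });
  return errors; // a non-empty list means the graph is rejected before execution
}
```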
Early adopters validated the direction. Mozilla Hubs integrated behave-graph into their platform and built a Blender add-on for authoring interactive glTFs. A project called My Mül stored behavior graph state on the blockchain to enable characters to exist and behave consistently across different game engines — which is the promise of interoperable behavior in its most literal form.
The long-term vision is compelling: a glTF file with a KHR_interactivity behavior graph is engine-agnostic. The same JSON representation should execute in Babylon.js, Three.js, Mozilla Hubs, or a native game engine. The node vocabulary becomes a portability contract in the same way glTF's material model is a portability contract — and the standardization work at Khronos is exactly the process by which that contract gets ratified.
Questions & Answers
The following is a set of questions I've been asked — or asked myself — about behavior graphs, visual programming, and where this all leads.
Q1: Is code the superior medium ultimately, or is there an area where visual programming is uniquely better?
Code is clearly superior for complex, large-scale systems — that's precisely why implementation teams working on complex projects would bypass visual rule systems entirely and drop down to custom code. The discipline of text, version control, refactoring tools, and composability at scale is hard to beat.
That said, visual programming has a genuine and irreplaceable advantage at the boundary between technical and non-technical collaborators. In Unreal Engine, Blueprints aren't just tolerated — they're beloved, and by a broad audience: artists, designers, and programmers all reading and modifying the same artifact. That's a genuinely unique property. The visual spatial layout externalizes the execution flow in a way that is cognitively more accessible. The "spaghetti" problem notwithstanding, the entry cost to understanding what a graph does is meaningfully lower than reading code for non-programmers. So visual programming isn't universally better, but it's uniquely better as a shared language across a team with mixed technical depth.
Q2: Is the "powerful for simple, nightmare for complex" criticism an inherent limitation? Do visual systems inevitably migrate toward code at sufficient complexity?
This is largely a tooling problem, not an inherent execution model problem. The criticism is most valid against Trigger-Action Lists specifically — and that's precisely the argument I made for moving away from them. When you lack the ability to communicate between nodes, compose logic, or use control flow, complexity forces you into a proliferation of compound nodes that each hard-code specific combinations of query + action. That is a maintenance nightmare, and it is inherent to that model.
Behavior graphs address this directly: atomic, composable nodes combined via both flow links and data links can express the same complexity as code without requiring compound nodes. The "spaghetti" problem in large Blueprints is real, but it's a UX and tooling failure — lack of good subgraph encapsulation, layout automation, and search. Unity Visual Scripting and Unreal Engine are actively improving these tools. The execution model itself scales fine; we demonstrated over 7 million node executions per second on desktop with behave-graph. The ceiling isn't the model — it's the editor.
Q3: How does visual programming survive in an age where LLMs are adept at generating code?
This is the most interesting question, and I think it actually strengthens the case for behavior graphs rather than weakening it. LLMs generate text, which makes code generation their natural strength. But the output artifact (code) is still opaque to non-technical collaborators. A game designer can't look at generated TypeScript and understand or safely modify the behavior of a 3D asset.
Behavior graphs, particularly in the glTF interactivity context, are a structured, validated, serializable representation of logic. An LLM could absolutely generate a behavior graph from a natural language prompt — "when the user clicks the door, play the open animation, then after 2 seconds play the close animation" — and the output would be something a designer could inspect, modify, and reason about visually. So rather than competing with LLMs, behavior graphs may become the ideal output format for LLM-assisted behavior authoring: safe (sandboxed execution model), inspectable, and interoperable across engines.
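As an illustration, the graph an LLM might emit for that prompt could look roughly like this; the node type names are hypothetical, not the working group's normative vocabulary:

```typescript
// Illustrative node type names; not the normative KHR_interactivity vocabulary.
// Prompt: "when the user clicks the door, play the open animation,
// then after 2 seconds play the close animation."
const doorGraph = {
  nodes: [
    { id: 0, type: "event/onSelect", flows: { out: { node: 1 } } },
    { id: 1, type: "animation/play", values: { animation: "doorOpen" },
      flows: { done: { node: 2 } } },
    { id: 2, type: "flow/delay", values: { duration: 2.0 },
      flows: { done: { node: 3 } } },
    { id: 3, type: "animation/play", values: { animation: "doorClose" } }
  ]
};
```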
Q4: Why has visual programming never really taken off? Tooling gap or inherent limits?
Both, but I'd weight tooling more heavily. The fragmentation problem is severe: every platform has its own proprietary visual scripting system — Blueprints, Unity Visual Scripting, OmniGraph, Scratch — none of which are interoperable. There's never been an open standard. That's exactly why I brought behavior graphs to the Metaverse Standards Forum and why we're working on the KHR_interactivity glTF extension. Without interoperability, visual programming artifacts are locked to a single engine, which limits adoption and investment in tooling.
The other factor is that Trigger-Action Lists have been mistakenly treated as synonymous with visual programming. Systems like Adobe Aero and Apple Reality Composer are all Trigger-Action Lists, not behavior graphs. They hit a complexity ceiling quickly, users get frustrated, and that frustration gets attributed to visual programming broadly. Behavior graphs are a proper superset — they don't have those limitations — but most people's lived experience with "visual programming" is with the weaker model.
Q5: Can behavior graphs, as a superset, handle any possible use case of visual programming as the underlying execution type?
Yes, with an important qualification around domain-specific expressibility. The execution model itself — a directed graph with both data flow (immediate parameter links) and control flow (flow sockets) — is sufficient to express anything a Turing-complete system can express. I noted in my presentations that virtually all behavior systems, even Trigger-Action Lists, are technically Turing-complete because you can simulate variables via hidden scene state and simulate loops via self-triggering events. Behavior graphs just make this explicit and ergonomic.
The qualification is this: the execution model is universal, but the node library is domain-specific. A behavior graph without scene graph nodes can't manipulate a 3D scene. A behavior graph without audio nodes can't play sounds. This is actually a feature, not a bug — it's the constrained sandbox security model. So the answer is: behavior graphs as an execution model are a universal superset; behavior graphs as a deployed system are only as capable as the registered node types. Designing good node vocabularies per domain is where most of the real work lives.
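A minimal sketch of that sandbox property, with hypothetical names: the host registers only the node types it is willing to expose, and anything unregistered is unreachable.

```typescript
// Illustrative sketch of the constrained-sandbox idea; names are hypothetical.
type NodeFactory = () => unknown;

class NodeRegistry {
  private factories = new Map<string, NodeFactory>();

  register(typeName: string, factory: NodeFactory): void {
    this.factories.set(typeName, factory);
  }

  create(typeName: string): unknown {
    const factory = this.factories.get(typeName);
    if (!factory) {
      // A capability that was never registered cannot be instantiated,
      // so loading a graph that uses it fails up front.
      throw new Error(`Unsupported node type: ${typeName}`);
    }
    return factory();
  }
}

// A viewer that registers scene-graph and math nodes but deliberately omits
// audio, network, or file-system nodes has withheld those capabilities entirely.
```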
Q6: What new features would you add to behave-graph if you had time?
Several things feel unfinished or missing:
- Subgraphs / encapsulation is the most important. Large graphs become unnavigable without the ability to collapse a set of nodes into a named, reusable subgraph with a clean interface — essentially functions. This is how you fight spaghetti at scale.
- Typed custom events with structured payloads would dramatically improve composability between assets. Right now events are relatively coarse.
- Debugging tooling — step-through execution, node-level value inspection mid-run, execution history replay. The absence of this is probably the single biggest reason professional developers distrust visual systems.
- A richer type system, particularly for structured data / records. This connects directly to the next question.
- Bidirectional LLM integration — natural language to graph generation, and graph-to-natural-language explanation. Given the trajectory of LLMs, this feels inevitable and would massively broaden accessibility.
Q7: Is the inability to express things like object destructuring an inherent restriction, or a type system problem?
It's fundamentally a type system problem, and one that most visual programming systems have simply chosen not to solve because the UX cost is high. In code, TypeScript gives you structural typing, inference, and destructuring for free because the representation is text and the compiler can reason over it statically. In a visual system, every socket has to have a concrete, renderable type at graph-construction time — you need to know the shape of a record to generate the output sockets for destructuring it.
This is solvable: you can implement a structural record type system where a node's output sockets are dynamically generated based on the schema of its input. But this introduces significant UX complexity — sockets that change shape, connections that can become invalid when a schema changes, and load-time validation that needs to handle schema evolution. The tools haven't caught up to the ambition. This isn't an inherent ceiling of behavior graphs; it's an unsolved tooling and UX design problem. The execution model supports it; the editors don't yet.
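A rough sketch of the schema-driven approach, assuming a simple structural type description (the names here are illustrative):

```typescript
// Sketch of schema-driven sockets: a destructuring node derives one typed
// output socket per field of its input record. Names are illustrative.
type SocketType = "float" | "string" | "boolean";

interface RecordSchema { [field: string]: SocketType; }
interface SocketSpec { name: string; type: SocketType; }

function destructuringOutputSockets(schema: RecordSchema): SocketSpec[] {
  return Object.entries(schema).map(([name, type]) => ({ name, type }));
}

// Example: a record typed { score: "float", label: "string" } yields two
// output sockets. If the schema later changes, existing connections have to
// be revalidated at load time, which is exactly where the UX cost appears.
const sockets = destructuringOutputSockets({ score: "float", label: "string" });
```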
Q8: Could the runtime harness eventually be removed by compiling graphs to code?
Yes, absolutely, and this is a very natural direction. The behavior graph is already a well-structured IR (intermediate representation) — a directed graph with typed sockets and explicit execution order. Compiling it to JavaScript, WASM, or any procedural target is straightforward for the purely synchronous, non-looping subset. Async nodes and loops require a little more scaffolding in the emitted code but nothing fundamental.
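A hedged sketch of what compiling that synchronous subset could look like, assuming a hypothetical graph shape and a runtime adapter object:

```typescript
// Sketch of compiling the purely synchronous, non-looping subset of a graph
// to JavaScript source. The graph shape and runtime hook are hypothetical.
interface CompilableNode {
  type: string;                       // e.g. "animation/play"
  values?: Record<string, unknown>;   // constant data inputs
  next?: number;                      // index of the node on the outgoing flow link
}

function compileToJS(nodes: CompilableNode[], entryIndex: number): string {
  const lines: string[] = ["// generated from a behavior graph"];
  let current: number | undefined = entryIndex;
  while (current !== undefined) {
    const node = nodes[current];
    // Each node type becomes a call into an engine-specific runtime adapter.
    lines.push(
      `runtime.execute(${JSON.stringify(node.type)}, ${JSON.stringify(node.values ?? {})});`
    );
    current = node.next;
  }
  return lines.join("\n");
}
```

Async nodes would need emitted awaits, and loops would need real control flow in the output, but the per-node mapping stays the same.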
The runtime harness exists partly for the at-will execution security model — you want the host to control time slices and be able to interrupt infinite loops. A compiled version gives up that control. For trusted contexts (developer tooling, offline baking) compilation makes total sense and would yield significant performance gains. For untrusted asset execution in a browser, the runtime harness's sandboxing properties are genuinely valuable and harder to replicate in compiled form without essentially reimplementing a VM anyway.
Q9: Visual programming as a higher-level abstraction that compiles to any target language?
This is the most compelling long-term framing, and it's essentially the trajectory glTF interactivity is already on. A behavior graph stored in a glTF file is engine-agnostic: the same JSON representation should be executable in Babylon.js, Three.js, Mozilla Hubs, or a native game engine. The mapping from graph nodes to engine-specific APIs happens at the adapter layer.
Taking this further — compiling to JavaScript, C#, Lua, or WASM — would make behavior graphs function as a genuine lingua franca for logic, in the same way glTF is a lingua franca for geometry. The node vocabulary becomes a portability contract: if you support a node type, you're committing to a specific semantic, and any target language backend has to honor that semantic. This is ambitious but not conceptually different from what LLVM does for code — a well-specified IR that multiple backends can target. The hard part isn't the compilation; it's achieving sufficient consensus on node semantics across the industry to make the portability real. That's the standardization work, which is exactly what the Khronos glTF Interactivity Working Group is attempting.