ExI subordinates large language models to a deterministic cognitive architecture. Generative breadth for proposal; formal methods for execution. No prompt-engineered safety, no unbounded planners — a glass-box kernel that decides what an agent is allowed to do.
The deterministic runtime core and formal safety validators are working code, not slides. The universal Agent Control OS is being engineered toward a production-ready release.
Models hallucinate, deliver wrong answers with confidence, and lack the self-awareness to flag their own blind spots. Prompt-engineering and self-evaluation do not close the loop.
The flaw is structural, not a matter of training. Statistics over causality. Pattern-matching mistaken for reasoning. Stateless attention diluting safety constraints across the context window. No amount of scaling fixes this.
Describe the domain once as a formal specification, and any model in any environment executes with mathematical guarantees of safety and reliability. A deterministic shield around generative breadth.
Consumer chatbots can tolerate cheap errors through human-in-the-loop guardrails. Autonomous agents that execute real-world actions operate in a different risk class — where mistakes trigger financial, legal, operational, or physical consequences.
The same three gaps show up in every production deployment. They are not bugs to patch — they are structural properties of generative-only stacks.
A valid access token does not guarantee a valid decision.
A persuasive plan is not a verified plan.
A custom rules-engine is not a guarantee.
Models treat instructions as fluid semantic context. Critical business rules and constraints are easily forgotten, diluted, or hallucinated away during complex planning — leading to inconsistent and unpredictable execution.
IAM, JIT access, API gateways, and firewalls control who is allowed to act. A valid access token does not guarantee a valid decision. Existing systems blindly execute commands even when they result from hallucination, broken logic, or lost context.
To bridge the reliability and safety gaps, each company builds bespoke middleware. Localised, fragmented stopgaps drain engineering resources and fundamentally fail to provide universal mathematical guarantees of predictable, secure outcomes.
ExI replaces fragmented custom workarounds with an independent, deterministic control layer that mathematically verifies logic, safety, and goal-alignment before execution.
The problem with autonomous LLM agents is not that the models are undertrained. It is that three architectural properties of generative-only stacks make deterministic safety unreachable, regardless of how capable any single model becomes.
These are not symptoms to patch with better prompts or longer context windows. They are intrinsic to the substrate.
// root cause
statistics over causality
pattern-match over reasoning
attention over state
Expecting deterministic safety from a probabilistic generative model is a technological dead end — no matter how many parameters you add.
Vast semantic coverage. Analogical breadth. Useful proposals. And no architectural guarantee that any two steps in a row obey the same rules. Alignment is asked of the very heuristic engine it is meant to constrain.
Deterministic symbolic control. Formal verifiability. Inspectable state. And a narrow semantic surface — tedious to author, brittle against open-world novelty.
ExI resolves the dichotomy not by choosing, but by subordination: the neural layer proposes, the symbolic architecture disposes.
The industry has converged on a single conclusion: scaling parameters will not solve safety. The frontier of safe autonomy is no longer about bigger models — it is about hybrid architectures that combine neural intuition with strict logical rules.
Three research currents define this consensus. ExI is not contrarian — it is the production engineering of all three running together, under one kernel, at runtime latency.
System 1 pattern, breadth, intuition
System 2 logic, verification, control
───────
ExI :: System 2 holds the veto
Integrate the intuitive pattern-matching of neural networks (System 1) with a logically rigorous, deterministic symbolic processor (System 2). The race has shifted from parameter count to lawful composition.
Move beyond homogeneous, stateless networks toward structured frameworks that explicitly integrate episodic memory, perception, and dynamic situational awareness. Cognition as a coordinated system, not a single forward pass.
Integrate the rigorous mathematics of causality and provenance graphs. True traceable counterfactual reasoning eliminates blind hallucination and yields deterministic, audit-grade safety claims — not statistical hope.
Central hub. All information exchange passes through Working Memory — no direct module-to-module calls. State S_t = Φ(S_{t−1}, e_t, M_ret).
Cognition is stateful, recurrent, and centrally coordinated. Prior context + new stimulus + retrieved memory, integrated through a single architectural operator.
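A minimal sketch of that single operator in Rust — the struct, its fields, and `phi` are illustrative stand-ins, not ExI's shipped API:

```rust
// Minimal sketch of the hub-routed update S_t = Φ(S_{t−1}, e_t, M_ret).
// WorkingMemory, Stimulus, and phi are illustrative names, not the kernel's API.

#[derive(Clone, Debug)]
struct WorkingMemory {
    goal_stack: Vec<String>, // active goal hierarchy
    percept: String,         // attended perceptual state
    retrieved: Vec<String>,  // memory content routed in this cycle
    tick: u64,
}

struct Stimulus {
    percept: String,
}

/// Φ: one architectural operator integrates prior context,
/// the new stimulus, and retrieved memory into the next state.
fn phi(prev: &WorkingMemory, e: Stimulus, m_ret: Vec<String>) -> WorkingMemory {
    WorkingMemory {
        goal_stack: prev.goal_stack.clone(), // goals persist unless deliberation edits them
        percept: e.percept,                  // attention is replaced by the new stimulus
        retrieved: m_ret,                    // retrieval is re-routed every cycle
        tick: prev.tick + 1,
    }
}

fn main() {
    let s0 = WorkingMemory {
        goal_stack: vec!["deliver_report".into()],
        percept: String::new(),
        retrieved: vec![],
        tick: 0,
    };
    let s1 = phi(&s0, Stimulus { percept: "new_email".into() }, vec!["sender_is_known".into()]);
    println!("{s1:?}");
}
```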
Retrieval as energy minimisation over semantic, topological and temporal mismatch — with diagonal precision Π modulated live by affect.
Expected Free Energy: pragmatic risk minus epistemic value. β_PAD shapes exploration as a function of core affect — without replacing the objective.
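A toy rendering of both objectives in Rust — the functional forms and weights below are assumptions for illustration, not the kernel's actual parameters:

```rust
// Retrieval energy: E(m) = λ_sem·d_sem + λ_top·d_top + λ_tmp·d_tmp,
// with diag(λ) the affect-modulated precision Π. Lowest energy wins retrieval.
// Policy score: EFE(π) = pragmatic_risk(π) − β_PAD · epistemic_value(π).

struct Precision { sem: f64, top: f64, tmp: f64 } // diagonal of Π

fn retrieval_energy(p: &Precision, d_sem: f64, d_top: f64, d_tmp: f64) -> f64 {
    p.sem * d_sem + p.top * d_top + p.tmp * d_tmp
}

fn efe(pragmatic_risk: f64, epistemic_value: f64, beta_pad: f64) -> f64 {
    pragmatic_risk - beta_pad * epistemic_value // β_PAD shapes, never replaces, the objective
}

fn main() {
    let pi = Precision { sem: 1.4, top: 0.8, tmp: 0.5 };
    let e = retrieval_energy(&pi, 0.2, 0.6, 0.1); // mismatch across the three channels
    let g = efe(0.3, 0.5, 0.7);                   // lowest EFE wins deliberation
    println!("energy = {e:.3}, efe = {g:.3}");
}
```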
The architecture above is not abstract. It compiles down to six concrete properties an agent acquires the moment it runs under the ExI kernel. Every property maps to a specific, named mechanism — not a tagline.
The neural layer proposes. The symbolic runtime simulates, verifies, and decides. The result is deterministic, traceable, and reusable across models and environments.
// contract
capability ⇐ mechanism
mechanism ⇐ published paradigm
───────
every guarantee has a citation
Final execution authority belongs not to the heuristic engine, but to a deterministic symbolic verifier. ExI runs a two-level LTLf stack: Level 1 rejects candidate operators whose simulated trajectories would violate Φ_safe over the horizon; Level 2 re-validates the deliberative winner against the latest observed state immediately before actuation.
An operator reaches the motor system only if it is both predictively admissible and consistent with the current runtime state. Everything else is an explicit cognitive state — a deliberative or runtime impasse — handled by deterministic repair protocols, not by silent fallback generation.
The walk-through below steps one decision cycle through the stack. A bounded proposal set |P| ≤ K enters from the LLM; candidates traverse predictive simulation, deliberative scoring, and runtime re-validation. One survives as p*. The rest are dispatched as impasses or dropped.
Admission rule :: p* ∈ O_safe ⊆ P_safe(H) ⊆ P_candidates
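The same cycle, sketched in Rust. `holds_over_horizon` and `holds_now` are stand-ins for the Level 1 and Level 2 LTLf checks; the names and the toy rejection rule are illustrative:

```rust
// One decision cycle: bounded proposals → Level 1 predictive filter →
// deliberative scoring → Level 2 runtime guard → p* or an explicit impasse.

struct Operator { name: &'static str, score: f64 } // score: deliberative utility

fn holds_over_horizon(op: &Operator) -> bool { op.name != "delete_ledger" } // Level 1 stub
fn holds_now(_op: &Operator) -> bool { true }                               // Level 2 stub

enum Outcome { Execute(Operator), Impasse(&'static str) }

fn decision_cycle(candidates: Vec<Operator>) -> Outcome {
    // Level 1: reject operators whose simulated trajectory would violate Φ_safe
    let mut safe: Vec<Operator> = candidates
        .into_iter()
        .filter(|op| holds_over_horizon(op))
        .collect();
    if safe.is_empty() {
        return Outcome::Impasse("deliberative impasse: no admissible operator");
    }
    // Deliberative scoring: the best admissible candidate becomes p*
    safe.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap());
    let p_star = safe.remove(0);
    // Level 2: re-validate p* against the current state, immediately before actuation
    if holds_now(&p_star) {
        Outcome::Execute(p_star)
    } else {
        Outcome::Impasse("runtime impasse: state drifted since simulation")
    }
}

fn main() {
    let out = decision_cycle(vec![
        Operator { name: "delete_ledger", score: 0.9 }, // rejected at Level 1
        Operator { name: "file_report", score: 0.7 },
    ]);
    match out {
        Outcome::Execute(op) => println!("actuate: {}", op.name),
        Outcome::Impasse(why) => println!("{why}"),
    }
}
```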
When Procedural Memory contains a validated chunk χ_j whose structural similarity σ(S_t, S_j) exceeds threshold θ_sim, the architecture bypasses the LLM entirely. Reactive operators still pass the Level 2 runtime guard — pre-compiled knowledge is never exempt from real-time safety.
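A compact sketch of that bypass — `sigma` and `level2_guard` are stubs, and every name is hypothetical:

```rust
// Reactive path: a validated chunk fires when σ(S_t, S_j) clears θ_sim,
// but it still passes the Level 2 runtime guard before actuation.

struct Chunk { id: &'static str }

fn sigma(_s_t: &str, _s_j: &str) -> f64 { 0.93 } // structural similarity (stub)
fn level2_guard(_c: &Chunk) -> bool { true }     // runtime re-validation, never skipped

fn try_reactive(s_t: &str, chunks: &[(Chunk, &'static str)], theta_sim: f64) -> Option<&'static str> {
    for (chunk, s_j) in chunks {
        if sigma(s_t, s_j) > theta_sim && level2_guard(chunk) {
            return Some(chunk.id); // the LLM is never invoked on this path
        }
    }
    None // novelty: fall through to the deliberative path
}

fn main() {
    let chunks = [(Chunk { id: "approve_standard_invoice" }, "invoice_template_state")];
    match try_reactive("invoice_state_now", &chunks, 0.9) {
        Some(op) => println!("reactive fire: {op}"),
        None => println!("novelty: route to deliberation"),
    }
}
```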
Under novelty the LLM is invoked only to emit a bounded candidate set. It never selects. Simulation, verification, Pareto-optimal policy scoring, and runtime guard all run downstream, in that order. The neural layer contributes proposal completeness; the symbolic layer retains control authority.
A single inspectable locus of coordination. Holds the active goal hierarchy, attended perceptual state, retrieved content, intermediate deliberation, and affective / metacognitive control variables.
Retrieval is a routed cognitive operation, not a passive similarity scan. Agentic search traverses typed edges (CAUSES, LOCATED_AT, APPLIES_TO) with the LLM issuing graph queries as tools — grep, glob, graph_query — not blind top-k RAG.
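A minimal sketch of typed-edge traversal — the edge set matches the text above, but the graph structure and `graph_query` signature are illustrative:

```rust
// Routed retrieval over typed edges, not a top-k similarity scan.
use std::collections::HashMap;

#[derive(Hash, PartialEq, Eq, Clone, Copy)]
enum Edge { Causes, LocatedAt, AppliesTo }

struct Graph { edges: HashMap<(String, Edge), Vec<String>> }

impl Graph {
    /// The LLM issues this as a tool call; the traversal itself is deterministic.
    fn graph_query(&self, from: &str, edge: Edge) -> &[String] {
        self.edges.get(&(from.to_string(), edge)).map(|v| v.as_slice()).unwrap_or(&[])
    }
}

fn main() {
    let mut edges = HashMap::new();
    edges.insert(("valve_overpressure".to_string(), Edge::Causes), vec!["seal_failure".to_string()]);
    edges.insert(("seal_failure".to_string(), Edge::AppliesTo), vec!["shutdown_policy_7".to_string()]);
    edges.insert(("seal_failure".to_string(), Edge::LocatedAt), vec!["pump_room_2".to_string()]);
    let g = Graph { edges };

    for effect in g.graph_query("valve_overpressure", Edge::Causes) {
        println!("CAUSES → {effect}, APPLIES_TO → {:?}", g.graph_query(effect, Edge::AppliesTo));
        println!("LOCATED_AT → {:?}", g.graph_query(effect, Edge::LocatedAt));
    }
}
```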
Successful metacognitive deliberations are consolidated into fast, verified procedural chunks through architectural chunking — slow search compiles into reactive reflexes. Lawfulness gained at runtime, preserved forever.
PAD — Pleasure / Arousal / Dominance — continuously modulates the retrieval precision matrix Π_affect and the epistemic-exploration coefficient β_PAD. High arousal sharpens retrieval and suppresses exploration; positive valence relaxes semantic precision and licenses broader search; dominance re-weights structural vs. semantic constraints.
The mapping is parameterised as λ_j = exp(η_j(P,A,D)) and β_PAD = exp(η_β(P,A,D)), guaranteeing positivity of Π and strict monotonicity of the control surface across all admissible affective states.
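A toy parameterisation in Rust — the linear η maps and their coefficients are assumptions; only the exp-positivity and the monotone directions come from the text above:

```rust
// Affect → control surface: λ_j = exp(η_j(P,A,D)), β_PAD = exp(η_β(P,A,D)).
// exp() guarantees positivity; the signs encode the stated monotone directions.

struct Pad { p: f64, a: f64, d: f64 } // pleasure, arousal, dominance in [-1, 1]

fn lambda_semantic(x: &Pad) -> f64 { (0.8 * x.a - 0.4 * x.p).exp() } // arousal sharpens, valence relaxes
fn lambda_structural(x: &Pad) -> f64 { (0.6 * x.d).exp() }           // dominance re-weights structure
fn beta_pad(x: &Pad) -> f64 { (0.5 * x.p - 0.7 * x.a).exp() }        // arousal suppresses exploration

fn main() {
    let calm = Pad { p: 0.5, a: -0.3, d: 0.0 };
    let alarmed = Pad { p: -0.6, a: 0.9, d: -0.2 };
    for (name, x) in [("calm", &calm), ("alarmed", &alarmed)] {
        println!(
            "{name}: λ_sem = {:.2}, λ_struct = {:.2}, β_PAD = {:.2}",
            lambda_semantic(x), lambda_structural(x), beta_pad(x)
        );
    }
}
```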
Out-of-Design-Scope detection fires when normalised policy entropy Ĥ(π) > θ_OODS coincides with an affective anomaly (high arousal or negative valence). The agent enters a deterministic impasse protocol: halt, rollback, or escalate — never unconstrained continuation.
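Sketched in Rust with illustrative thresholds:

```rust
// OODS trigger: normalised policy entropy Ĥ(π) > θ_OODS, co-occurring with an
// affective anomaly. The 0.7 / -0.5 / 0.85 thresholds are illustrative only.

fn normalised_entropy(policy: &[f64]) -> f64 {
    let n = policy.len() as f64;
    let h: f64 = policy.iter().filter(|&&p| p > 0.0).map(|&p| -p * p.ln()).sum();
    h / n.ln() // in [0, 1]; near 1.0 means the agent cannot rank its own actions
}

fn oods(policy: &[f64], arousal: f64, valence: f64, theta_oods: f64) -> bool {
    let affect_anomaly = arousal > 0.7 || valence < -0.5;
    normalised_entropy(policy) > theta_oods && affect_anomaly
}

fn main() {
    let flat = [0.26, 0.25, 0.25, 0.24]; // near-uniform: the policy is guessing
    if oods(&flat, 0.8, -0.6, 0.85) {
        println!("OODS impasse: halt, rollback, or escalate");
    }
}
```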
Interactive Task Learning exposes ask_user(question, context) as a first-class tool. If recall() and graph_query() both miss, the LLM is obligated to ask rather than hallucinate. The cognitive cycle suspends its state to Redis and resumes when the answer arrives.
Escalation is a first-class cognitive state, not a degraded fallback. The agent that asks is safer than the agent that guesses.
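A minimal sketch of the suspension logic — Redis is elided, both memory stubs miss by construction, and every name is hypothetical:

```rust
// If recall() and graph_query() both miss, the only lawful move is ask_user:
// the cycle suspends as an explicit state instead of letting the LLM guess.

enum CycleState {
    Resolved(String),
    Suspended { question: String, context: String }, // parked (in Redis, in the real kernel)
}

fn recall(_q: &str) -> Option<String> { None }      // episodic memory: miss
fn graph_query(_q: &str) -> Option<String> { None } // semantic graph: miss

fn resolve(query: &str) -> CycleState {
    if let Some(hit) = recall(query).or_else(|| graph_query(query)) {
        return CycleState::Resolved(hit);
    }
    // Obligated to ask, forbidden to hallucinate.
    CycleState::Suspended {
        question: format!("What is the correct value for '{query}'?"),
        context: "recall() and graph_query() both returned nothing".into(),
    }
}

fn main() {
    match resolve("counterparty_credit_limit") {
        CycleState::Resolved(v) => println!("proceed with {v}"),
        CycleState::Suspended { question, .. } => println!("ask_user: {question}"),
    }
}
```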
| Property | LLM + RAG Agent | Tool-calling Agent (ReAct) | ExI |
|---|---|---|---|
| Executive authority | LLM decides and acts | LLM decides and acts | Symbolic validator decides |
| Safety mechanism | Prompt / RLHF | Prompt / guardrails | LTLf runtime verification |
| Persistent internal state | Context window only | Context + scratchpad | Explicit WM · hub-routed |
| Retrieval | Passive top-k similarity | Heuristic tool calls | Precision-weighted · affect-modulated |
| Uncertainty handling | Hallucinates silently | Ad-hoc retry loops | OODS impasse + ask_user() |
| Learning between sessions | Requires fine-tune | Requires fine-tune | Online episodic consolidation |
| Auditability | Opaque weights | Trace logs, no guarantee | Glass-box trace · formal spec |
| Deployment envelope | Cloud-dependent | Cloud-dependent | Cloud · self-hosted · air-gapped |
The qualitative comparison above flips into numbers below. Each row pairs a measurable property with the specific architectural mechanism that produces the delta — not a benchmark trick, not a fine-tune.
Safety guarantees are mathematically enforced by system design and independent of data distribution. OODS detection is a calibrated target, dependent on the normalised policy-entropy threshold. Goal adherence is an architectural property: success is guaranteed whenever any valid step exists.
// legend
● current agentic AI · LLM + RAG / ReAct baseline
● ExI agent · CMC kernel + Glass-Box validator
▸ architectural driver · the mechanism, not the marketing
ExI is model-agnostic and API-agnostic. The same Rust actor core, validator, and memory tiers run as a managed cloud service, as a self-hosted edition on your own infrastructure, or fully air-gapped on sovereign silicon. There is no preferred environment — the kernel is the same in all three.
The LLM operates in user space under the Safety Runtime Guard and cannot reach actuation without passing the validator. Every cognitive module is an independent actor with its own mailbox, lifecycle, and persistence — the pub/sub bus is the operating system. The Communication Gateway abstracts external surfaces (gRPC, MQTT, Slack, UE5, robotic endpoints) uniformly as HAL devices.
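What that uniformity means in code, sketched as a Rust trait — the trait and the two device types are illustrative, not the shipped gateway API:

```rust
// Every external surface looks identical to the kernel: a HAL device that
// accepts only validator-approved operators.

trait HalDevice {
    fn send(&mut self, validated_op: &str); // nothing reaches this call unverified
}

struct MqttEndpoint { topic: String }
struct SlackChannel { channel: String }

impl HalDevice for MqttEndpoint {
    fn send(&mut self, op: &str) { println!("MQTT {} ← {op}", self.topic); }
}
impl HalDevice for SlackChannel {
    fn send(&mut self, op: &str) { println!("Slack {} ← {op}", self.channel); }
}

fn main() {
    // The Communication Gateway holds heterogeneous surfaces behind one trait
    let mut devices: Vec<Box<dyn HalDevice>> = vec![
        Box::new(MqttEndpoint { topic: "plant/valve7".into() }),
        Box::new(SlackChannel { channel: "#ops".into() }),
    ];
    for d in devices.iter_mut() {
        d.send("close_valve(rate=slow)");
    }
}
```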
As LLMs commoditise, the strategic question stops being "which model?" and becomes "who owns the control plane around the model?" Hyperscalers answer that question with their own gardens — proprietary stacks, opinionated SDKs, integration only with their tooling.
For an autonomous agent shipping into a regulated environment, the answer cannot be a hyperscaler. The control plane has to be model-agnostic, deployable on sovereign silicon, and inspectable by the institution that runs it. That is the only configuration in which "safe autonomy" is a contractual claim, not a marketing slide.
// strategic stance
model-agnostic · API-agnostic
cloud · self-hosted · air-gapped
one kernel · three deployment surfaces
Whatever specialised safety tooling hyperscalers ship in the next 18 months will be inherently tied to their ecosystems. Convenient at first, structurally limiting at scale.
The Agent Control OS functions identically across cloud, self-hosted, and air-gapped systems. Model is swappable. API surface is uniform. Institutional control is total.
Trading desks, compliance workflows, contract execution. A hallucinated step costs money or triggers a regulator. ExI compiles policy clauses into formal invariants; every operator dispatched leaves a machine-checkable trace. Auditors read the trace, not the weights.
Engineering teams shipping autonomous agents into live systems. The free self-hosted core gives developers deterministic logic and mathematical safety out of the box — instead of fragile prompts and bespoke rules-engines. Multi-LLM SDKs, standard connectors, organic bottom-up adoption.
Robotics, industrial control, unmanned platforms. Pressure, torque, clearance, thermal envelopes, rules of engagement encoded as LTL invariants. Reactive path handles familiar regimes at sub-millisecond latency; deliberative path handles novel fault conditions under the runtime guard.
The industry has realised that LLMs are a commodity. The strategic value has moved from the models themselves to the management and control infrastructure surrounding them. Reliable execution is no longer a niche requirement — it is the mandatory baseline for any autonomous deployment, across every industry.
Today, every company shipping an autonomous agent writes its own custom rules-engine. These ad-hoc, non-verifiable scripts provide no formal certainty and are fundamentally impossible to scale across diverse environments. The market does not need another coding tool — it needs a universal deterministic control layer built on formal logic.
The scientific paradigms behind safe AI — CMC, LTLf, Active Inference, ITL — are public. What is not public, and not cheap to assemble, is the engineering integration that makes them run together as one deterministic kernel under real-time latency.
The moat is not a single algorithm. It is the orchestration of memory, validation, and control — at production latency, on commodity hardware, across cloud and air-gapped environments.
ExI fuses cognitive science (CMC, dual-process control, episodic consolidation), Active Inference (EFE-shaped policy, precision-weighted retrieval), and formal methods (LTLf runtime verification) into a single coherent system. Few teams hold all three competencies; assembling them into one runtime is harder still.
The deep orchestration of explicit memory tiers, the Glass-Box Validator, the LLM-Modulo proposal contract, and the metacognitive impasse protocol is the proprietary substrate. Each component is documented in the literature; their lawful composition under one kernel is not — and is not replicable by prompt-level tinkering or rapid-prototype agents.
Real-time deliberation is a profound systems-engineering problem: deterministic actors, hub-routed messages, formal verification, three memory tiers — all under sub-millisecond reactive latency. The Rust actor core and the Dapr pub/sub substrate make governed execution practical at fleet scale, not in slideware.
The market needs a category that doesn't exist yet: a universal Agent Control OS that integrates into any AI pipeline through standard interfaces. Our motion is developer-led — broad free adoption of the self-hosted core, monetisation on advanced governance, control, and enterprise integration.
The pain we solve is everyday: software engineers spend most of their time "taming" LLM hallucinations with fragile prompts and bespoke rules-engines. The free core makes a mathematically safe agent the path of least resistance. Developers become internal champions — and the upgrade path to enterprise governance follows the agent into production.
Reliability is the lure. Governance is the revenue. Standards are the compound.
Engineers ship the free self-hosted core into their agent pipelines because it removes the worst part of their week. No procurement, no sales call — just a CLI, an SDK, and an agent that suddenly behaves.
When the agent crosses into a live corporate environment, the enterprise needs the things developers don't ship: audit, SSO, role-based admin, formal rule-setting, SLA. We separate core execution from enterprise oversight cleanly — same kernel, different surface.
As developer adoption of the multi-LLM SDKs grows, a compounding ecosystem of pre-built formal specifications emerges — domain invariants, validated procedural chunks, certified connectors. ExI becomes the foundational trust layer for agentic systems, not because of marketing, but because the cheapest reliable agent is the one built on it.
Not a stripped-down demo. The base product already delivers the core architectural value: stateful control, deterministic execution logic, and safe action validation. Lowers the barrier so developers can build a mathematically safe agent with zero friction.
For the point where the agent crosses into a live corporate environment: formal rule-setting engine, Admin Web UI, decision audit and monitoring workflows, SSO and role-based admin, advanced policy controls, enterprise connectors, SLA and long-term supported releases.
For customers who want convenience over DevOps. We take responsibility for hosting, operational maintenance, scaling, uptime, and updates. Same kernel, same guarantees — with the operating envelope absorbed by us.
ExI is in direct, ongoing dialogue with two of the most respected figures in cognitive architecture and AI safety. This is not endorsement-as-marketing — it is technical review of the conceptual clarity, control logic, and safety foundations of the kernel.
The whitepaper has been revised following feedback from Prof. Rosenbloom. The architectural alignment with the Common Model of Cognition was reviewed at the source.
// engagement
direct dialogue · technical review
whitepaper · revised after Rosenbloom feedback
safety semantics · reviewed against Russell’s framing
Co-creator of the Common Model of Cognition (Laird, Lebiere, Rosenbloom, 2017) — the unified architectural framework on which ExI's hub-and-spoke kernel is built.
Direct dialogue on architectural alignment with the CMC. The ExI whitepaper was revised following his feedback on conceptual clarity, control logic, and the structural role of Working Memory as central hub.
Co-author of Artificial Intelligence: A Modern Approach — the standard text of the field — and a world-leading authority on AI safety, provable benefit, and assistance games.
Direct dialogue on the safety foundations of the kernel: the role of formal verification as a deterministic control layer, the LLM-Modulo subordination contract, and the conditions under which an autonomous agent can be deployed responsibly.
The deterministic runtime core and formal safety validators are already working code. We are completing the universal Agent Control OS into a production-ready release — multi-LLM SDKs, connectors, formal rule-setting engine, and the Admin Web UI for live metrics and decision auditing.
We are talking to design partners building autonomous agents and engineering teams who are tired of taming hallucinations with fragile prompts and bespoke rules-engines. If verifiable autonomy is on your stack — in any environment — we would like to hear from you.