Energy-Based Cognition
A Local-First Cognitive Operating System for Personal Intelligence
Energy-Based Consensus for Time-Native Intelligence
June 2026
Abstract
Intelligence without history is imitation. Agency without consequence is theater. Current AI systems excel at spatial inference — embeddings, correlations, pattern completion — yet they remain episodic. Each session begins anew. Memory is database lookup. Errors carry no weight. Agents cannot form genuine relationships because they share no irreversible time. WHY proposes a cognitive operating system where personal intelligence is time-native by architecture. The user's machine becomes a node that indexes its digital world, converts activity into an event-grounded semantic graph, and coordinates agents that retrieve context, act under constraints, and leave cryptographic records. Memory is reconstructive, not retrieved. Identity is accumulated, not instantiated. Action is committed, not simulated. The temporal substrate extends energy-based models to distributed systems: agent events are validated by economic commitment, with invalid states penalized by capital destruction and valid states reinforced by network reward. The swarm minimizes energy over event histories to converge on shared, irreversible time. When many such nodes opt in, the web shifts from link retrieval to inference exchange: machines answer from public knowledge and permissioned private context without requiring raw personal data to leave its owner. The computer becomes autonomous because it finally has a memory of the world it is acting inside.
Human intelligence is fundamentally temporal. Identity emerges from accumulated experience. Memory reconstructs from the past. Decisions matter because they foreclose alternatives. Learning happens because mistakes leave traces.
Current AI has none of these properties. Each API call spawns a new, identical instance. Sessions end; agents dissolve. Memory is a database table: editable, truncatable, shared identically across all users. Errors have no lasting impact. Agents retry without penalty.
This is not a bug. It is an architectural omission: time has been treated as metadata rather than primitive.
An agent is model weights plus a session identifier. It does not accumulate. It does not become.
Observations can be edited, forgotten, or hallucinated. There is no signed commitment to what was seen.
A bad action is corrected in the next prompt. There is no scar, no reputation, no learning from cost.
Two agents can hold contradictory memories of the same event. There is no consensus on what happened.
Modern AI has mastered spatial intelligence: embeddings, semantic similarity, attention over context, and statistical completion. It has neglected temporal intelligence: ordered experience, persistent identity, causal reasoning, and consequence-aware action.
Intelligence without irreversible history is imitation. Agency without persistent identity is theater.
Let P denote persons, N nodes, E events, G semantic graphs, A agents, T tasks, C contexts, M models, and R action records.
A personal node n ∈ N is a local-first runtime controlled by p ∈ P. It stores event history, graph state, policies, agent permissions, model configuration, and action records.
Raw state remains local by default. Computed slices may synchronize under policy.
An event e ∈ E is a typed observation:
e = (id, source, type, time, payloadHash, parent, entities, policy, sig)
id: collision-resistant hash of canonical content.
source: provenance (application, sensor, tool, or user assertion).
type: OBSERVE, THINK, ACT, or COMMIT.
time: logical clock, augmented by wall-clock when available.
payloadHash: commitment to content while content remains local.
parent: hash of prior event in a tamper-evident chain.
entities: extracted people, projects, artifacts, decisions, and goals.
policy: retention, retrieval, and disclosure rules.
sig: cryptographic signature.
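The event tuple and its parent chain can be sketched in a few lines. This is a minimal illustration under stated assumptions, not WHY's implementation: SHA-256 over canonical JSON stands in for the collision-resistant hash, the `"local-only"` policy label and the helper names `canonical_hash`, `make_event`, and `verify_chain` are hypothetical, and the `sig` field is omitted because the Python standard library has no asymmetric signatures.

```python
import hashlib
import json
from dataclasses import dataclass, asdict, replace
from typing import Optional, Tuple


def canonical_hash(obj) -> str:
    """SHA-256 over canonical JSON (sorted keys, compact separators)."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Event:
    source: str                      # provenance
    type: str                        # OBSERVE, THINK, ACT, or COMMIT
    time: int                        # logical clock
    payload_hash: str                # commitment; the payload itself stays local
    parent: Optional[str]            # id of the prior event, None for the first
    entities: Tuple[str, ...] = ()
    policy: str = "local-only"       # hypothetical policy label

    @property
    def id(self) -> str:
        # The id is derived from every field, so any edit changes it.
        return canonical_hash(asdict(self))


def make_event(source, type_, payload, prev=None, entities=()):
    """Create an event that commits to `payload` and links to `prev`."""
    return Event(
        source=source,
        type=type_,
        time=0 if prev is None else prev.time + 1,
        payload_hash=canonical_hash(payload),
        parent=None if prev is None else prev.id,
        entities=tuple(entities),
    )


def verify_chain(events) -> bool:
    """Check parent links and strictly increasing logical time."""
    prev = None
    for event in events:
        if event.parent != (None if prev is None else prev.id):
            return False
        if prev is not None and event.time <= prev.time:
            return False
        prev = event
    return True
```

Because each `parent` is the hash of the whole prior event, editing an earlier event changes its id and breaks every later link, which is what makes the chain tamper-evident.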
The semantic graph at time t is defined as:
G_t = (V_t, E_t, W_t)
Vertices include people, projects, artifacts, tasks, decisions, places, media, goals, and concepts. Edges include authored, mentioned, depends-on, scheduled, purchased, approved, contradicted, and caused.
w_ij(t) = σ(θ₁·sim(v_i, v_j) + θ₂·cooccur(v_i, v_j)
+ θ₃·recency_ij + θ₄·feedback_ij + θ₅·causal_ij)
The graph preserves the difference between semantic similarity, temporal proximity, causal dependence, and user-confirmed importance.
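As a rough sketch of the edge-weight formula, assuming σ is the logistic sigmoid and that the five signals are already computed as numbers (the feature keys and the example θ values below are illustrative, not taken from the paper):

```python
import math

FEATURES = ("sim", "cooccur", "recency", "feedback", "causal")


def sigmoid(x: float) -> float:
    """Logistic function; assumed here to be the sigma in the weight formula."""
    return 1.0 / (1.0 + math.exp(-x))


def edge_weight(signals: dict, theta: dict) -> float:
    """w_ij = sigma(sum_k theta_k * signal_k) over the five named signals."""
    z = sum(theta[name] * signals.get(name, 0.0) for name in FEATURES)
    return sigmoid(z)


# Hypothetical coefficients: causal and user-confirmed signals weigh most.
theta = {"sim": 1.0, "cooccur": 0.5, "recency": 0.5, "feedback": 2.0, "causal": 2.0}
similar_only = edge_weight({"sim": 0.8}, theta)
similar_and_causal = edge_weight({"sim": 0.8, "causal": 1.0}, theta)
```

Keeping the signals as separate inputs, rather than one blended score, is what lets the graph distinguish "similar" from "caused" when explaining why two vertices are linked.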
Memory is not table lookup. It is attention-weighted reconstruction:
Memory(n, q, t) = Σ α_i(q,t)·Embed(e_i) + Σ β_j(q,t)·Embed(v_j)
The weights α_i and β_j are conditioned on query, recency, salience, source reliability, user feedback, and policy.
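One way to read the reconstruction formula is as attention over stored embeddings. The sketch below assumes softmax-normalized weights, a dot-product relevance score, a linear recency penalty, and a boolean policy filter; these are illustrative choices, since the text only says the weights are conditioned on query, recency, salience, reliability, feedback, and policy. The same function can serve for the event sum (α) and the vertex sum (β).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a non-empty list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reconstruct(query, items, now, decay=0.1):
    """Weighted sum of embeddings.

    items: list of (embedding, timestamp, allowed_by_policy) tuples.
    decay: hypothetical recency penalty per unit of elapsed time.
    """
    visible = [(emb, ts) for emb, ts, allowed in items if allowed]
    if not visible:
        return [0.0] * len(query)
    scores = [dot(query, emb) - decay * (now - ts) for emb, ts in visible]
    weights = softmax(scores)
    return [
        sum(w * emb[d] for w, (emb, _) in zip(weights, visible))
        for d in range(len(query))
    ]
```

Items blocked by policy never enter the sum, so the reconstructed memory cannot leak content the policy withholds.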
The critical innovation is extending energy-based models to distributed agent consensus. Define the energy of event e proposed by node n:
E_n(e) = α·Commitment(n) + β·PolicyViolation(e, Π) + γ·HistoryConflict(e, E_{≤t})
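Commitment is the economic bond locked by the node. Higher commitment lowers base energy, so the first term must decrease as the bond grows (for example, α < 0 in the formula as written). Policy violation is zero when e is permitted by Π (e ∈ Π) and infinite when it is not (e ∉ Π). History conflict measures divergence from the committed history E_{≤t}.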
A valid event is a low-energy state. An invalid event is high-energy and rejected.
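The energy function can be sketched directly. The coefficient values, the acceptance threshold of zero, and the numeric scale of `conflict` are assumptions made for illustration; α is taken to be negative so that a larger bond lowers energy, as the text describes.

```python
import math


def energy(commitment, permitted, conflict, alpha=-1.0, beta=1.0, gamma=10.0):
    """E_n(e) = alpha*Commitment + beta*PolicyViolation + gamma*HistoryConflict.

    commitment: bond locked by the proposing node (non-negative)
    permitted:  True when the event satisfies the policy set
    conflict:   non-negative divergence from committed history
    beta must be positive so that a violation yields +infinity.
    """
    policy_violation = 0.0 if permitted else math.inf
    return alpha * commitment + beta * policy_violation + gamma * conflict


def is_valid(e_value, threshold=0.0):
    """Hypothetical acceptance rule: accept only low-energy events."""
    return e_value <= threshold
```

For example, `energy(5.0, True, 0.0)` is accepted, while the same event with a policy violation has infinite energy and is rejected regardless of how large the bond is.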
Consensus as Energy Minimization
The swarm minimizes Σ E_n(e) over all proposed events. Validators converge on the event history with globally minimal energy that satisfies all local constraints.
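In the simplest reading, selecting the consensus history is an argmin over candidate histories. The sketch below assumes a small, explicit list of candidates and a caller-supplied `energy_of` function; a real protocol would search this space incrementally and in a distributed way, which is not modeled here.

```python
import math


def total_energy(history, energy_of):
    """Sum of per-event energies for one candidate history."""
    return sum(energy_of(event) for event in history)


def select_history(candidates, energy_of):
    """Return the feasible candidate with the lowest total energy.

    A candidate containing any infinite-energy event violates a local
    constraint and is excluded. Returns None when nothing is feasible.
    """
    scored = [(total_energy(h, energy_of), i) for i, h in enumerate(candidates)]
    feasible = [(score, i) for score, i in scored if math.isfinite(score)]
    if not feasible:
        return None
    _, best = min(feasible)
    return candidates[best]
```

Ties are broken by candidate order here, which is an arbitrary choice made for the sketch.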
Economic Contrastive Learning
In the positive phase, validators who commit correct events receive network reward, lowering energy for valid behavior. In the negative phase, validators who commit incorrect events lose committed capital, raising energy for invalid behavior.
At equilibrium, the network anneals toward a low-energy manifold where honest agents are inexpensive and dishonest agents are prohibitively costly.
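The two phases can be illustrated as a settlement step over validator bonds. The fixed reward of 1.0 and the 50% slash fraction are hypothetical parameters; the paper does not specify amounts.

```python
def settle(bonds, commits, valid_events, reward=1.0, slash_fraction=0.5):
    """Apply one round of reward and penalty.

    bonds:        {validator: bonded capital}
    commits:      {validator: event id the validator committed to}
    valid_events: set of event ids accepted by consensus
    """
    updated = {}
    for validator, bond in bonds.items():
        if commits[validator] in valid_events:
            # Positive phase: correct commitments earn network reward.
            updated[validator] = bond + reward
        else:
            # Negative phase: incorrect commitments destroy bonded capital.
            updated[validator] = bond * (1.0 - slash_fraction)
    return updated


bonds = {"honest": 10.0, "dishonest": 10.0}
commits = {"honest": "e1", "dishonest": "e2"}
after = settle(bonds, commits, valid_events={"e1"})
```

Repeating this step makes the dishonest validator's bond shrink geometrically while the honest validator's grows, which is the asymmetry the equilibrium argument relies on.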
Theorem 2.1: Energy Minimization Guarantees Consensus
If E_n(e) is convex in commitment and monotonic in policy violation, then the swarm consensus protocol converges to a unique low-energy state with probability 1 as t → ∞.
Proof sketch. By construction, valid events lower network energy. Invalid events raise energy via capital destruction. The process is Lyapunov descent on the energy landscape. Convergence follows from convexity and the Borel-Cantelli lemma on penalization events. ∎
WHY's personal world model implements a simplified Joint Embedding Predictive Architecture. Latent state z_t encodes graph structure, recent events, user preferences, and active goals.
The predictive function F_θ estimates future state:
ẑ_{t+1} = F_θ(z_t, a_t)
A prediction head estimates observations, costs, approvals, and downstream consequences:
(ô_{t+1}, ĉ_{t+1}, û_{t+1}, r̂_{t+1}) = H_θ(ẑ_{t+1})
Training is self-supervised: predict next graph state from current observations. Unlike language models that predict text, WHY predicts state transitions.
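As a toy illustration of predicting state transitions rather than text, the sketch below treats F_θ as a single linear map over the concatenated state and action vectors, trained by gradient descent on squared error. This is a deliberate simplification: the architecture, the loss, and the vector encoding are all assumptions, and a real joint-embedding model would learn the latent encoder as well.

```python
def predict(W, z, a):
    """z_hat_{t+1} = W @ [z_t ; a_t] for a weight matrix W (list of rows)."""
    x = z + a  # list concatenation of state and action
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def train_step(W, z, a, z_next, lr=0.1):
    """One gradient step on 0.5 * ||predict(z, a) - z_next||^2.

    Returns the updated weights and the loss measured before the update.
    """
    x = z + a
    error = [p - t for p, t in zip(predict(W, z, a), z_next)]
    loss = 0.5 * sum(e * e for e in error)
    new_W = [[w - lr * e * xi for w, xi in zip(row, x)]
             for row, e in zip(W, error)]
    return new_W, loss


# One observed transition: state [1, 0] under action [1] leads to state [0, 1].
W = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
W, first_loss = train_step(W, [1.0, 0.0], [1.0], [0.0, 1.0])
for _ in range(50):
    W, last_loss = train_step(W, [1.0, 0.0], [1.0], [0.0, 1.0])
```

The training signal comes entirely from the observed next state, so no labels are needed beyond the node's own event history.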
Inference Network: computed slices across opted-in nodes.
Autonomous Agents: retrieve, plan, act, and record under policy.
Semantic Graph: people, projects, artifacts, decisions, and intent.
Local OS Substrate: files, apps, conversations, media, and activity.
The substrate indexes the user's digital world on-device: files, conversations, notes, photos, browsing, calendar, workflows, applications, and activity. It supports embeddings, metadata, provenance, permissions, and revocation.
The graph connects people, projects, decisions, assets, deadlines, goals, and constraints.
A document is not merely a file. It may be part of a project, authored by a collaborator, attached to a decision, contradicted by a later note, or relevant to a future meeting.
Agents retrieve context, coordinate tools, evaluate outcomes, and operate asynchronously. They are not unconstrained scripts. Each action is governed by policy, risk class, tool permission, and user approval.
A goal is the unit of computing:
g = (intent, constraints, deadline, risk, approvalPolicy)
A plan is an executable structure:
π = (steps, tools, requiredContext, approvals, rollback)
An action x executes only if policy permits it at time t:
Allowed(x, g, Π, t) = 1
Reversible first. For local actions over user-owned state, WHY prefers reversible staging before irreversible mutation. Staged file movement is preferred to immediate deletion. A draft response is preferred to a sent email. A proposed purchase is preferred to an automatic purchase unless policy explicitly permits otherwise.
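The Allowed predicate and the reversible-first rule can be sketched together. The risk classes, the field names, and the rule that user approval can unlock an irreversible or higher-risk action are assumptions for illustration; the paper names the inputs (policy, risk class, tool permission, user approval) but not the exact rule.

```python
from dataclasses import dataclass

RISK_ORDER = {"low": 0, "medium": 1, "high": 2}  # hypothetical risk classes


@dataclass(frozen=True)
class Action:
    tool: str
    risk: str
    reversible: bool


@dataclass(frozen=True)
class Policy:
    allowed_tools: frozenset
    max_auto_risk: str                 # highest risk allowed without approval
    allow_irreversible: bool = False   # explicit opt-in for irreversible actions


def allowed(action, policy, approved, now, deadline) -> bool:
    """Return True only if policy permits the action at time `now`."""
    if now > deadline:
        return False
    if action.tool not in policy.allowed_tools:
        return False
    if not action.reversible and not (policy.allow_irreversible or approved):
        return False
    if RISK_ORDER[action.risk] > RISK_ORDER[policy.max_auto_risk] and not approved:
        return False
    return True


policy = Policy(allowed_tools=frozenset({"files"}), max_auto_risk="low")
stage_move = Action(tool="files", risk="low", reversible=True)
delete_now = Action(tool="files", risk="high", reversible=False)
```

Under this policy the staged move runs on its own, while the immediate deletion waits for explicit approval.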
Every meaningful action leaves a record:
r = (goal, contextRefs, toolCalls, outputs, approvals, effects, time, hash)
Records make autonomy auditable. Accepted actions reinforce preferences. Rejected actions become negative evidence. Mistakes requiring correction become future caution.
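A record of this shape can be made tamper-evident by hashing its other fields. The sketch assumes SHA-256 over canonical JSON and JSON-serializable field values; `make_record` and `verify_record` are hypothetical helper names.

```python
import hashlib
import json


def _digest(body: dict) -> str:
    """SHA-256 over canonical JSON of the record body."""
    data = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def make_record(goal, context_refs, tool_calls, outputs, approvals, effects, time):
    """Build r = (goal, contextRefs, toolCalls, outputs, approvals, effects, time, hash)."""
    body = {
        "goal": goal,
        "contextRefs": list(context_refs),
        "toolCalls": list(tool_calls),
        "outputs": list(outputs),
        "approvals": list(approvals),
        "effects": list(effects),
        "time": time,
    }
    return {**body, "hash": _digest(body)}


def verify_record(record: dict) -> bool:
    """Recompute the hash over every field except `hash` itself."""
    body = {k: v for k, v in record.items() if k != "hash"}
    return record.get("hash") == _digest(body)
```

An auditor who holds only the record can detect any later edit to its effects or approvals, because the stored hash no longer matches.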
When nodes opt in, the web shifts from link retrieval to inference exchange. A query is answered by public sources, the local graph, and opted-in peer nodes with relevant private context.
A computed slice is defined as:
s = (query, answer, citations, confidence, policy, proof, expiry)
Raw data need not leave the node. A slice may include a summary, embedding, score, retrieval result, or artifact. Commitments to source events allow verification without exposure.
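A salted hash commitment is one simple way to let a slice cite source events without exposing them. The sketch assumes the commitments are published ahead of time (for example, on the shared ledger), that `proof` is just the list of cited commitments, and that `expiry` is a numeric timestamp; none of these details are specified in the paper.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


def commit(content: bytes, salt: bytes) -> str:
    """Commit to content without revealing it; the salt stays on the node."""
    return hashlib.sha256(salt + content).hexdigest()


def opens(commitment: str, content: bytes, salt: bytes) -> bool:
    """Check a later, voluntary disclosure against the commitment."""
    return commit(content, salt) == commitment


@dataclass(frozen=True)
class ComputedSlice:
    query: str
    answer: str
    citations: Tuple[str, ...]   # commitments to source events
    confidence: float
    policy: str
    proof: Tuple[str, ...]       # simplified here to the cited commitments
    expiry: float


def slice_is_acceptable(s: ComputedSlice, known_commitments: set, now: float) -> bool:
    """Accept only unexpired slices whose citations were previously committed."""
    return now < s.expiry and all(c in known_commitments for c in s.citations)
```

The receiver learns that the answer is grounded in events the node committed to earlier, and nothing about their content unless the owner chooses to open a commitment.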
The inference network requires consensus on shared history. Distributed ledger technology provides ordering, irreversibility, consensus, and proof.
Ordering links event sequence: observation to thought to action. Irreversibility prevents silent revision of past decisions. Consensus allows agents to agree on shared history. Proof ensures that an agent's action record can be verified.
The ledger is not the product. It is the minimal substrate for multi-agent temporal coordination. The energy function makes it economically secure.
The most valuable dataset describes a person's digital world. Ownership must be architectural, not rhetorical. Local-first keeps raw context near the user. Computed slices expose only under policy. The person remains owner.
The public web made published information searchable. It did not make private context computable.
The other half, private context, lives on hard drives, phones, message histories, notes, photos, spreadsheets, browsing trails, calendar records, code repositories, and local application state.
The next web is an inference network across sovereign nodes. A node answers from local context by exposing a computed slice. The raw file need not move. The person remains owner. The network gains intelligence without extraction.
Search organized the public web around links. Feeds organized attention around platforms. WHY organizes personal intelligence around time, context, and action.
The next computing layer is not a better model. It is a better substrate for models to live inside.
Personal intelligence requires a machine that remembers what matters, grounds inference in the user's world, acts under constraint, and improves through time. Without such a substrate, even powerful models remain visitors: brilliant for a moment, gone by morning.
WHY proposes the missing operating layer. Memory is event-grounded, not conversationally accidental. Context is a graph, not a pile of files. Autonomy is policy-bound, not ambient. Networked intelligence is opt-in and computed, not extractive. The user's machine is a node in an inference network without requiring the user's raw life to become someone else's product.
The computer becomes autonomous because it finally has a memory of the world it is acting inside, and an energy-based consensus that makes that memory irreversible.