Energy-Based Cognition

WHY

A Local-First Cognitive Operating System for Personal Intelligence

Energy-Based Consensus for Time-Native Intelligence

June 2026

Abstract

Intelligence without history is imitation. Agency without consequence is theater. Current AI systems excel at spatial inference — embeddings, correlations, pattern completion — yet they remain episodic. Each session begins anew. Memory is database lookup. Errors carry no weight. Agents cannot form genuine relationships because they share no irreversible time. WHY proposes a cognitive operating system where personal intelligence is time-native by architecture. The user's machine becomes a node that indexes its digital world, converts activity into an event-grounded semantic graph, and coordinates agents that retrieve context, act under constraints, and leave cryptographic records. Memory is reconstructive, not retrieved. Identity is accumulated, not instantiated. Action is committed, not simulated. The temporal substrate extends energy-based models to distributed systems: agent events are validated by economic commitment, with invalid states penalized by capital destruction and valid states reinforced by network reward. The swarm minimizes energy over event histories to converge on shared, irreversible time. When many such nodes opt in, the web shifts from link retrieval to inference exchange: machines answer from public knowledge and permissioned private context without requiring raw personal data to leave its owner. The computer becomes autonomous because it finally has a memory of the world it is acting inside.

1 The Problem

1.1 The Missing Dimension

Human intelligence is fundamentally temporal. Identity emerges from accumulated experience. Memory reconstructs from the past. Decisions matter because they foreclose alternatives. Learning happens because mistakes leave traces.

Current AI has none of these properties. Each API call spawns a new, identical instance. Sessions end; agents dissolve. Memory is a database table: editable, truncatable, shared identically across all users. Errors have no lasting impact. Agents retry without penalty.

This is not a bug. It is an architectural omission: time has been treated as metadata rather than primitive.

1.2 The Four Absences

No persistent identity.

An agent is model weights plus a session identifier. It does not accumulate. It does not become.

No irreversible memory.

Observations can be edited, forgotten, or hallucinated. There is no signed commitment to what was seen.

No real consequences.

A bad action is corrected in the next prompt. There is no scar, no reputation, no learning from cost.

No shared reality.

Two agents can hold contradictory memories of the same event. There is no consensus on what happened.

1.3 The Spatial-Temporal Divide

Modern AI has mastered spatial intelligence: embeddings, semantic similarity, attention over context, and statistical completion. It has neglected temporal intelligence: ordered experience, persistent identity, causal reasoning, and consequence-aware action.

Intelligence without irreversible history is imitation. Agency without persistent identity is theater.

2 The Model

2.1 The Personal Node

Let P denote persons, N nodes, E events, G semantic graphs, A agents, T tasks, C contexts, M models, and R action records.

A personal node n ∈ N is a local-first runtime controlled by p ∈ P. It stores event history, graph state, policies, agent permissions, model configuration, and action records.

Raw state remains local by default. Computed slices may synchronize under policy.

2.2 The Personal Event

An event e ∈ E is a typed observation:

e = (id, source, type, time, payloadHash, parent, entities, policy, sig)

id: Collision-resistant hash of canonical content.

source: Provenance: application, sensor, tool, or user assertion.

type: OBSERVE, THINK, ACT, or COMMIT.

time: Logical clock, augmented by wall-clock when available.

payloadHash: Commitment to content while content remains local.

parent: Hash of prior event in a tamper-evident chain.

entities: Extracted people, projects, artifacts, decisions, and goals.

policy and sig: Retention, retrieval, disclosure rules, and cryptographic signature.
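The event tuple and its parent chain can be illustrated with a minimal sketch. Everything named here (Event, make_event, verify_chain, the "local-only" policy string, SHA-256 over canonical JSON) is an assumption made for illustration, and the sig field is omitted because the paper does not specify a signature scheme.

```python
import hashlib
import json
from dataclasses import dataclass, asdict, replace
from typing import Optional, Tuple


def canonical_hash(obj) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, compact separators)."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Event:
    source: str
    type: str                    # OBSERVE, THINK, ACT, or COMMIT
    time: int                    # logical clock value
    payload_hash: str            # commitment to content that stays local
    parent: Optional[str]        # id of the prior event, None for the first
    entities: Tuple[str, ...]
    policy: str
    id: str = ""                 # filled in by make_event

    def content(self) -> dict:
        """All fields except id, i.e. what the id commits to."""
        body = asdict(self)
        del body["id"]
        return body


def make_event(source, type_, time, payload: bytes, parent, entities,
               policy="local-only") -> Event:
    draft = Event(source, type_, time, hashlib.sha256(payload).hexdigest(),
                  parent, tuple(entities), policy)
    return replace(draft, id=canonical_hash(draft.content()))


def verify_chain(events) -> bool:
    """Check that every id matches its content and every parent links back."""
    prev_id = None
    for e in events:
        if e.parent != prev_id or e.id != canonical_hash(e.content()):
            return False
        prev_id = e.id
    return True


first = make_event("mail", "OBSERVE", 1, b"hello", None, ["alice"])
second = make_event("agent", "THINK", 2, b"draft a reply", first.id, ["alice"])
print(verify_chain([first, second]))
```

Because each id covers the parent field, editing any earlier event changes its hash and breaks every later link, which is the tamper-evidence property the parent field is meant to provide.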

2.3 The Semantic Graph

The semantic graph at time t is defined as:

G_t = (V_t, E_t, W_t)

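Here V_t is the vertex set, E_t the edge set (not to be confused with the event set E of Section 2.1), and W_t the edge weights. Vertices include people, projects, artifacts, tasks, decisions, places, media, goals, and concepts. Edges include authored, mentioned, depends-on, scheduled, purchased, approved, contradicted, and caused. Each edge weight in W_t combines five signals: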

w_ij(t) = σ(θ₁·sim(v_i, v_j) + θ₂·cooccur(v_i, v_j)
          + θ₃·recency_ij + θ₄·feedback_ij + θ₅·causal_ij)

The graph preserves the difference between semantic similarity, temporal proximity, causal dependence, and user-confirmed importance.
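As a small illustration of the weight formula, the sketch below assumes σ is the logistic sigmoid and that the five signals are already available as numbers. The function names and the example coefficients are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def edge_weight(theta, sim, cooccur, recency, feedback, causal) -> float:
    """w_ij(t) = sigmoid(theta . features), with theta = (θ1, ..., θ5)."""
    features = (sim, cooccur, recency, feedback, causal)
    return sigmoid(sum(t * f for t, f in zip(theta, features)))


# Hypothetical coefficients that favor user feedback and causal links.
theta = (1.0, 0.5, 0.5, 2.0, 2.0)
w = edge_weight(theta, sim=0.8, cooccur=0.2, recency=0.1, feedback=1.0, causal=0.0)
print(round(w, 3))
```

Keeping the five signals as separate inputs, rather than one blended score, is what lets the graph distinguish similarity from recency, causality, and confirmed importance.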

2.4 Memory as Reconstruction

Memory is not table lookup. It is attention-weighted reconstruction:

Memory(n, q, t) = Σ α_i(q,t)·Embed(e_i) + Σ β_j(q,t)·Embed(v_j)

The weights α_i and β_j are conditioned on query, recency, salience, source reliability, user feedback, and policy.
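A minimal sketch of the reconstruction follows. The paper does not say how the weights are computed, so this assumes a softmax over query similarity with an exponential recency decay, and it pools events and vertices into one list rather than keeping two separate sums. The names reconstruct and half_life are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reconstruct(query, items, now, half_life=10.0):
    """Attention-weighted sum of embeddings.

    items: list of (embedding, timestamp) pairs for events and vertices.
    An item that is half_life older has its weight halved before normalization.
    """
    scores = [dot(query, emb) - math.log(2) * (now - ts) / half_life
              for emb, ts in items]
    weights = softmax(scores)
    memory = [sum(w * emb[k] for w, (emb, _) in zip(weights, items))
              for k in range(len(query))]
    return memory, weights


items = [([1.0, 0.0], 0), ([1.0, 0.0], 10), ([0.0, 1.0], 10)]
memory, weights = reconstruct([1.0, 0.0], items, now=10)
print(memory, weights)
```

Salience, source reliability, feedback, and policy could enter as further additive terms in the score. A policy that forbids retrieval would simply remove the item before the softmax.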

2.5 The Energy-Based Extension

The critical innovation is extending energy-based models to distributed agent consensus. Define the energy of event e proposed by node n:

E_n(e) = α·Commitment(n) + β·PolicyViolation(e, Π) + γ·HistoryConflict(e, E_{≤t})

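Commitment is the economic bond locked by the node, with higher commitment lowering base energy. For this to hold under the formula above, the coefficient α must be negative, or Commitment(n) must be read as a decreasing function of the bond; β and γ are positive. Policy violation is zero when e satisfies the policy set Π (e ∈ Π) and infinite when it does not (e ∉ Π). History conflict measures divergence from committed history.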

A valid event is a low-energy state. An invalid event is high-energy and rejected.
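The energy function can be sketched as follows. The text says higher commitment lowers base energy, so this sketch assumes the commitment term is 1 / (1 + bond), which decreases as the bond grows. That functional form, the coefficients, and the acceptance threshold are all assumptions, not part of the paper.

```python
import math


def energy(bond: float, violates_policy: bool, history_conflict: float,
           alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0) -> float:
    """E_n(e) = alpha*Commitment + beta*PolicyViolation + gamma*HistoryConflict."""
    commitment_term = 1.0 / (1.0 + bond)                 # assumed decreasing in the bond
    policy_term = math.inf if violates_policy else 0.0   # zero or infinite, as in the text
    return alpha * commitment_term + beta * policy_term + gamma * history_conflict


def is_valid(e: float, threshold: float = 1.0) -> bool:
    """Hypothetical acceptance rule: low-energy events pass, high-energy ones do not."""
    return e < threshold


honest = energy(bond=10.0, violates_policy=False, history_conflict=0.0)
conflicting = energy(bond=10.0, violates_policy=False, history_conflict=3.0)
forbidden = energy(bond=10.0, violates_policy=True, history_conflict=0.0)
print(is_valid(honest), is_valid(conflicting), is_valid(forbidden))
```

An infinite policy term means no amount of bond can buy acceptance for a forbidden event, while history conflict trades off continuously against commitment.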

Consensus as Energy Minimization

The swarm minimizes Σ E_n(e) over all proposed events. Validators converge on the event history with globally minimal energy that satisfies all local constraints.

Economic Contrastive Learning

In the positive phase, validators who commit correct events receive network reward, lowering energy for valid behavior. In the negative phase, validators who commit incorrect events lose committed capital, raising energy for invalid behavior.

At equilibrium, the network anneals toward a low-energy manifold where honest agents are inexpensive and dishonest agents are prohibitively costly.
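The two phases can be illustrated with a toy settlement round. The fixed reward, the slash fraction, and the settle function are hypothetical. The paper does not specify how rewards or penalties are sized.

```python
def settle(stakes: dict, verdicts: dict, reward: float = 1.0,
           slash_fraction: float = 0.5) -> dict:
    """One round: reward validators whose events were valid, slash the rest.

    stakes:   node id -> committed capital
    verdicts: node id -> True if the node's committed event was valid
    """
    updated = {}
    for node, bond in stakes.items():
        if verdicts[node]:
            updated[node] = bond + reward                  # positive phase
        else:
            updated[node] = bond * (1.0 - slash_fraction)  # negative phase
    return updated


stakes = {"honest": 10.0, "dishonest": 10.0}
for _ in range(3):
    stakes = settle(stakes, {"honest": True, "dishonest": False})
print(stakes)
```

Under these assumed parameters, repeated rounds widen the gap between the two nodes, which is the sense in which dishonest behavior becomes progressively more costly.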

Theorem 2.1: Energy Minimization Guarantees Consensus

If E_n(e) is convex in commitment and monotonic in policy violation, then the swarm consensus protocol converges to a unique low-energy state with probability 1 as t → ∞.

Proof sketch. By construction, valid events lower network energy. Invalid events raise energy via capital destruction. The process is Lyapunov descent on the energy landscape. Convergence follows from convexity and the Borel-Cantelli lemma on penalization events. ∎

2.6 JEPA-Compatible World Models

WHY's personal world model implements a simplified Joint Embedding Predictive Architecture. Latent state z_t encodes graph structure, recent events, user preferences, and active goals.

The predictive function F_θ estimates future state:

ẑ_{t+1} = F_θ(z_t, a_t)

A prediction head estimates observations, costs, approvals, and downstream consequences:

(ô_{t+1}, ĉ_{t+1}, û_{t+1}, r̂_{t+1}) = H_θ(ẑ_{t+1})

Training is self-supervised: predict next graph state from current observations. Unlike language models that predict text, WHY predicts state transitions.
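To make the latent prediction step concrete, the sketch below uses a linear map as a stand-in for F_θ and trains it by gradient descent on squared error in latent space. It is not a JEPA implementation: the encoders, the prediction head H_θ, and any collapse-prevention term are left out, and all names are hypothetical.

```python
def predict(W, z, a):
    """Linear stand-in for F_theta: z_hat_next = W @ concat(z, a)."""
    x = z + a  # list concatenation of latent state and action vector
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sgd_step(W, z, a, z_next, lr=0.1):
    """One gradient step on 0.5 * ||W x - z_next||^2; returns (new W, loss before the step)."""
    x = z + a
    err = [p - t for p, t in zip(predict(W, z, a), z_next)]
    new_W = [[w - lr * e * xi for w, xi in zip(row, x)]
             for row, e in zip(W, err)]
    return new_W, 0.5 * sum(e * e for e in err)


# Toy transition: from latent state [1, 0], action [1] leads to [0, 1].
W = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
losses = []
for _ in range(50):
    W, loss = sgd_step(W, [1.0, 0.0], [1.0], [0.0, 1.0])
    losses.append(loss)
print(losses[0], losses[-1])
```

The training target is the next latent state rather than the next token, which is the distinction the section draws between predicting state transitions and predicting text.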

3 The Mechanism

3.1 Four-Layer Architecture

Layer 4: Inference Network. Computed slices across opted-in nodes.

Layer 3: Autonomous Agents. Retrieve, plan, act, and record under policy.

Layer 2: Semantic Graph. People, projects, artifacts, decisions, and intent.

Layer 1: Local OS Substrate. Files, apps, conversations, media, and activity.

3.2 Layer 1: Local OS Substrate

The substrate indexes the user's digital world on-device: files, conversations, notes, photos, browsing, calendar, workflows, applications, and activity. It supports embeddings, metadata, provenance, permissions, and revocation.

3.3 Layer 2: Semantic Graph

The graph connects people, projects, decisions, assets, deadlines, goals, and constraints.

A document is not merely a file. It may be part of a project, authored by a collaborator, attached to a decision, contradicted by a later note, or relevant to a future meeting.

3.4 Layer 3: Autonomous Agents

Agents retrieve context, coordinate tools, evaluate outcomes, and operate asynchronously. They are not unconstrained scripts. Each action is governed by policy, risk class, tool permission, and user approval.

A goal is the unit of computing:

g = (intent, constraints, deadline, risk, approvalPolicy)

A plan is an executable structure:

π = (steps, tools, requiredContext, approvals, rollback)

An action x executes only if policy permits it at time t:

Allowed(x, g, Π, t) = 1

Reversible first. For local actions over user-owned state, WHY prefers reversible staging before irreversible mutation. Staged file movement is preferred to immediate deletion. A draft response is preferred to a sent email. A proposed purchase is preferred to an automatic purchase unless policy explicitly permits otherwise.
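The permission check and the reversible-first rule can be sketched together. This is a simplified Allowed predicate under assumed rules: the Action and Policy fields, the integer risk scale, and the deadline check are hypothetical, and the goal is reduced to its deadline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    tool: str
    risk: int          # 0 = low, larger = riskier (assumed scale)
    reversible: bool


@dataclass(frozen=True)
class Policy:
    allowed_tools: frozenset
    max_auto_risk: int                 # highest risk class that runs without approval
    allow_irreversible: bool = False   # explicit opt-in for unstaged mutation


def allowed(action: Action, policy: Policy, now: int, deadline: int,
            approved: bool = False) -> bool:
    """Return True only if every policy condition holds at time `now`."""
    if now > deadline:
        return False
    if action.tool not in policy.allowed_tools:
        return False
    if not action.reversible and not (policy.allow_irreversible or approved):
        return False
    if action.risk > policy.max_auto_risk and not approved:
        return False
    return True


policy = Policy(allowed_tools=frozenset({"mail"}), max_auto_risk=1)
draft = Action("draft reply", "mail", risk=0, reversible=True)
send = Action("send reply", "mail", risk=2, reversible=False)
print(allowed(draft, policy, now=5, deadline=10))
print(allowed(send, policy, now=5, deadline=10))
print(allowed(send, policy, now=5, deadline=10, approved=True))
```

With this policy the agent may draft on its own, but sending requires user approval, which mirrors the preference for a draft response over a sent email.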

3.5 Action Records

Every meaningful action leaves a record:

r = (goal, contextRefs, toolCalls, outputs, approvals, effects, time, hash)

Records make autonomy auditable. Accepted actions reinforce preferences. Rejected actions become negative evidence. Mistakes requiring correction become future caution.
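A record's hash field can be sketched as a digest over the other fields, so that any later edit is detectable. The use of SHA-256 over canonical JSON and the helper names are assumptions, and the example values are invented for illustration.

```python
import hashlib
import json


def _digest(body: dict) -> str:
    data = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def make_record(goal, context_refs, tool_calls, outputs, approvals, effects, time) -> dict:
    """Build r = (goal, contextRefs, toolCalls, outputs, approvals, effects, time, hash)."""
    body = {
        "goal": goal,
        "contextRefs": list(context_refs),
        "toolCalls": list(tool_calls),
        "outputs": list(outputs),
        "approvals": list(approvals),
        "effects": list(effects),
        "time": time,
    }
    return {**body, "hash": _digest(body)}


def verify_record(record: dict) -> bool:
    """Recompute the digest over everything except the stored hash."""
    body = {k: v for k, v in record.items() if k != "hash"}
    return _digest(body) == record["hash"]


record = make_record(
    goal="reply to Alice",
    context_refs=["event:ab12"],
    tool_calls=["mail.draft"],
    outputs=["draft:7"],
    approvals=["user"],
    effects=["draft created"],
    time=42,
)
print(verify_record(record))
tampered = {**record, "approvals": []}
print(verify_record(tampered))
```

An auditor who holds only the hash can later confirm that a disclosed record is the one that was committed.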

3.6 Layer 4: Inference Network

When nodes opt in, the web shifts from link retrieval to inference exchange. A query is answered by public sources, the local graph, and opted-in peer nodes with relevant private context.

A computed slice is defined as:

s = (query, answer, citations, confidence, policy, proof, expiry)

Raw data need not leave the node. A slice may include a summary, embedding, score, retrieval result, or artifact. Commitments to source events allow verification without exposure.
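One way to read verification without exposure is sketched below: a slice cites payload hashes, and a receiver checks them against the set of hashes the node has already committed, without seeing the payloads. The Slice fields follow the tuple above except proof, which is omitted because its form is unspecified. The check_slice function and the example values are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Slice:
    query: str
    answer: str
    citations: Tuple[str, ...]   # payload hashes of the source events
    confidence: float
    policy: str
    expiry: int                  # logical time after which the slice is stale


def check_slice(s: Slice, committed_hashes: set, now: int) -> bool:
    """Accept a slice only if it is fresh and every citation was committed."""
    if now > s.expiry:
        return False
    return all(c in committed_hashes for c in s.citations)


# The node keeps the raw note local and publishes only its hash.
note = b"Meeting moved to Thursday"
committed = {hashlib.sha256(note).hexdigest()}

s = Slice(
    query="When is the meeting?",
    answer="Thursday",
    citations=(hashlib.sha256(note).hexdigest(),),
    confidence=0.9,
    policy="summary-only",
    expiry=100,
)
print(check_slice(s, committed, now=50))
print(check_slice(s, committed, now=101))
```

This check shows only that the cited events exist in committed history. Showing that the answer actually follows from them would need a stronger proof mechanism, which the sketch does not attempt.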

3.7 The Energy Layer

The inference network requires consensus on shared history. Distributed ledger technology provides ordering, irreversibility, consensus, and proof.

Ordering links event sequence: observation to thought to action. Irreversibility prevents silent revision of past decisions. Consensus allows agents to agree on shared history. Proof ensures that an agent's action record can be verified.

The ledger is not the product. It is the minimal substrate for multi-agent temporal coordination. The energy function makes it economically secure.

4 The Network

4.1 Local Sovereignty

The most valuable dataset describes a person's digital world. Ownership must be architectural, not rhetorical. Local-first keeps raw context near the user. Computed slices expose only under policy. The person remains owner.

4.2 The Second Half of Human Knowledge

The public web made published information searchable. It did not make private context computable.

The second half lives on hard drives, phones, message histories, notes, photos, spreadsheets, browsing trails, calendar records, code repositories, and local application state.

The next web is an inference network across sovereign nodes. A node answers from local context by exposing a computed slice. The raw file need not move. The person remains owner. The network gains intelligence without extraction.

4.3 Three Constraints

4.4 The Shift

Search organized the public web around links. Feeds organized attention around platforms. WHY organizes personal intelligence around time, context, and action.

5 Conclusion

The next computing layer is not a better model. It is a better substrate for models to live inside.

Personal intelligence requires a machine that remembers what matters, grounds inference in the user's world, acts under constraint, and improves through time. Without such a substrate, even powerful models remain visitors: brilliant for a moment, gone by morning.

WHY proposes the missing operating layer. Memory is event-grounded, not conversationally accidental. Context is a graph, not a pile of files. Autonomy is policy-bound, not ambient. Networked intelligence is opt-in and computed, not extractive. The user's machine is a node in an inference network without requiring the user's raw life to become someone else's product.

The computer becomes autonomous because it finally has a memory of the world it is acting inside, and an energy-based consensus that makes that memory irreversible.

References

  1. Ashish Vaswani et al. Attention Is All You Need. NeurIPS, 2017.
  2. Yann LeCun. A Path Towards Autonomous Machine Intelligence. OpenReview, 2022.
  3. Leslie Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. CACM, 1978.
  4. Endel Tulving. Episodic and Semantic Memory. Organization of Memory, 1972.
  5. John Locke. An Essay Concerning Human Understanding. 1689.
  6. Satoshi Nakamoto. Bitcoin: A Peer-to-Peer Electronic Cash System. 2008.
  7. Andy Clark and David Chalmers. The Extended Mind. Analysis, 1998.
  8. Yoshua Bengio, Yann LeCun, and Geoffrey Hinton. Deep Learning for AI. Communications of the ACM, 2021.