Essay

Attention–Compression Framework

A Substrate-Independent Model of Attention, Curiosity, and Reality Formation

· Bobby Simpson
Tags: attention, compression, curiosity, interestingness, artifacts, attractors, meaning, power, consent, ethics, reality-formation, framework, substrate-independent

0) Purpose of This Framework

This framework unifies:

  • Attention mechanics (pointing, artifacts, decay)
  • Curiosity / interestingness (compression improvement)
  • Reality formation (fossilized attention)

into a single, operational model.

It is intended to be:

  • Conceptually tight
  • Mechanically interpretable
  • Applicable across cognition, culture, organizations, and systems

No game substrate is assumed.


1) Core Substrate: Compression

1.1 Data and Models

Let:

  • D = data stream (sensory, social, symbolic, environmental)
  • O(t) = the system’s internal model at time t
  • C(D, O) = compression cost of encoding D with model O

Lower C = better model.


1.2 Curiosity Reward

Curiosity reward is defined as:

r(t) = C(D, O(t-1)) − C(D, O(t))

Interpretation:

  • Reward is generated by model improvement
  • Not by truth, utility, or beauty directly
  • But by reduced description length
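As a minimal sketch of the reward formula, assume the model O is a single Bernoulli parameter over a binary stream D, and that C(D, O) is the Shannon code length in bits. The function names `code_length_bits` and `curiosity_reward` and the sample data are illustrative assumptions, not part of the framework.

```python
import math

def code_length_bits(data, p_one):
    """C(D, O): bits needed to encode binary data under a Bernoulli(p_one) model.

    Assumes 0 < p_one < 1 so that every symbol has nonzero probability.
    """
    total = 0.0
    for x in data:
        p = p_one if x == 1 else 1.0 - p_one
        total += -math.log2(p)
    return total

def curiosity_reward(data, p_old, p_new):
    """r(t) = C(D, O(t-1)) - C(D, O(t)) for two Bernoulli models."""
    return code_length_bits(data, p_old) - code_length_bits(data, p_new)

data = [1, 1, 1, 0, 1, 1, 1, 1]  # hypothetical stream, mostly ones
reward = curiosity_reward(data, p_old=0.5, p_new=0.875)
print(f"reward = {reward:.3f} bits")
```

Moving from the uninformed model (p = 0.5) to one that matches the stream shortens the encoding, so the reward is positive. Leaving the model unchanged yields a reward of exactly zero.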

1.3 Interestingness

Interestingness is the rate of compression improvement:

I(D, O(t)) ∝ ∂B(D, O(t)) / ∂t

Where:

  • Beauty B ≈ compression quality (higher B means lower C, so B ∝ −C)
  • Interestingness I ≈ learning gradient

Edge cases:

  • Perfect randomness → no compression → not interesting
  • Perfect predictability → no improvement → not interesting
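The two edge cases can be illustrated with a toy, again assuming a memoryless Bernoulli model class. The balanced stream below stands in for a fair coin only because a memoryless model cannot exploit its alternating order. `cost_bits` and `max_available_reward` are hypothetical helper names.

```python
import math

def cost_bits(data, p_one):
    """Code length in bits under Bernoulli(p_one), clipped away from 0 and 1."""
    eps = 1e-9
    p_one = min(max(p_one, eps), 1.0 - eps)
    return sum(-math.log2(p_one if x == 1 else 1.0 - p_one) for x in data)

def max_available_reward(data, p_current):
    """Largest reward obtainable by moving to the best Bernoulli parameter."""
    p_best = sum(data) / len(data)  # maximum-likelihood estimate
    return cost_bits(data, p_current) - cost_bits(data, p_best)

random_like = [0, 1] * 50          # balanced: no Bernoulli model beats p = 0.5
predictable = [1] * 100            # already modeled perfectly by p = 1.0
learnable = [1] * 90 + [0] * 10    # structure the current model has not captured

print(max_available_reward(random_like, 0.5))   # 0.0
print(max_available_reward(predictable, 1.0))   # 0.0
print(max_available_reward(learnable, 0.5) > 0) # True
```

Only the third stream offers any compression improvement, which is the sense in which it alone is interesting to this learner.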

2) Attention (Redefined Precisely)

2.1 Definition

Attention is the allocation of finite compression capacity over time.

Attention determines:

  • What data is modeled
  • Which models are updated
  • Where curiosity reward can arise

2.2 Properties of Attention

  • Finite
  • Directed
  • Temporally extended
  • Subject to opportunity cost

Allocating attention to one process necessarily deprives others.
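One way to make the opportunity cost concrete is a fixed budget split across candidate processes. The proportional-to-expected-reward rule below is an assumption chosen for simplicity; the framework itself does not specify an allocation rule, and `allocate_attention` is a hypothetical name.

```python
def allocate_attention(expected_reward, budget=1.0):
    """Split a finite attention budget in proportion to positive expected reward.

    expected_reward: dict mapping a process name to its expected curiosity reward.
    Returns a dict of allocations that sums to `budget` (or all zeros if no
    process has positive expected reward).
    """
    positive = {name: max(0.0, r) for name, r in expected_reward.items()}
    total = sum(positive.values())
    if total == 0.0:
        return {name: 0.0 for name in expected_reward}
    return {name: budget * r / total for name, r in positive.items()}

before = allocate_attention({"reading": 1.0, "email": 1.0})
after = allocate_attention({"reading": 3.0, "email": 1.0})
print(before)
print(after)
```

Because the budget is fixed, raising the expected reward of one process lowers the share every other process receives.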


3) Pointing as a Primitive Act

3.1 Pointing

Pointing is any act that declares:

“Allocate compression effort here.”

Forms:

  • Naming
  • Measuring
  • Labeling
  • Recording
  • Repeated noticing

Pointing is irreversible in principle.


3.2 Imaginary Artifacts

Pointing creates an Imaginary Artifact (IA).

An IA is:

  • A discrete modeling target
  • Non‑material but causally real
  • Capable of accumulating attention
  • Subject to decay

Examples:

  • An idea
  • A plan
  • A role
  • A fear
  • A hypothesis

4) Artifact Dynamics

4.1 Attention Accumulation

Artifacts accumulate attention when:

  • They continue to generate curiosity reward
  • They remain promising sites of compression improvement

Formally:

  • Positive expected r(t) sustains attention

4.2 Decay and Boredom

When:

  • Compression improvement stalls
  • Expected future reward approaches zero

Attention decays.

Boredom = zero compression gradient.

Imaginary artifacts decay faster than real ones.
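A minimal sketch of accumulation and decay, assuming a discrete-time update in which attention shrinks by a fixed fraction each step and is replenished by expected reward. The update rule, the decay rates, and the names `step_attention` and `simulate` are illustrative assumptions.

```python
def step_attention(attention, expected_reward, decay_rate, gain=1.0):
    """One update: existing attention decays, positive expected reward adds to it."""
    return attention * (1.0 - decay_rate) + gain * max(0.0, expected_reward)

def simulate(initial, rewards, decay_rate):
    """Return the attention level after each step of a reward sequence."""
    attention = initial
    trace = [attention]
    for r in rewards:
        attention = step_attention(attention, r, decay_rate)
        trace.append(attention)
    return trace

rewards = [0.5, 0.3, 0.1, 0.0, 0.0, 0.0]  # compression improvement stalls

imaginary = simulate(1.0, rewards, decay_rate=0.5)  # assumed fast decay
real = simulate(1.0, rewards, decay_rate=0.1)       # assumed slow decay

print([round(a, 3) for a in imaginary])
print([round(a, 3) for a in real])
```

Once the reward sequence reaches zero, both traces fall, and the artifact with the higher decay rate loses attention faster.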


5) Thresholds: Imaginary → Real

5.1 Fossilization

When an imaginary artifact accumulates sufficient total attention:

  • The compression work becomes amortized
  • Ongoing maintenance cost drops
  • The artifact instantiates as a Real Artifact

Examples:

  • Idea → project
  • Repeated action → habit
  • Hypothesis → theory
  • Norm → institution

5.2 Partial Realization

Realization may be:

  • Incremental
  • Staged
  • Reversible

Small realized artifacts feed attention back into the parent IA.
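The threshold idea can be sketched as a small data structure. The threshold value, the two maintenance costs, and the class name `Artifact` are hypothetical; the sketch models only the one-way crossing, not staged or reversible realization.

```python
from dataclasses import dataclass

@dataclass
class Artifact:
    name: str
    total_attention: float = 0.0
    threshold: float = 10.0      # hypothetical fossilization threshold
    imaginary_cost: float = 1.0  # per-step maintenance cost while imaginary
    real_cost: float = 0.2       # lower per-step cost once realized

    @property
    def is_real(self) -> bool:
        """True once accumulated attention reaches the threshold."""
        return self.total_attention >= self.threshold

    @property
    def maintenance_cost(self) -> float:
        """Ongoing cost drops after fossilization."""
        return self.real_cost if self.is_real else self.imaginary_cost

    def attend(self, amount: float) -> None:
        """Accumulate attention on this artifact."""
        self.total_attention += amount

habit = Artifact("morning run")
for _ in range(10):
    habit.attend(1.0)
print(habit.is_real, habit.maintenance_cost)
```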


6) Real Artifacts as Cached Compression

Real artifacts are:

  • Cached models
  • Compiled structure
  • Fossilized attention

They:

  • Persist with lower marginal attention
  • Shape future attention flows
  • Bias what is seen as interesting

Examples:

  • Language
  • Tools
  • Infrastructure
  • Bureaucracy

7) Attractors

7.1 Definition

Attractors are regions of expected future compression gain.

They are:

  • Field‑like
  • Non‑discrete
  • Named after the fact

Examples:

  • “Progress”
  • “Safety”
  • “Truth”
  • “Success”

7.2 Relationship to Attention

Attention naturally flows toward attractors unless constrained.

Constraint mechanisms:

  • Fear
  • Incentives
  • Authority
  • Scarcity

8) Leakage, Coupling, and Composition

8.1 Leakage

Attention leaks between artifacts that:

  • Share representational structure
  • Co‑compress efficiently

This produces:

  • Fame compounding
  • Institutional lock‑in
  • Paradigm coherence

8.2 Composition

  • Multiple IAs can merge
  • Shared attractors accelerate convergence

This enables:

  • Collective belief
  • Social movements
  • Cultural norms

9) Conservation and Pathology

9.1 Conservation Law

Attention is conserved at the system level.

Allocating attention to:

  • Maintaining existing artifacts
  • Filtering accumulated structure

Reduces capacity for:

  • Exploration
  • Novel model formation

9.2 Pathologies

Misaligned compression produces:

  • Addiction (short‑term reward, no long‑term compression)
  • Ideology (over‑compressed models defended at all cost)
  • Burnout (maintenance exceeds curiosity)
  • Stagnation (no accessible gradients)

10) Awe, Surprise, and Phase Transitions

This section extends the framework beyond curiosity/interestingness to include awe and related affective signals, while remaining mathematically compatible with:

  • curiosity reward: r(t) = C(D,O(t−1)) − C(D,O(t))
  • interestingness: I ∝ d(−C)/dt

10.1 Auxiliary quantities

Let observations be x_t and compression cost C_t := C(D,O(t)).

Surprisal / surprise (instant encoding cost under the previous model):

S_t := −log p_{O(t−1)}(x_t)

Learning progress (curiosity reward):

r_t := C_{t−1} − C_t

Expected learning progress over horizon k:

E[r_{t:t+k}] := E[C_t − C_{t+k}]


10.2 Boredom, confusion, and relief (quick definitions)

These are derived signals (not new primitives):

  • Boredom: low expected learning progress. Boredom_t ∝ −E[r_{t:t+k}]

  • Confusion: high current cost with low expected progress. Confusion_t ∝ C_t · (1 − σ(E[r_{t:t+k}]))

  • Relief: sharp drop in cost (a compression win). Relief_t ∝ max(0, r_t)

(σ is any monotonically increasing squashing function with values in (0, 1), such as the logistic function.)
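The three derived signals can be computed directly from the quantities in 10.1. This is a sketch with every proportionality constant set to 1 and the logistic function chosen as σ; both choices, and the name `derived_signals`, are assumptions.

```python
import math

def squash(z):
    """Logistic function, one possible choice for the squashing function."""
    return 1.0 / (1.0 + math.exp(-z))

def derived_signals(c_prev, c_now, expected_progress):
    """Boredom, confusion, and relief from costs and expected learning progress.

    c_prev, c_now: compression costs C_{t-1} and C_t.
    expected_progress: E[r_{t:t+k}], expected cost reduction over the horizon.
    """
    r_t = c_prev - c_now
    return {
        "boredom": -expected_progress,
        "confusion": c_now * (1.0 - squash(expected_progress)),
        "relief": max(0.0, r_t),
    }

stuck = derived_signals(c_prev=10.0, c_now=10.0, expected_progress=0.0)
breakthrough = derived_signals(c_prev=10.0, c_now=6.0, expected_progress=2.0)
print(stuck)
print(breakthrough)
```

With the same starting cost, the stalled case scores higher on confusion and shows no relief, while the case with a real cost drop registers relief equal to that drop.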


10.3 Awe (operational definition)

Awe is not merely high interestingness. It is a phase shift in modeling.

Awe tends to occur when:

  • surprise is high (S_t high)
  • but the experience is sensed as deeply learnable (E[r] high)
  • and successful compression likely requires a model-class shift (a new representational basis)

Define a model revision cost d(O(t),O(t−1)) and the probability that a model-class shift is required:

P_shift(t) := P( O* lies in an expanded hypothesis class H_expanded )

Then a usable scalar proxy is:

Awe_t ∝ S_t · E[r_{t:t+k}] · P_shift(t)

Interpretation:

  • S_t captures vastness/violation
  • E[r] captures promise of future compression
  • P_shift captures that the needed move is not incremental
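A sketch of the scalar proxy, assuming natural-log surprisal and a proportionality constant of 1. Clamping negative expected progress to zero is an added assumption so the product stays non-negative; `surprisal` and `awe_proxy` are hypothetical names.

```python
import math

def surprisal(p_observed):
    """S_t = -log p(x_t) under the previous model; requires 0 < p_observed <= 1."""
    return -math.log(p_observed)

def awe_proxy(p_observed, expected_progress, p_shift):
    """Awe_t as the product S_t * E[r] * P_shift.

    expected_progress is clamped at zero (an assumption of this sketch).
    p_shift is the probability that a model-class shift is required.
    """
    return surprisal(p_observed) * max(0.0, expected_progress) * p_shift

incremental = awe_proxy(p_observed=0.01, expected_progress=2.0, p_shift=0.1)
rupture = awe_proxy(p_observed=0.01, expected_progress=2.0, p_shift=0.9)
print(incremental, rupture)
```

Because the proxy is a product, it vanishes whenever any one factor does: an unsurprising observation, no expected progress, or no need for a new representational basis.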

10.4 Awe as re-ontology

Awe is the felt recognition:

“There is a much better compression available, but my current representational basis cannot reach it by small updates.”

Formally:

C(D,O(t)) is high, and ∃ O′ in H_expanded such that C(D,O′) ≪ C(D,O(t)), but d(O(t),O′) is large, so O′ is not reachable from O(t) by small updates.


10.5 Phase transitions: interest → awe → beauty

A common trajectory:

  1. Interest: E[r] > 0, incremental improvement
  2. Awe: S high, E[r] high, P_shift high (representational rupture)
  3. Refit: high revision cost, temporary instability
  4. Beauty: low C, stable compression

This explains why awe can feel disorienting before it becomes satisfying.


11) Appreciation (Active Steering)

Appreciation is deliberate gradient steering.

It is the practice of:

  • Seeing what is
  • Choosing which attractors to feed
  • Allowing low‑reward artifacts to decay

Appreciation is not denial. It is selective allocation of compression effort.


12) Love, Grief, Trust, and Meaning (Compression-Coupling Phenomena)

This section extends the framework to core relational and existential experiences, expressed using the same compression-compatible quantities.

12.1 Trust

Trust is the willingness to offload compression work to another system.

Formally, agent A trusts agent B when:

E[C_A(D | O_B)] < E[C_A(D | O_A)]

That is, A expects B’s model to compress A’s future experience more efficiently than A’s own.

Trust reduces:

  • modeling effort
  • uncertainty
  • attentional load

Trust fails when:

  • compression delegated to B increases cost or variance
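The trust condition can be sketched with Bernoulli models, using cross-entropy as the expected cost per observation. In this toy, `p_future` is a ground-truth rate that stands in for A's expectation about its future data; that substitution and the function names are assumptions.

```python
import math

def expected_cost_bits(p_true, p_model):
    """Expected bits per observation when Bernoulli(p_true) data is encoded
    with a Bernoulli(p_model) code (cross-entropy). Requires 0 < p_model < 1."""
    return -(p_true * math.log2(p_model) + (1.0 - p_true) * math.log2(1.0 - p_model))

def a_trusts_b(p_future, p_own, p_other):
    """True when B's model is expected to compress A's data better than A's own."""
    return expected_cost_bits(p_future, p_other) < expected_cost_bits(p_future, p_own)

# A holds an uninformed model; B's model is close to the actual rate.
print(a_trusts_b(p_future=0.9, p_own=0.5, p_other=0.85))  # True
# A's own model is already accurate; delegating to B would raise cost.
print(a_trusts_b(p_future=0.9, p_own=0.9, p_other=0.5))   # False
```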

12.2 Love

Love is sustained, reciprocal compression coupling.

Two agents A and B are in love when:

  • each becomes a high-leverage compression node for the other
  • mutual modeling reduces long-term cost despite short-term surprises

A minimal expression:

Love(A,B) ∝ ∫ ( r_A←B(t) + r_B←A(t) ) dt

Where r_A←B is learning progress about B by A, and vice versa.

Love feels safe because:

  • compression is efficient
  • prediction errors are rapidly amortized
  • model updates are mutually permitted

12.3 Grief

Grief is forced recompression after the sudden loss of a high-leverage compression node.

If agent B was a major contributor to A’s compression:

ΔC_A ≫ 0 when B is removed

Grief magnitude scales with:

  • how much of the world B helped compress
  • how irreplaceable that compression was

Grief persists until:

  • alternative models amortize the lost compression

12.4 Meaning

Meaning is compression leverage.

An artifact, relationship, symbol, or idea is meaningful to the extent that:

small description → large experiential compression

Formally:

Meaning(X) ∝ (bits of experience compressed) / (bits required to represent X)

This explains why:

  • symbols outweigh details
  • rituals persist
  • simple stories dominate complex truths

Meaning collapses when:

  • leverage decays
  • symbols no longer compress lived experience
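As a rough illustration of compression leverage, a short symbol can be supplied to a general-purpose compressor as a preset dictionary, and the bits it saves across several experiences compared with its own size. Using zlib this way is a stand-in chosen for this sketch, not a measure the framework defines; the sample strings and function names are hypothetical.

```python
import zlib

def compressed_bits(payload, symbol=b""):
    """Size in bits of payload after zlib compression, optionally primed with symbol."""
    if symbol:
        compressor = zlib.compressobj(zdict=symbol)
    else:
        compressor = zlib.compressobj()
    return 8 * len(compressor.compress(payload) + compressor.flush())

def meaning_leverage(experiences, symbol):
    """Bits saved across experiences, divided by bits needed to state the symbol."""
    saved = sum(compressed_bits(e) - compressed_bits(e, symbol) for e in experiences)
    return saved / (8 * len(symbol))

experiences = [
    b"we gather at the table every sunday evening with family",
    b"we gather at the table every sunday evening after work",
    b"we gather at the table every sunday evening in winter",
]
ritual = b"we gather at the table every sunday evening"
unrelated = b"qzxv jkwp mmrt 0192 8374 zzqy xkcd plmb vwtr hhgg"

print(meaning_leverage(experiences, ritual))
print(meaning_leverage(experiences, unrelated))
```

The shared phrase shortens every experience it recurs in, so its leverage is positive. The unrelated string compresses nothing and scores lower, which mirrors a symbol that no longer compresses lived experience.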

13) Power (Constraint Over Compression)

This section defines power as a first-class system property, fully compatible with the attention–compression formalism.

13.1 Definition

Power is the capacity to shape, constrain, or redirect the compression paths of other systems.

An agent A has power over agent B to the extent that A can:

  • determine what B is allowed to attend to
  • restrict which models B may form or update
  • impose pre-compressed narratives on B’s experience

13.2 Mechanisms of Power

Power operates through compression control, including:

  1. Attention gating — limiting what data enters B’s model (censorship, surveillance, distraction)

  2. Narrative pre-compression — supplying ready-made models (propaganda, ideology, branding)

  3. Update penalties — increasing the cost of revising models (punishment, social sanction, threat)

  4. Gradient starvation — preventing access to curiosity reward (monotony, overwork, chaos)


13.3 Power vs Trust

  • Trust lowers compression cost voluntarily
  • Power lowers apparent cost by removing alternatives

A system under power may experience apparent order without genuine compression improvement.

This explains why power often feels stabilizing in the short term but brittle over time.


13.4 Coercion and Harm

Coercion occurs when model updates are forced without consent.

Formally:

Forced update ⇒ d(O_B(t), O_B(t−1)) imposed externally

This creates:

  • high compression cost
  • loss of agency
  • long-term instability

Harm corresponds to non-consensual compression work.


13.5 Legibility and Over-Compression

Making a system legible to authority often requires:

reducing rich local structure → simplified global model

This lowers compression cost for the authority but raises it for the system itself.

Over-compression destroys:

  • resilience
  • adaptability
  • local meaning

13.6 Power Dynamics and Collapse

Powerful systems fail when:

  • maintained compression diverges too far from lived data
  • curiosity gradients are suppressed too long
  • forced models accumulate unresolved error

Collapse is delayed recompression.


14) Consent

Consent is treated as a mechanical boundary condition on model updating and coupling.

14.1 Definition

Consent is a mutually acknowledged permission structure for compression and model update.

Agent A has B’s consent when updates to B’s model caused by A are:

  • expected (within agreed bounds)
  • revocable
  • renegotiable
  • non-punitive to refuse

14.2 Consensual vs non-consensual update

Let ΔO_B(t) := d(O_B(t), O_B(t−1)) be B’s model revision magnitude.

  • Consensual update: B opts into ΔO_B(t)
  • Non-consensual update: ΔO_B(t) is imposed

A key distinction is not whether B updates, but whether B retains agency over update.


14.3 Consent and revision cost

Consent alters the effective revision cost.

A simple expression:

C_B,total = C_B,data + μ · ΔO_B − κ · Consent(B,A)

Where μ ≥ 0 weights the revision magnitude ΔO_B, κ ≥ 0 sets how strongly consent offsets it, and Consent(B,A) ∈ [0,1] reduces the perceived/experienced cost of revision.

This captures:

  • why the same surprise can feel thrilling (consensual) or traumatic (non-consensual)
  • why trust accelerates learning
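The expression above can be evaluated directly. The default weights μ = κ = 1 and the input values are arbitrary choices for illustration, and `total_cost` is a hypothetical name.

```python
def total_cost(c_data, delta_o, consent, mu=1.0, kappa=1.0):
    """C_B,total = C_B,data + mu * delta_O_B - kappa * Consent(B, A).

    consent must lie in [0, 1]; mu and kappa are non-negative weights.
    """
    if not 0.0 <= consent <= 1.0:
        raise ValueError("consent must be in [0, 1]")
    return c_data + mu * delta_o - kappa * consent

# The same surprise (same data cost, same revision magnitude) in two settings.
consensual = total_cost(c_data=5.0, delta_o=2.0, consent=1.0)
imposed = total_cost(c_data=5.0, delta_o=2.0, consent=0.0)
print(consensual, imposed)  # 6.0 7.0
```

The data cost and the revision magnitude are identical in both calls; only the consent term separates them.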

14.4 Consent artifacts

In real systems, consent is represented by artifacts such as:

  • explicit agreements
  • norms
  • safe words / stop mechanisms
  • boundaries and enforcement
  • reversible commitments

These are consent artifacts: cached structures that keep coupling safe.


14.5 Breach

A breach occurs when an interaction crosses agreed bounds.

Mechanically:

  • breach increases μ (revision cost)
  • decreases Consent(B,A)
  • increases variance of future costs

This pushes the system from trust-dynamics toward power-dynamics.


15) Ethics and Morality (Compression Heuristics Under Coupling)

Ethics is modeled here as rule-like compression for social coordination under finite attention.

15.1 Why morality exists (mechanically)

Social life is high-dimensional. Moral rules are:

  • low-description heuristics
  • that compress expected outcomes across many contexts

They reduce:

  • decision cost
  • negotiation overhead
  • model uncertainty

15.2 Heuristic validity and domain

A moral rule R is useful when:

E[C_society | follow R] < E[C_society | no rule]

But every heuristic has a domain; outside-domain use creates error.

Moral conflict often signals:

  • domain mismatch
  • competing compressions
  • unmodeled externalities

15.3 Harm principle (compression version)

A compact ethical primitive compatible with this framework:

Harm is imposed, non-consensual compression work that increases another system’s long-run cost.

Formally (schematic):

Harm(A→B) ∝ E[C_B,future | A] − E[C_B,future | ¬A]

with the additional condition that Consent(B,A) is low.
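A schematic way to combine the cost difference with the consent condition is to scale the added cost by (1 − Consent). That multiplicative gating is an assumption of this sketch, since the text states only that consent must be low; `harm` is a hypothetical name.

```python
def harm(cost_with_action, cost_without_action, consent):
    """Added long-run cost to B from A's action, discounted by B's consent.

    cost_with_action:    E[C_B,future | A]
    cost_without_action: E[C_B,future | not A]
    consent:             Consent(B, A) in [0, 1]
    """
    added_cost = max(0.0, cost_with_action - cost_without_action)
    return added_cost * (1.0 - consent)

print(harm(12.0, 8.0, consent=0.0))  # 4.0: imposed and costly
print(harm(12.0, 8.0, consent=1.0))  # 0.0: fully consensual
print(harm(7.0, 8.0, consent=0.0))   # 0.0: the action lowers B's cost
```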


15.4 Justice as cost distribution

Justice concerns how compression costs and benefits are distributed.

  • Exploitation: one system externalizes its compression costs onto others
  • Fairness: costs are shared proportionally to benefits and agency

A toy measure:

Exploitation(A,B) ∝ (Cost imposed on B by A) − (Benefits returned to B)


15.5 Virtues as stable policies

Virtues can be treated as stable attention-allocation policies that:

  • reduce harm risk
  • preserve consent
  • keep gradients accessible

Examples (mechanically framed):

  • honesty: reduces model divergence and hidden error
  • humility: lowers revision resistance; keeps H_expanded reachable
  • compassion: allocates attention to others’ cost surfaces

15.6 The ethics–power interface

Ethical breakdown is strongly predicted by:

  • high power asymmetry
  • low consent artifacts
  • high imposed revision cost

Ethics without consent collapses into compliance.


16) System Summary (Extended)

  • Attention allocates compression effort
  • Pointing creates modeling targets
  • Interestingness is compression improvement
  • Awe signals the need for new representational bases
  • Artifacts are cached compression
  • Love and trust are shared compression strategies
  • Grief is forced recompression after loss
  • Meaning is compression leverage
  • Power constrains compression paths
  • Consent is the boundary condition that keeps coupling safe
  • Ethics is compression heuristics for coordination under coupling

Or compactly:

Reality is attention, compressed and slowed.