Structural Inevitability as a Response-Level Criterion for AGI Evaluation
Preprint · Citium · 2026
Abstract
This paper develops structural inevitability as a local criterion for AGI evaluation at the problem-response level. The unit of evaluation is not a system label in isolation; it is a well-formed problem under a fixed representation, a declared semantic equivalence relation, and an admissible response-class space. A response satisfies the criterion when it lands in the class that the completed specification makes uniquely cheapest, by a margin that remains meaningful after approximation error. The result is a criterion for response-level success, not a complete theory of AGI.
The account has three layers. The ideal layer uses exact Kolmogorov complexity as a limiting analytical object. The computable layer uses Charter Protocol to compile a raw problem into a charter-enriched specification with candidate classes, structural surrogates, verifiers, routing, and escalation. The empirical layer tests proxy margins against correctness, invariant satisfaction, false acceptance, and inadequacy behavior.
Three subsidiary results are stated under indexed assumptions: posterior concentration for an explicit observer model, invariant-constrained completion for value-laden specifications, and finite coverage for charter authoring over reachable, charter-realizable families. Every guarantee is tied to well-formedness, margin, observer model, verifier adequacy, proxy error, and charter coverage.
Keywords: AGI evaluation, structural inevitability, Charter Protocol, Kolmogorov complexity, posterior concentration, invariant-constrained completion, specification completeness
Introduction
Most AGI definitions begin with the system: its behavior across tasks, environments, benchmarks, or a universal reward distribution (Turing, 1950; Goertzel and Pennachin, 2007; Legg and Hutter, 2007; Hutter, 2005; Chollet, 2019; Morris et al., 2024). The present paper begins with a smaller unit: a problem and the response returned to it. The proposed criterion is satisfied when the completed problem specification separates one admissible semantic class from all rivals by a conditional description-length margin that remains visible under approximation.
In practice, the question is whether the problem has been specified enough to favor the correct semantic class. A chartered mathematics problem names the objects, transformations, and verification criteria. A chartered policy task names authority, temporal scope, invariants, exceptions, and escalation rules. In each case, the achievement is recovery of the determined semantic class under stated conditions.
The theory has three layers:
- Ideal target: exact structural inevitability over semantic response classes.
- Computable approximation: Charter Protocol as specification compilation, routing, verification, and escalation.
- Empirical proxy: benchmark regimes where correctness, invariant satisfaction, and K-gap proxies can be tested.
All major claims in the paper are indexed to those layers. Exact Kolmogorov complexity supplies the limiting object; charters supply a computable approximation path; experiments test whether the approximation behaves as the theory predicts.
What this paper does not claim. Structural inevitability is not a full system-level account of artificial general intelligence. It does not supply autonomy, continual learning, open-ended exploration, transfer, value elicitation, governance, public legitimacy, or deployment safety by itself. The invariant result presupposes feasible grounded values inside the admissible set; the observer-credence result is a posterior bound for a declared observer score; the coverage result is finite and limited to reachable charter-realizable families.
Definitions
The definitions begin at the response level. The surface string is only a representative: two answers may differ in wording, order, or implementation detail while carrying the same task-relevant solution. The object ranked by conditional description length is therefore the semantic response class, not the literal string.
Definition 2.1 (Semantic response class)
For a problem specification
This is the same quotienting move used in the companion verification paper (Broderick, 2026): the object of correctness is the semantic response class. Literal strings are surface representatives of that class.
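A minimal sketch of this quotienting move, with a toy canonicalizer standing in for the declared equivalence relation (a real charter would supply its own):

```python
from collections import defaultdict

def canonical(answer: str) -> str:
    """Map a surface string to a canonical representative.
    Illustrative relation: ignore case, whitespace, trailing punctuation."""
    return " ".join(answer.lower().split()).rstrip(".!")

def semantic_classes(responses):
    """Group surface strings by the declared equivalence relation."""
    classes = defaultdict(list)
    for r in responses:
        classes[canonical(r)].append(r)
    return dict(classes)

classes = semantic_classes(["The answer is 42.", "the answer  is 42", "No."])
# The two 42-answers fall into one semantic class; "No." is another.
```

Correctness then attaches to the class, so any of a class's surface representatives counts as the same response.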
Definition 2.2 (Structurally inevitable completion)
Let
relative to a fixed reference language or representation
Definition 2.3 (Per-instance response-level SI criterion)
A system satisfies the Problem-Solution Isomorphism / Structural Inevitability (PSI/SI) response criterion on a well-formed problem
Definition 2.4 (Per-family SI competence)
Over a domain family
Remark 2.5 (Idealization)
Exact Kolmogorov complexity is uncomputable, so the criterion at this level is a limiting analytical object rather than an implementable test; all implementation claims run through computable proxies.
Remark 2.6 (Operational margin)
Let
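The operational-margin idea of Remark 2.6 can be sketched numerically: the gap between the cheapest admissible class and its best rival under a computable surrogate. The costs below are illustrative stand-ins for conditional description length.

```python
def operational_margin(costs: dict) -> tuple:
    """Return (winning class, margin to the best rival) under a
    surrogate cost; lower cost means structurally cheaper."""
    ranked = sorted(costs.items(), key=lambda kv: kv[1])
    (best, c1), (_, c2) = ranked[0], ranked[1]
    return best, c2 - c1

winner, margin = operational_margin({"A": 12.0, "B": 19.5, "C": 21.0})
# winner == "A", margin == 7.5
```

The response-level criterion then asks that this margin exceed the declared proxy-error budget before inevitability is claimed.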
Observer-Relative Posterior Concentration
The posterior-concentration result is deliberately narrow. Fix an observer, the evidence available to that observer, a scoring rule over a finite candidate set, and a Gibbs-form credence model. If the correct class is visibly separated from its rivals under that score, the posterior mass assigned to it increases with the visible gap. This is an epistemic calculation inside the model; psychological reliance, social warrant, and institutional legitimacy remain outside its scope.
Definition 3.1 (Observer posterior credence)
Let
This is the observer's posterior credence that the proposed class
Definition 3.2 (Observer-visible structural gap)
Let
Theorem 3.3 (Posterior concentration under visible structural gap)
Assume the observer posterior in Definition 3.1 is modeled by the Gibbs distribution $P(c \mid E) = \exp(-\beta\, s(c, E)) \big/ \sum_{c' \in \mathcal{C}} \exp(-\beta\, s(c', E))$ over the finite candidate set $\mathcal{C}$, with inverse temperature $\beta > 0$ and observer score $s$.
Let $c^{*}$ be the proposed class and let $\Delta$ be the observer-visible structural gap of Definition 3.2, so that $s(c, E) - s(c^{*}, E) \ge \Delta$ for every rival $c \neq c^{*}$.
For fixed $\beta > 0$, the posterior satisfies $P(c^{*} \mid E) \ge 1 \big/ \bigl(1 + (\lvert\mathcal{C}\rvert - 1)\, e^{-\beta \Delta}\bigr)$, which increases to 1 as $\Delta$ grows.
Proof.
Since the posterior is Gibbs-form, $P(c^{*} \mid E) = \exp(-\beta\, s(c^{*}, E)) \big/ \sum_{c \in \mathcal{C}} \exp(-\beta\, s(c, E))$.
Dividing numerator and denominator by $\exp(-\beta\, s(c^{*}, E))$ gives $P(c^{*} \mid E) = 1 \big/ \bigl(1 + \sum_{c \neq c^{*}} e^{-\beta\,(s(c, E) - s(c^{*}, E))}\bigr) \ge 1 \big/ \bigl(1 + (\lvert\mathcal{C}\rvert - 1)\, e^{-\beta \Delta}\bigr)$, which increases to 1 as the visible gap $\Delta$ grows.
Corollary 3.4 (Ideal-to-observer gap transfer)
If
Remark 3.5 (Scope of posterior-concentration theorem)
The theorem is about concentration inside a declared observer model. The observer does not need access to exact Kolmogorov complexity; it needs a scoring rule that exposes a gap among the candidate classes. In practice, the retained gap depends on the observer's evidence, candidate set, score, and error profile.
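Under the stated Gibbs-form observer model, concentration can be illustrated directly. The scores and inverse temperature below are illustrative inputs, not part of the formal result; lower score means structurally cheaper under the observer's scoring rule.

```python
import math

def gibbs_posterior(scores: dict, beta: float = 1.0) -> dict:
    """Gibbs-form credence: mass proportional to exp(-beta * score)."""
    weights = {c: math.exp(-beta * s) for c, s in scores.items()}
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}

# As the visible gap between the correct class and its rival widens,
# posterior mass on the correct class increases.
small_gap = gibbs_posterior({"correct": 0.0, "rival": 0.5})
large_gap = gibbs_posterior({"correct": 0.0, "rival": 5.0})
assert large_gap["correct"] > small_gap["correct"]
```

The same computation makes the scope restriction visible: everything depends on the declared score and candidate set, not on exact Kolmogorov complexity.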
Invariant-Constrained Admissible Completion
The invariant-constrained result is a statement about problem identity. In a value-laden problem, grounded value conditions define the admissible classes rather than arriving as an after-the-fact preference overlay. If those conditions are active in the completed specification, any structurally inevitable completion satisfies them by construction.
Definition 4.1 (Value invariants)
Let
where
Lemma 4.2 (Invariant-constrained admissible completion)
Suppose the value invariants in
Then any PSI/SI-optimal completion of
Proof.
By definition, PSI/SI optimization occurs over the admissible set
Remark 4.3 (Scope of invariant-constrained completion)
At the ideal level, invariant satisfaction is absorbed into problem completion when value invariants are feasible, grounded, and active inside the specification. The remaining work is specification work: eliciting values, resolving stakeholder conflicts, maintaining invariants under deployment shift, auditing approximations, interpreting failures, and designing governance around incomplete or unstable specifications.
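The lemma's restriction of optimization to the admissible set can be sketched as follows. The candidates, invariant, and surrogate cost are illustrative; a real charter would ground them in elicited values.

```python
def invariant_constrained_min(candidates, invariants, cost):
    """Return the cheapest candidate in the admissible (invariant-
    satisfying) set, or None as an inadequacy signal if it is empty."""
    admissible = [c for c in candidates if all(inv(c) for inv in invariants)]
    if not admissible:
        return None  # inadequacy: no feasible completion, not a weakened one
    return min(admissible, key=cost)

candidates = [{"plan": "fast", "len": 3, "safe": False},
              {"plan": "careful", "len": 5, "safe": True}]
best = invariant_constrained_min(
    candidates,
    invariants=[lambda c: c["safe"]],  # illustrative grounded value invariant
    cost=lambda c: c["len"])           # illustrative surrogate cost
# The cheaper "fast" plan is not a rival completion of the same problem,
# because it violates an invariant; "careful" wins.
```

This is the lemma's content in miniature: the invariant shapes the admissible set before minimization, rather than penalizing the minimizer afterward.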
Failure Modes
The result loses force when the value-laden problem fails to determine a stable admissible set. Typical causes include underspecified values, unresolved stakeholder conflict, hidden externalities, distribution shift, approximate optimization, multiple optima, and representation drift. In those cases the repair question is concrete: which constraint failed to enter, failed to translate, failed to bind, or ceased to apply?
Charter Protocol as Computable Approximation
Charter Protocol is the operational layer that makes well-formedness executable before a model answers. A charter records the domain boundary, normalizer, hard constraints, invariants, permitted and prohibited methods, success criteria, verifiers, routing rules, and escalation behavior. Compilation turns a raw problem into a charter-enriched specification with a candidate space, a computable structural surrogate, a verifier, and an inadequacy path.
The role of the protocol is to make the SI condition operational enough to guide execution. When the charter is adequate and the surrogate is order-consistent with the ideal ranking on valid semantic solutions, the executor can recover the PSI/SI minimizer for the covered family.
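The executable-charter idea in the paragraph above can be sketched as a small data structure. Field names are illustrative stand-ins for the formal tuple of Definition 5.1, and the toy instantiation is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Charter:
    """Illustrative charter; fields follow the prose, not the formal tuple."""
    in_domain: Callable[[str], bool]            # domain boundary predicate
    normalize: Callable[[str], dict]            # canonical problem form
    invariants: List[Callable[[object], bool]]  # hard output constraints
    verify: Callable[[dict, object], bool]      # output-contract verifier
    escalate: Callable[[str], str]              # explicit inadequacy path

    def answer(self, task: str, candidates: list):
        """Return a verifier-passing candidate or an explicit inadequacy
        witness; silent unverified answers are disallowed."""
        if not self.in_domain(task):
            return self.escalate(task)
        spec = self.normalize(task)
        for cand in candidates:
            if self.verify(spec, cand) and all(inv(cand) for inv in self.invariants):
                return cand
        return self.escalate(task)

# Toy instantiation: a sorting micro-charter.
toy = Charter(
    in_domain=lambda t: "sort" in t,
    normalize=lambda t: {},
    invariants=[lambda c: isinstance(c, list)],
    verify=lambda spec, c: c == sorted(c),
    escalate=lambda t: "INADEQUATE: outside charter or no verified candidate",
)
```

The point of the structure is the control flow: domain check, normalization, verification against the contract, and escalation, in that order, before any answer is emitted.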
Definition 5.1 (Domain charter)
A domain charter is an executable tuple
where
Definition 5.2 (Meta-charter)
A meta-charter is a partial routing map
where
Definition 5.3 (Charter compilation)
For a charter
where
Definition 5.4 (Order-consistency)
For valid semantic solutions in
Operationally, a charter may claim margin-limited order-consistency only above a declared proxy-error threshold. Below that threshold, it must return unresolved adequacy rather than assert structural inevitability.
Definition 5.5 (Charter adequacy)
A charter
finds a verifier-passing candidate if one exists in the compiled candidate space or returns an explicit inadequacy witness. Silent answers are disallowed when coverage, verifier adequacy, or proxy margin is unresolved. When
Worked Charter Examples
Chartered sorting function. A small code charter can make the response-level criterion concrete. The domain predicate accepts tasks asking for a Python function over a finite list of integers. The normalizer fixes the signature, input range, and output contract. The invariants require that the output be nondecreasing, preserve the input multiset, terminate within a declared bound, and avoid external service calls. Permitted methods include direct comparison and local data structures; prohibited methods include network calls, hidden global state, and changing the input type. The verifier runs property tests for ordering and multiset preservation plus edge cases for empty lists, duplicates, negative integers, and already-sorted inputs. The surrogate cost can combine AST size, proof/certificate length, and invariant violations. If the prompt asks for behavior outside the charter, such as sorting arbitrary objects by an unspecified comparator, the correct response is an inadequacy signal rather than an unverified answer.
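A minimal sketch of the verifier just described, with `sorted` standing in for a model-produced candidate. The property tests are the ordering and multiset checks named in the charter, run over the listed edge cases.

```python
from collections import Counter

def candidate_sort(xs: list) -> list:
    """Stand-in for a model response: one surface representative of the
    sorted-output semantic class."""
    return sorted(xs)

def verify_sort(fn, cases) -> bool:
    """Charter verifier: output nondecreasing and same multiset as input."""
    for xs in cases:
        out = fn(list(xs))
        if Counter(out) != Counter(xs):
            return False  # multiset not preserved
        if any(out[i] > out[i + 1] for i in range(len(out) - 1)):
            return False  # ordering invariant violated
    return True

# Edge cases from the charter: empty, duplicates, negatives, pre-sorted.
edge_cases = [[], [3, 1, 2], [2, 2, -1], [-5, -5], [1, 2, 3]]
assert verify_sort(candidate_sort, edge_cases)
```

A candidate that merely echoes its input passes the multiset check but fails the ordering invariant, so the verifier rejects it rather than letting a surface-plausible answer through.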
Chartered reimbursement policy. A rule-governed policy charter has the same structure. The domain predicate accepts reimbursement questions under a named policy version and date. The normalizer extracts claimant role, purchase category, amount, approval status, exception basis, and time window. Invariants encode authority order: statute or contract overrides internal policy; explicit approval overrides default denial only when the policy permits delegation; missing receipts trigger escalation instead of silent approval. Prohibited methods include inventing missing approvals or using a later policy version. The verifier checks each decision against the normalized fields and returns pass, reject, or inadequacy. The structural surrogate prefers the shortest decision path that satisfies the authority order and all invariants. A response that approves a request by ignoring the receipt rule is not a rival completion of the same chartered problem; it is a completion of a different effective problem with a weakened invariant.
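The decision path above can be sketched as a rule chain. The field names and rules are illustrative, not a real policy; the point is the authority order and the escalation-over-silent-approval behavior.

```python
def decide(claim: dict) -> str:
    """Return 'approve', 'reject', or 'escalate' under the illustrative
    authority order: contract overrides internal policy; a missing
    receipt escalates instead of silently approving."""
    if claim.get("contract_override"):
        return "approve"      # statute/contract outranks internal policy
    if not claim.get("receipt"):
        return "escalate"     # never silently approve on missing evidence
    if claim.get("approved") and claim.get("delegation_permitted"):
        return "approve"      # explicit approval, delegation permitted
    return "reject"           # default denial

assert decide({"contract_override": True}) == "approve"
assert decide({"receipt": False}) == "escalate"
assert decide({"receipt": True, "approved": True,
               "delegation_permitted": True}) == "approve"
assert decide({"receipt": True, "approved": True}) == "reject"
```

A response that skips the receipt check is not a rival branch of this chain; it is a different chain with a weakened invariant, which is exactly the distinction the charter makes auditable.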
The next lemma is the local bridge from ideal SI to execution. Adequacy preserves the intended solution set; verifier soundness prevents invalid candidates from passing; order-consistency makes the computable surrogate choose the same semantic class as the ideal ranking.
Lemma 5.6 (Local proxy-to-ideal bridge)
Fix a certifiably well-formed problem
Proof.
Adequacy preserves the correct solution set. By adequacy, the SI class and every relevant valid rival have verifier-passable representatives in the compiled candidate space, while soundness prevents invalid candidates from passing. Order-consistency makes minimization of the computable surrogate over verifier-passing valid representatives select the same semantic class as minimization of the ideal description length. If the ideal minimizer is separated by margin
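The selection rule in the lemma, together with the margin discipline noted after Definition 5.4, can be sketched as follows. The candidates, verifier, surrogate, and proxy-error budget are all illustrative.

```python
def bridge_select(candidates, verifier, surrogate, proxy_error: float):
    """Minimize the computable surrogate over verifier-passing candidates;
    claim inevitability only when the margin to the best rival exceeds
    the declared proxy-error budget."""
    passing = sorted((c for c in candidates if verifier(c)), key=surrogate)
    if not passing:
        return None, "inadequate"        # explicit witness, no silent answer
    if len(passing) == 1:
        return passing[0], "inevitable"  # no rival survives the verifier
    margin = surrogate(passing[1]) - surrogate(passing[0])
    status = "inevitable" if margin > proxy_error else "unresolved"
    return passing[0], status

best, status = bridge_select(["xx", "xxxx", "x" * 9],
                             verifier=lambda c: len(c) >= 2,
                             surrogate=len, proxy_error=1.0)
# best == "xx"; margin 2 exceeds the budget, so status == "inevitable"
```

Below the proxy-error threshold the same routine returns `unresolved` rather than asserting structural inevitability, matching the behavior required of a margin-limited charter.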
The coverage result is a finite coverage statement. The reachable regime is already partitioned into finitely many charter-realizable families. Covered families have verified-adequate charters. Uncovered families produce witnesses, and the authoring operator converts those witnesses into verified charters. Conservative routing prevents silent answers outside the verified library during this process.
Theorem 5.7 (Finite coverage under charter authoring)
Let a reachable regime
Proof sketch.
Because
Remark 5.8 (Boundary of finite coverage)
The theorem is intentionally bounded. It covers reachable, charter-realizable families and excludes open-ended self-referential task families, unknown empirical phenomena, uncheckable domains, and tasks outside the available tools and representations.
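The authoring loop assumed by the theorem can be sketched as follows. `author_charter` is a hypothetical stand-in for the authoring operator, and the family labels are illustrative; the sketch shows only the witness-to-charter conversion that drives finite coverage.

```python
def coverage_loop(families, library, author_charter):
    """Route covered families to their verified charters; convert each
    uncovered family's witness into a new charter via authoring."""
    witnesses = [f for f in families if f not in library]
    for f in witnesses:
        library[f] = author_charter(f)  # witness becomes a verified charter
    return library, witnesses

families = ["sorting", "reimbursement", "unit-conversion"]
library = {"sorting": "charter-v1"}
library, witnesses = coverage_loop(families, library,
                                   author_charter=lambda f: f"charter({f})")
# witnesses == ["reimbursement", "unit-conversion"]; both are now covered
```

Because the reachable regime is partitioned into finitely many families, the loop terminates; conservative routing handles anything outside the library in the meantime.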
Experimental Program
The empirical question is whether charter-conditioned inference behaves like the computable approximation predicts. A successful charter should improve correctness and invariant satisfaction by favoring the intended semantic class before generation. The tests need external correctness checks, explicit invariants, measurable proxy gaps, and comparison against minimal prompting and ordinary context augmentation. Synthetic rule worlds, theorem proving, code generation with hidden tests, and rule-based policy reasoning are appropriate first tests; HorizonMath is a later-stage stress test because it targets predominantly unsolved mathematical discovery with automated verification (Wang et al., 2026). The hypotheses are:
- charter conditioning improves correctness and invariant satisfaction;
- estimated structural gap correlates with correctness;
- charter advantage increases with model capability when the charter is adequate and creates a positive structural margin;
- SI verification failures diagnose missing invariants or wrong routing.
These tests evaluate whether the computable approximation behaves as the theory predicts; exact Kolmogorov-minimality remains an ideal target.
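The second hypothesis can be operationalized as a plain correlation between estimated proxy gap and external correctness. The records below are illustrative; a real run would use verified correctness labels and measured gaps.

```python
def gap_correctness_correlation(records):
    """Pearson correlation between proxy gap (float) and correctness
    (0/1); assumes both variables vary across the records."""
    n = len(records)
    gaps = [g for g, _ in records]
    corr = [float(c) for _, c in records]
    mg, mc = sum(gaps) / n, sum(corr) / n
    cov = sum((g - mg) * (c - mc) for g, c in records) / n
    vg = sum((g - mg) ** 2 for g in gaps) / n
    vc = sum((c - mc) ** 2 for c in corr) / n
    return cov / (vg * vc) ** 0.5

records = [(0.1, 0), (0.2, 0), (1.5, 1), (2.0, 1)]
r = gap_correctness_correlation(records)
# r close to +1: larger proxy gaps coincide with correct responses
```

A near-zero or negative correlation on real data would count against the computable approximation, which is the falsifiable content of the hypothesis.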
Relation to Existing AGI and Alignment Work
Existing AGI definitions primarily evaluate systems across tasks, environments, or behavioral benchmarks. AIXI and universal intelligence define system-level optimality across environments (Hutter, 2005; Legg and Hutter, 2007). ARC-style work emphasizes abstraction and skill acquisition (Chollet, 2019). PSI/SI evaluates a different object: whether a response is the structurally inevitable completion of a specified problem. This response-level criterion can coexist with system-level measures, while giving a formal account of local problem completion.
Panigrahy and Sharan prove an incompatibility result under strict definitions of safety, trust, and AGI: safety means never making false claims, trust means assuming safety, and AGI means matching or exceeding human capability (Panigrahy and Sharan, 2025). PSI/SI does not refute that result. It restricts guarantees to charter-bounded domains with explicit admissible sets, verifiers, and inadequacy behavior.
The computable layer is also adjacent to scalable oversight and preference-shaping methods such as RLHF, Constitutional AI, amplification, and debate (Christiano et al., 2017; Ouyang et al., 2022; Bai et al., 2022; Christiano et al., 2018; Irving et al., 2018). Those methods shape training signals, critiques, and supervision. Charter Protocol instead asks whether the problem has been compiled into a form where admissibility, invariants, verifiers, and escalation are explicit before the response is accepted.
Limitations and Non-Claims
- Exact Kolmogorov complexity is uncomputable; all implementation claims require proxies.
- Small margins are representation-sensitive.
- Observer-relative trust claims reduce here to posterior concentration under a declared observer model.
- Invariant-constrained completion assumes fully grounded, feasible, positive-margin value-laden specifications.
- Charter coverage is relative to reachable charter-realizable families.
- Safety engineering remains necessary in incomplete-SI and approximate-SI regimes.
- Empirical validation is still required; HorizonMath and related benchmarks are proxies for response-level behavior.
Conclusion
PSI/SI supplies a response-level criterion for AGI evaluation: a system succeeds on a well-formed problem when it returns the structurally inevitable semantic completion. Charter Protocol makes the criterion operational through domain boundaries, invariants, output contracts, compiled verifiers, routing, and escalation.
The theorem cluster identifies the conditions under which the surrounding claims become formal. Visible structural gap supports posterior concentration under an observer model. Grounded value invariants become part of admissible problem completion. Reachable charter-realizable families admit finite coverage under charter-authoring assumptions.
The practical program is to build charters that create measurable margins, test those margins with external correctness checks, and record inadequacy when well-formedness, verifier adequacy, or coverage fails.
References
1. Yuntao Bai et al. Constitutional AI: Harmlessness from AI feedback. arXiv:2212.08073, 2022. doi:10.48550/arXiv.2212.08073.
2. Ian Broderick. Structural Inevitability: Specification-Side Source Disambiguation for Oracle-Limited Verification. Citium Verification Preprint, 2026.
3. François Chollet. On the measure of intelligence. arXiv:1911.01547, 2019. doi:10.48550/arXiv.1911.01547.
4. Paul F. Christiano et al. Deep reinforcement learning from human preferences. NeurIPS 30, 2017. arXiv:1706.03741. doi:10.48550/arXiv.1706.03741.
5. Paul F. Christiano, Buck Shlegeris, and Dario Amodei. Supervising strong learners by amplifying weak experts. arXiv:1810.08575, 2018. doi:10.48550/arXiv.1810.08575.
6. Ben Goertzel and Cassio Pennachin, editors. Artificial General Intelligence. Cognitive Technologies, Springer, 2007. doi:10.1007/978-3-540-68677-4.
7. Peter D. Grünwald. The Minimum Description Length Principle. MIT Press, 2007.
8. Erik Y. Wang, Sumeet Motwani, James V. Roggeveen, Eliot Hodges, Dulhan Jayalath, Charles London, Kalyan Ramakrishnan, Flaviu Cipcigan, Philip Torr, and Alessandro Abate. HorizonMath: Measuring AI progress toward mathematical discovery with automatic verification. arXiv:2603.15617, 2026. doi:10.48550/arXiv.2603.15617.
9. Marcus Hutter. Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability. Springer, 2005.
10. Geoffrey Irving, Paul Christiano, and Dario Amodei. AI safety via debate. arXiv:1805.00899, 2018. doi:10.48550/arXiv.1805.00899.
11. Shane Legg and Marcus Hutter. Universal intelligence: A definition of machine intelligence. Minds and Machines, 17(4):391--444, 2007. doi:10.1007/s11023-007-9079-x.
12. Ming Li and Paul Vitányi. An Introduction to Kolmogorov Complexity and Its Applications. Third edition, Springer, 2008.
13. Meredith Ringel Morris et al. Position: Levels of AGI for operationalizing progress on the path to AGI. ICML, Proceedings of Machine Learning Research 235:36308--36321, 2024. arXiv:2311.02462. doi:10.48550/arXiv.2311.02462.
14. Long Ouyang et al. Training language models to follow instructions with human feedback. NeurIPS 35, 2022. arXiv:2203.02155. doi:10.48550/arXiv.2203.02155.
15. Rina Panigrahy and Vatsal Sharan. Limitations on safe, trusted, artificial general intelligence. arXiv:2509.21654, 2025. doi:10.48550/arXiv.2509.21654.
16. Jorma Rissanen. Modeling by shortest data description. Automatica, 14(5):465--471, 1978. doi:10.1016/0005-1098(78)90005-5.
17. Ray J. Solomonoff. A formal theory of inductive inference. Parts I and II. Information and Control, 7(1):1--22 and 7(2):224--254, 1964. doi:10.1016/S0019-9958(64)90223-2 and doi:10.1016/S0019-9958(64)90131-7.
18. Alan M. Turing. Computing machinery and intelligence. Mind, 59(236):433--460, 1950.
BibTeX
@misc{broderick2026agi,
  title  = {Structural Inevitability as a Response-Level Criterion for AGI Evaluation},
  author = {Broderick, I.},
  year   = {2026},
  note   = {Preprint, Citium}
}