II. Formal Properties

Formal Verification Audit - 2026-03-07

A targeted review of the current Coq theories, the properties they actually cover, and the highest-value proof work still open.

Audit date: 2026-03-07 · Status: exploratory / unlinked page · Tags: formal-verification, coq, audit, proof-planning, security

Related pages. See the execution plan in Formal Verification Roadmap - 2026-03-07.

Executive View

The strongest current proofs are the pure, extracted safety kernels: PathSafety, QuoteParsing, and ShellSafety.
Several higher-level security theories exist, but many of the most important statements are still admitted, axiom-backed, or only loosely aligned with the OCaml implementation.
The biggest immediate verification opportunity is audit-log retention, because the current theory already exposes a false naive theorem and the runtime appears to share the same structural issue.
The second major opportunity is session isolation under real concurrency, where the current spec is too weak to certify the implementation.

What Looks Strong Today

Extracted, implementation-adjacent proofs

The most valuable current verification work is in the modules that are both:

pure enough to prove cleanly, and
actually wired into runtime checks.

That is currently true for:

coq/theories/Clawq/PathSafety.v
coq/theories/Clawq/QuoteParsing.v
coq/theories/Clawq/ShellSafety.v
extraction wiring in coq/theories/Clawq/Extract.v
runtime use in src/tools_builtin.ml

These theories give the repo a real machine-checked core for path normalization, shell tokenization safety, and basic allowlist monotonicity.

Useful but lower-value proof areas

coq/theories/Clawq/ConfigProofs.v is tidy and complete for the small model in coq/theories/Clawq/Config.v.
coq/theories/Clawq/CliProofs.v is mostly complete for the fallback CLI model in coq/theories/Clawq/Cli.v.

These are fine as local consistency proofs, but they currently certify much less than the real runtime actually does.

Where Current Theorems Skip Important Bits

1. Audit retention is the highest-value unresolved correctness problem

coq/theories/Clawq/AuditRetention.v already discovers that the naive theorem

verify_chain key prev_sig entries = true
-> verify_chain key prev_sig (purged entries) = true

is false for suffix retention. That is exactly the right thing to notice.

The runtime in src/audit.ml originally had the same shape:

verification starts from genesis in verify_chain
retention deletes old rows by age and count in purge_old

That theory gap pointed directly at a real runtime design constraint. The runtime now preserves a retained-chain anchor across purge/export/import, so the remaining work is to keep the public proof story and operator docs aligned with that anchored behavior rather than the false genesis-based one.

2. Session isolation proofs are materially too weak

coq/theories/Clawq/SessionIsolation.v was previously limited by a vacuous disjointness story and an over-abstract atomicity narrative. The current theory is stronger:

states_disjoint is now phrased as key-local non-interference under store_message
same-key properties such as reset_session_idempotent, get_or_create_twice_stable, and reset_then_create_gives_empty_state are modeled explicitly

On the implementation side, src/session.ml still deserves careful concurrency review, but the proof/runtime story now matches the shipped same-key reset/create behavior much more closely.

3. Secret-store proofs do not match the shipped API closely enough

coq/theories/Clawq/SecretStore.v is now materially closer to src/secret_store.ml, but it remains intentionally spec-oriented around trusted crypto boundaries:

plaintext passthrough is modeled explicitly
missing-master-key and decrypt-failure branches are represented
config rewrite preservation and no-double-encrypt properties are covered

This is now useful implementation-aligned verification, even though crypto correctness itself remains axiom-backed.

4. Channel-auth proofs are not yet buying much runtime assurance

coq/theories/Clawq/ChannelAuth.v is now more valuable than the earlier audit state: the core allowlist, freshness, and Telegram pairing properties are proved, with no remaining Admitted statements in the file. The main remaining limitation is that the theory is still a shared model for several channel runtimes rather than a proof over each integration’s full implementation.

The real authorization checks live separately in:

src/slack.ml
src/discord.ml
src/telegram.ml

So this proof area is promising, but not yet carrying its weight.

5. CLI and config proofs certify only the smaller fallback models

coq/theories/Clawq/Cli.v models the mini fallback dispatcher, not the real command bridge.
coq/theories/Clawq/Config.v models a compact config record, while the runtime config surface is much wider.

Those proofs are still correct, but they are not yet targeted at the most valuable behavior.

Important Properties Missing Today

Audit and integrity

retained signed audit suffixes remain verifiable with the correct anchor
purge/export/restore does not silently destroy chain verifiability
mixed signed and unsigned histories behave predictably under verification

Sessions and concurrency

same-session operations are serialized or otherwise linearizable
distinct sessions cannot mutate each other’s persisted history
session restoration plus cleanup preserves message order and boundedness

Secret handling

encrypted values round-trip through the real API shape
plaintext passthrough behavior matches implementation
config secret rewriting preserves non-secret fields exactly
environment-variable indirection cannot be mistaken for encrypted data

Channel security

allowlist correctness for the actual per-channel rules
replay-window correctness for real timestamp checks
TOTP pairing soundness and expiry for Telegram session pairing
combined authorization property: unauthorized sender cannot trigger a turn

Agent loop

tool iteration cap is enforced by the real loop
history trimming and compaction preserve ordering invariants
tool call / tool result structure stays well-formed across iterations
interrupts cannot leave history in a malformed intermediate state

Tool perimeter

Coq and OCaml path checks agree on their shared pure domain
extra allowed roots extend access monotonically without escaping the intended boundary
shell tokenizer equivalence between Coq and OCaml helper implementations

Prioritized Proof Roadmap

Priority 1: Audit retention

Why first:

it is security-relevant
it affects data retention correctness
the theory already exposes a false formulation, so there is a clear next step

Best next theorems:

suffix validity under the correct inherited prev_sig
retained-chain validity with an explicit exported anchor
purge policy preserves verifiability modulo re-anchoring

Priority 2: Session isolation and concurrency

Why second:

this is core runtime behavior
the current theory is weaker than the implementation needs
any race here contaminates multi-user safety claims

Best next theorems:

same-key turn serialization
other-key non-interference for state and persistence
get-or-create uniqueness / no duplicate live session state for one key

Priority 3: Secret-store alignment

Why third:

it is security-sensitive
the proof model currently diverges from reality
closing the model gap unlocks much more meaningful theorems later

Best next steps:

first revise the Coq model to match src/secret_store.ml
then prove round-trip, passthrough, env-var, and config-rewrite properties

Priority 4: Channel-auth alignment

Best next theorems:

shared wildcard/member allowlist correctness for the actual channel code shape
freshness/replay properties for timestamp checks
Telegram TOTP session validity and expiry properties

Priority 5: Agent-loop invariants

Best next theorems:

bounded iteration on the real control flow shape
history structure invariants across tool-call loops
trim/compact idempotence and ordering preservation for the real algorithm

Module-by-Module Assessment

Module	Assessment	Notes
`PathSafety.v`	Strong	Pure, useful, extracted, close to runtime use
`QuoteParsing.v`	Strong	Best current end-to-end proof story for shell token safety
`ShellSafety.v`	Good but narrow	Useful monotonicity, but broader tool perimeter still unproved
`ConfigProofs.v`	Locally solid	Small model, limited runtime coverage
`CliProofs.v`	Locally solid	Mostly about fallback behavior
`AuditChain.v`	Good structural base	Valuable, but retention story is where the real work starts
`RateLimiter.v`	Good spec kernel	Could use stronger temporal properties
`SecretStore.v`	Important but misaligned	Needs model cleanup before it becomes high-trust
`ChannelAuth.v`	Important but unfinished	Multiple admitted proofs, not well wired to runtime
`AuditRetention.v`	Highest leverage	Contains the most actionable open problem
`SessionIsolation.v`	Useful start, too weak	Atomicity axiom + vacuous disjointness limit confidence
`AgentLoop.v`	Too abstract today	Needs to move closer to `src/agent.ml`
`LandlockPolicy.v`	Promising spec	Core lemmas still admitted

Recommended Near-Term Plan

Finish AuditRetention.v around the correct anchored-suffix theorem, then check whether src/audit.ml needs an anchor-aware verification/export design.
Strengthen SessionIsolation.v so it states non-vacuous isolation properties and models same-key behavior more honestly.
Refactor SecretStore.v to match the real OCaml API, then prove the basic invariants again against the aligned model.
Either wire ChannelAuth.v into real channel checks or replace it with proofs tailored to the actual per-channel authorization logic.
Revisit the public verification status page after those changes; several currently listed phases are better described as planned or partial rather than fully verified.

Bottom Line

The codebase already has a real verified core, especially around path and shell safety, but the highest-value next work is no longer about adding more small local lemmas. The biggest gains now come from tightening the connection between proof models and the runtime’s real security boundaries: audit retention, session concurrency, secret handling, and channel authentication.

Formal Verification Audit - 2026-03-07

Executive View

What Looks Strong Today

Extracted, implementation-adjacent proofs

Useful but lower-value proof areas

Where Current Theorems Skip Important Bits

1. Audit retention is the highest-value unresolved correctness problem

2. Session isolation proofs are materially too weak

3. Secret-store proofs do not match the shipped API closely enough

4. Channel-auth proofs are not yet buying much runtime assurance

5. CLI and config proofs certify only the smaller fallback models

Important Properties Missing Today

Audit and integrity

Sessions and concurrency

Secret handling

Channel security

Agent loop

Tool perimeter

Prioritized Proof Roadmap

Priority 1: Audit retention

Priority 2: Session isolation and concurrency

Priority 3: Secret-store alignment

Priority 4: Channel-auth alignment

Priority 5: Agent-loop invariants

Module-by-Module Assessment

Recommended Near-Term Plan

Bottom Line

Related Pages