II. Formal Properties
Formal Verification Audit - 2026-03-07
A targeted review of the current Coq theories, the properties they actually cover, and the highest-value proof work still open.
Audit date: 2026-03-07 · Status: exploratory / unlinked page · Tags: formal-verification, coq, audit, proof-planning, security
Related pages. See the execution plan in Formal Verification Roadmap - 2026-03-07.
Executive View
- The strongest current proofs are the pure, extracted safety kernels:
PathSafety,QuoteParsing, andShellSafety. - Several higher-level security theories exist, but many of the most important statements are still admitted, axiom-backed, or only loosely aligned with the OCaml implementation.
- The biggest immediate verification opportunity is audit-log retention, because the current theory already exposes a false naive theorem and the runtime appears to share the same structural issue.
- The second major opportunity is session isolation under real concurrency, where the current spec is too weak to certify the implementation.
What Looks Strong Today
Extracted, implementation-adjacent proofs
The most valuable current verification work is in the modules that are both:
- pure enough to prove cleanly, and
- actually wired into runtime checks.
That is currently true for:
coq/theories/Clawq/PathSafety.vcoq/theories/Clawq/QuoteParsing.vcoq/theories/Clawq/ShellSafety.v- extraction wiring in
coq/theories/Clawq/Extract.v - runtime use in
src/tools_builtin.ml
These theories give the repo a real machine-checked core for path normalization, shell tokenization safety, and basic allowlist monotonicity.
Useful but lower-value proof areas
coq/theories/Clawq/ConfigProofs.vis tidy and complete for the small model incoq/theories/Clawq/Config.v.coq/theories/Clawq/CliProofs.vis mostly complete for the fallback CLI model incoq/theories/Clawq/Cli.v.
These are fine as local consistency proofs, but they currently certify much less than the real runtime actually does.
Where Current Theorems Skip Important Bits
1. Audit retention is the highest-value unresolved correctness problem
coq/theories/Clawq/AuditRetention.v already discovers that the naive theorem
verify_chain key prev_sig entries = true
-> verify_chain key prev_sig (purged entries) = true
is false for suffix retention. That is exactly the right thing to notice.
The runtime in src/audit.ml originally had the same shape:
- verification starts from genesis in
verify_chain - retention deletes old rows by age and count in
purge_old
That theory gap pointed directly at a real runtime design constraint. The runtime now preserves a retained-chain anchor across purge/export/import, so the remaining work is to keep the public proof story and operator docs aligned with that anchored behavior rather than the false genesis-based one.
2. Session isolation proofs are materially too weak
coq/theories/Clawq/SessionIsolation.v was previously limited by a vacuous disjointness story and an over-abstract atomicity narrative. The current theory is stronger:
states_disjointis now phrased as key-local non-interference understore_message- same-key properties such as
reset_session_idempotent,get_or_create_twice_stable, andreset_then_create_gives_empty_stateare modeled explicitly
On the implementation side, src/session.ml still deserves careful concurrency review, but the proof/runtime story now matches the shipped same-key reset/create behavior much more closely.
3. Secret-store proofs do not match the shipped API closely enough
coq/theories/Clawq/SecretStore.v is now materially closer to src/secret_store.ml, but it remains intentionally spec-oriented around trusted crypto boundaries:
- plaintext passthrough is modeled explicitly
- missing-master-key and decrypt-failure branches are represented
- config rewrite preservation and no-double-encrypt properties are covered
This is now useful implementation-aligned verification, even though crypto correctness itself remains axiom-backed.
4. Channel-auth proofs are not yet buying much runtime assurance
coq/theories/Clawq/ChannelAuth.v is now more valuable than the earlier audit state: the core allowlist, freshness, and Telegram pairing properties are proved, with no remaining Admitted statements in the file. The main remaining limitation is that the theory is still a shared model for several channel runtimes rather than a proof over each integration’s full implementation.
The real authorization checks live separately in:
src/slack.mlsrc/discord.mlsrc/telegram.ml
So this proof area is promising, but not yet carrying its weight.
5. CLI and config proofs certify only the smaller fallback models
coq/theories/Clawq/Cli.vmodels the mini fallback dispatcher, not the real command bridge.coq/theories/Clawq/Config.vmodels a compact config record, while the runtime config surface is much wider.
Those proofs are still correct, but they are not yet targeted at the most valuable behavior.
Important Properties Missing Today
Audit and integrity
- retained signed audit suffixes remain verifiable with the correct anchor
- purge/export/restore does not silently destroy chain verifiability
- mixed signed and unsigned histories behave predictably under verification
Sessions and concurrency
- same-session operations are serialized or otherwise linearizable
- distinct sessions cannot mutate each other’s persisted history
- session restoration plus cleanup preserves message order and boundedness
Secret handling
- encrypted values round-trip through the real API shape
- plaintext passthrough behavior matches implementation
- config secret rewriting preserves non-secret fields exactly
- environment-variable indirection cannot be mistaken for encrypted data
Channel security
- allowlist correctness for the actual per-channel rules
- replay-window correctness for real timestamp checks
- TOTP pairing soundness and expiry for Telegram session pairing
- combined authorization property: unauthorized sender cannot trigger a turn
Agent loop
- tool iteration cap is enforced by the real loop
- history trimming and compaction preserve ordering invariants
- tool call / tool result structure stays well-formed across iterations
- interrupts cannot leave history in a malformed intermediate state
Tool perimeter
- Coq and OCaml path checks agree on their shared pure domain
- extra allowed roots extend access monotonically without escaping the intended boundary
- shell tokenizer equivalence between Coq and OCaml helper implementations
Prioritized Proof Roadmap
Priority 1: Audit retention
Why first:
- it is security-relevant
- it affects data retention correctness
- the theory already exposes a false formulation, so there is a clear next step
Best next theorems:
- suffix validity under the correct inherited
prev_sig - retained-chain validity with an explicit exported anchor
- purge policy preserves verifiability modulo re-anchoring
Priority 2: Session isolation and concurrency
Why second:
- this is core runtime behavior
- the current theory is weaker than the implementation needs
- any race here contaminates multi-user safety claims
Best next theorems:
- same-key turn serialization
- other-key non-interference for state and persistence
- get-or-create uniqueness / no duplicate live session state for one key
Priority 3: Secret-store alignment
Why third:
- it is security-sensitive
- the proof model currently diverges from reality
- closing the model gap unlocks much more meaningful theorems later
Best next steps:
- first revise the Coq model to match
src/secret_store.ml - then prove round-trip, passthrough, env-var, and config-rewrite properties
Priority 4: Channel-auth alignment
Best next theorems:
- shared wildcard/member allowlist correctness for the actual channel code shape
- freshness/replay properties for timestamp checks
- Telegram TOTP session validity and expiry properties
Priority 5: Agent-loop invariants
Best next theorems:
- bounded iteration on the real control flow shape
- history structure invariants across tool-call loops
- trim/compact idempotence and ordering preservation for the real algorithm
Module-by-Module Assessment
| Module | Assessment | Notes |
|---|---|---|
PathSafety.v | Strong | Pure, useful, extracted, close to runtime use |
QuoteParsing.v | Strong | Best current end-to-end proof story for shell token safety |
ShellSafety.v | Good but narrow | Useful monotonicity, but broader tool perimeter still unproved |
ConfigProofs.v | Locally solid | Small model, limited runtime coverage |
CliProofs.v | Locally solid | Mostly about fallback behavior |
AuditChain.v | Good structural base | Valuable, but retention story is where the real work starts |
RateLimiter.v | Good spec kernel | Could use stronger temporal properties |
SecretStore.v | Important but misaligned | Needs model cleanup before it becomes high-trust |
ChannelAuth.v | Important but unfinished | Multiple admitted proofs, not well wired to runtime |
AuditRetention.v | Highest leverage | Contains the most actionable open problem |
SessionIsolation.v | Useful start, too weak | Atomicity axiom + vacuous disjointness limit confidence |
AgentLoop.v | Too abstract today | Needs to move closer to src/agent.ml |
LandlockPolicy.v | Promising spec | Core lemmas still admitted |
Recommended Near-Term Plan
- Finish
AuditRetention.varound the correct anchored-suffix theorem, then check whethersrc/audit.mlneeds an anchor-aware verification/export design. - Strengthen
SessionIsolation.vso it states non-vacuous isolation properties and models same-key behavior more honestly. - Refactor
SecretStore.vto match the real OCaml API, then prove the basic invariants again against the aligned model. - Either wire
ChannelAuth.vinto real channel checks or replace it with proofs tailored to the actual per-channel authorization logic. - Revisit the public verification status page after those changes; several currently listed phases are better described as planned or partial rather than fully verified.
Bottom Line
The codebase already has a real verified core, especially around path and shell safety, but the highest-value next work is no longer about adding more small local lemmas. The biggest gains now come from tightening the connection between proof models and the runtime’s real security boundaries: audit retention, session concurrency, secret handling, and channel authentication.