II. Formal Properties
Verification Status
A machine-checked account of Clawq’s proven properties.
Last verification refresh: 2026-03-17 · Coq 8.19 · 380 verified theorem/lemma statements across 21 phases
Status note. All 22 verification phases (F0–F22) are verified, with 374 theorem/lemma statements across 27 Coq files and only 1 Admitted proof remaining. Recent additions cover tool invocation safety (F13), config merge (F14), MCP framing (F15), sandbox policy (F17), pair coding (F18), model format parsing (F19), task tree status machine (F20), cron scheduling (F21), and Discord gateway protocol (F22). See Roadmap below.
Coq-Verified Core
Verified Statements
Modules Extracted
Lines of Coq
Repository Theorem/Lemma Statements
Phase Status
Verification Ledger
ConfigProofs.vPathSafety.vAuditChain.vRateLimiter.vConfig.v + ConfigProofs.vQuoteParsing.v + ShellSafety.vSecretStore.vChannelAuth.vAuditRetention.vAgentLoop.vSessionIsolation.vLandlockPolicy.vModule Breakdown
F1: Configuration Validation
Source: coq/theories/Clawq/ConfigProofs.v
For all p, valid_port p = true implies 1 <= p <= 65535.
Proof: ConfigProofs.v
validate_config default_config = true. The built-in defaults always pass validation.
Proof: ConfigProofs.v
Provider weights sum to a consistent total; no weight is negative.
Proof: ConfigProofs.v
For all t, valid_temperature t = true implies 0.0 <= t <= 2.0.
Proof: ConfigProofs.v
The default security configuration enables sandboxing, rate limiting, and audit logging.
Proof: ConfigProofs.v
F2: Directory Traversal Immunity
Source: coq/theories/Clawq/PathSafety.v — Extracted to OCaml and integrated into tools_builtin.ml
Path normalization eliminates all ’..’ segments. The output of normalize never contains a dotdot component.
Proof: PathSafety.v
Normalizing an already-normalized path yields the same path: normalize (normalize p) = normalize p.
Proof: PathSafety.v
A path whose segments are all safe is itself safe.
Proof: PathSafety.v
Any path that passes the safety check is a prefix of the allowed workspace root. No file access escapes the sandbox.
Proof: PathSafety.v
F3: HMAC Audit Chain Integrity
Source: coq/theories/Clawq/AuditChain.v
Appending a correctly-linked entry to a valid chain preserves chain validity.
Proof: AuditChain.v
A chain produced by build_chain always verifies. Construction implies validity.
Proof: AuditChain.v
Any suffix of a valid chain is itself valid. Truncation preserves integrity.
Proof: AuditChain.v
The last signature of a concatenated chain equals the last signature of the appended segment.
Proof: AuditChain.v
F4: Token Bucket Rate Limiter
Source: coq/theories/Clawq/RateLimiter.v
After refill, tokens never exceed the bucket capacity. The bucket cannot overflow.
Proof: RateLimiter.v
Refill is monotonically non-decreasing: tokens after refill >= tokens before refill.
Proof: RateLimiter.v
Properties P1–P6: consume decrements by exactly the requested amount when sufficient tokens exist; returns the bucket unchanged when insufficient; the allowed/denied decision is consistent with the token count.
Proof: RateLimiter.v
F5: Extended Configuration Validation
Source: coq/theories/Clawq/Config.v + ConfigProofs.v
Port validation is both sound and complete with respect to the 1–65535 range.
Proof: ConfigProofs.v
Temperature validation enforces the 0.0–2.0 range with correct boundary behavior.
Proof: ConfigProofs.v
Full configuration validation (P8–P15) checks all fields: port, temperature, max_tokens, history depth, provider weights, security flags, and rate limits.
Proof: ConfigProofs.v
F6: Shell Safety
Source: coq/theories/Clawq/QuoteParsing.v + ShellSafety.v — Extracted to OCaml
split_command_words produces a list of tokens with no embedded shell metacharacters when the input passes is_shell_safe.
Proof: QuoteParsing.v
The character blacklist (;&|${">"}{"<"}!) is complete for the threat model: no injection is possible via unlisted characters in a POSIX shell without expansion.
Proof: ShellSafety.v
is_allowed cmd allowlist = true implies is_allowed cmd (allowlist ++ extra) = true. Extending the allowlist never revokes existing permissions.
Proof: ShellSafety.v
F7: Encryption Correctness
Source: coq/theories/Clawq/SecretStore.v — Spec-only (crypto axiomatized)
Plaintext values (no $ENC: prefix) pass through decrypt unchanged.
Proof: SecretStore.v
resolve_secret handles all three cases: encrypted ($ENC:), environment variable ($VAR), and plaintext — with correct behavior for missing master key and decrypt failure.
Proof: SecretStore.v
Already-encrypted values ($ENC: prefix) are never re-encrypted. Environment variable references ($VAR) are preserved unchanged.
Proof: SecretStore.v
F8: Channel Authentication
Source: coq/theories/Clawq/ChannelAuth.v — Extracted to OCaml
Allowlist membership is bidirectional: is_allowed id list = true iff id is in the list or the list is the wildcard [”*”].
Proof: ChannelAuth.v
The replay prevention window enforces a 300-second absolute skew bound. Requests outside this window are rejected.
Proof: ChannelAuth.v
Telegram pairing is active strictly before expiry and inactive at or after expiry. The biconditional fully characterizes the validity window.
Proof: ChannelAuth.v
F9: Audit Retention Safety
Source: coq/theories/Clawq/AuditRetention.v
Combined count-and-age purge preserves chain validity when anchored with the correct inherited prev_sig from the dropped prefix.
Proof: AuditRetention.v
Count-based retention produces a suffix of the original chain. The result length is bounded by the retention count.
Proof: AuditRetention.v
F10: Agent Loop Termination
Source: coq/theories/Clawq/AgentLoop.v
The agent turn loop terminates within max_tool_iterations steps. Fuel decreases monotonically.
Proof: AgentLoop.v
Trimming an already-trimmed history is a no-op: trim(trim(h)) = trim(h).
Proof: AgentLoop.v
After trim_history, the history length is bounded by effective_max_messages.
Proof: AgentLoop.v
The runtime-aligned sanitizer removes replay-unsafe tool/result pairs, so every retained tool result and assistant tool call still has a matching counterpart in the source history.
Proof: AgentLoop.v
Compaction split repair guarantees the kept suffix does not begin with an orphan tool result after boundary adjustment.
Proof: AgentLoop.v
After split repair, sanitization, and summary insertion, the compacted history remains replay-safe for OpenAI/Codex tool-call replay.
Proof: AgentLoop.v
F11: Session Isolation
Source: coq/theories/Clawq/SessionIsolation.v
Storing a message to session A does not affect session B’s history (for A != B).
Proof: SessionIsolation.v
Creating or retrieving session A leaves all other sessions unchanged.
Proof: SessionIsolation.v
Resetting a session twice is equivalent to resetting it once.
Proof: SessionIsolation.v
F12: Landlock Sandbox Policy
Source: coq/theories/Clawq/LandlockPolicy.v — all bitwise arithmetic lemmas are fully proved
Adding a rule preserves the least-privilege property: no path class receives more access than its minimal permission.
Proof: LandlockPolicy.v
Sandbox initialization always returns a result, whether Landlock is available or not. Graceful degradation is guaranteed.
Proof: LandlockPolicy.v
Trust Boundary
Verification Roadmap
All 12 verification phases (F1–F12) are complete, with 380 theorem/lemma statements across 13 Coq files. Both P5 (Formal Verification Expansion) and P6 (Gap Closure) are complete. The next wave targets new verification domains: tool safety, config merge, MCP framing, and SecretStore axiom reduction. See the Formal Verification Roadmap for proof-expansion history and the Formal Verification Audit for the gap analysis.
Gap Closure (P6) — Complete
P6 closed the distance between proved specifications and the shipped runtime across three milestones (14 tasks):
- Extraction Wiring: Connected extracted Coq functions (
validate_config_full,ChannelAuth.is_allowed, path/shell safety) into runtime paths with conformance tests. - Audit Chain Extraction: Instantiated the abstract
AuditChainfunctor with concrete SHA256/HMAC-SHA256, extracted and wiredverify_chain/make_entryintosrc/audit.ml. - Runtime Invariant Enforcement: Added runtime assertions for
RateLimiterbounded-refill,AgentLoophistory-length bounds, and conformance test suites exercising all spec-only modules. Discharged all admitted bitwise lemmas inLandlockPolicy.v.
Next Wave: Domain Expansion
Phases F13–F22 are now complete, covering tool invocation safety, config merge semantics, MCP protocol framing, sandbox policy, pair coding protocol, model format parsing, task tree status machine, cron scheduling, and Discord gateway protocol. The next targets are closing the spec-implementation gap: wiring extracted functions into runtime, adding conformance tests, and proving the remaining McpFraming Admitted theorem.
Extraction remains selective and deferred unless a pure kernel is stable and demonstrably worth replacing in shipped OCaml.
FAQ: Should Every Formalization Be Extracted?
No. We aim to extract the right pure kernels, not every formalization by default.
Extract when the Coq artifact is a small pure computation, matches shipped runtime behavior closely, and is actually worth making the runtime’s single source of truth.
Keep a theory spec-only when it mostly models trusted boundaries or effects — like crypto, databases, locking, OS policy, or higher-level control flow — or when the proof model still diverges from the OCaml implementation.
Today the best extraction candidates are things like Cli, Config, PathSafety, QuoteParsing, selective ShellSafety, and eventually some AuditChain or RateLimiter helpers. Modules like SecretStore, SessionIsolation, AgentLoop, and most of AuditRetention are higher-value as proofs first, extraction later.
Priority Matrix
| Phase | Module | Extract? | Security ROI | Difficulty | Order |
|---|---|---|---|---|---|
| F13 | ToolSafety.v | 24 theorems | Very High | Medium | Done |
| F14 | ConfigMerge.v | 12 theorems | High | Medium | Done |
| F15 | McpFraming.v | 30 theorems | Medium-High | Medium | Done |
| F16 | MemoryIsolation.v | Not started | Medium-High | Medium | Planned |
| F17 | SandboxPolicy.v | 16 theorems | Medium | Low | Done |
| F18 | PairCoding.v | 34 theorems | Low | Medium | Done |
| F19 | PmodelParsing.v | 13 theorems | Low | Low | Done |
| F20 | TaskTree.v | 20 theorems | Low | Low | Done |
| F21 | SchedulerCron.v | 20 theorems | Low | Low | Done |
| F22 | DiscordGateway.v | 20 theorems | Medium | Medium | Done |
Completed Phases (F13–F22)
F13: Tool Invocation Safety — 24 theorems proving authorization requirements by risk level, allowlist/authorization monotonicity, risk ordering (reflexive, transitive, antisymmetric, total), chain safety, and invocation safety composition.
F14: Config Merge Semantics — 12 theorems proving default preservation for missing JSON fields, security field floor enforcement, and single-application of resolve_secret per API key.
F15: MCP Protocol Framing — 30 theorems proving JSON-RPC request/response ID pairing, method dispatch correctness, Content-Length framing length, and response construction. 1 Admitted (parse round-trip needs concrete string_sub axioms).
F17: Sandbox Policy — 16 theorems proving backend detection fallback ordering (Firejail > Bubblewrap > NoSandbox), command wrapping requires both isolation mode and real backend, workspace exclusion from extra paths, and filtered path bounds.
F18: Pair Coding State Machine — 34 theorems proving phase level monotonicity, review round monotonicity, completion requires approval or max rounds, Done is terminal, abort/timeout always succeed, review entry clears approvals, and per-phase transition classification.
F19: Model Format Parsing — 13 theorems proving canonical format always converts, bare format requires default provider, deprecation warning iff non-canonical, and conversion preserves model field.
F20: Task Tree Status Machine — 20 theorems proving terminal states have no outgoing transitions, actionable states can be cancelled, Task_error is stuck, transitions advance status order, no self-transitions, and pending reaches terminal in at most 2 steps.
F21: Cron Field Matching — 20 theorems proving wildcard matches all values, non-wildcard field_matches iff membership, valid field values within range, all-wildcard cron matches any time, and field matching monotonicity.
F22: Discord Gateway Protocol — 20 theorems proving Hello starts heartbeat, READY establishes session, reconnect preserves session for resumption, invalid session (fresh) clears state, heartbeat timeout disconnects, and full connect/reconnect lifecycle correctness.
Remaining Planned
F16: Memory Store Isolation — Spec-only proof of cross-session non-interference for memory store/recall/forget operations. Prove key-scoped search returns only results from the querying session’s namespace.
Data sourced from docs/src/data/formal_verification.yml and generated into docs/src/data/fv-stats.json by make fv-all