Verification Status

A machine-checked account of Clawq’s proven properties.

Last verification refresh: 2026-03-17 · Coq 8.19 · 380 verified theorem/lemma statements across 21 phases

Status note. All 22 verification phases (F0–F22) are verified, with 374 theorem/lemma statements across 27 Coq files and only 1 Admitted proof remaining. Recent additions cover tool invocation safety (F13), config merge (F14), MCP framing (F15), sandbox policy (F17), pair coding (F18), model format parsing (F19), task tree status machine (F20), cron scheduling (F21), and Discord gateway protocol (F22). See Roadmap below.

FV

Coq-Verified Core

0
0
0

Verified Statements

0

Modules Extracted

0
0
0
0

Lines of Coq

380/ 380

Repository Theorem/Lemma Statements

358 fully verified0 in progress0 planned

Phase Status

F1Config✓ Verified
F2PathSafety✓ Verified
F3AuditChain✓ Verified
F4RateLimiter✓ Verified
F5ConfigExt✓ Verified
F6ShellSafety✓ Verified
F7SecretStore✓ Verified
F8ChannelAuth✓ Verified
F9AuditRetention✓ Verified
F10AgentLoop✓ Verified
F11Session✓ Verified
F12Landlock✓ Verified

Verification Ledger

F1ConfigProofs.v
Weight sum invariants, Port/temperature range validation, Secure-by-default security config
15 theorems✓ Verified
F2PathSafety.v
normalize_no_dotdot, Normalization idempotence, is_path_safe_segs_refl, Workspace containment (prefix safety)
19 theorems✓ Verifiedextracted
F3AuditChain.v
verify_chain_append, build_chain validity, Suffix monotonicity, last_sig_app
7 theorems✓ Verified
F4RateLimiter.v
Bounded refill, Monotone refill, Consume semantics (P1-P6), Refill idempotence at same timestamp, Consume tokens bounded by max, Refill chain monotonicity
10 theorems✓ Verified
F5Config.v + ConfigProofs.v
valid_port, valid_temperature, validate_config_full (P8-P15)
15 theorems✓ Verified
F6QuoteParsing.v + ShellSafety.v
Shell tokenizer correctness, Metachar blacklist completeness, Allowlist monotonicity
9 theorems✓ Verifiedextracted
F7SecretStore.v
Encrypt/decrypt identity, Nonce uniqueness bound, resolve_secret completeness
25 theorems✓ Verified
F8ChannelAuth.v
Shared allowlist semantics, Absolute-skew replay window, Telegram pairing validity
20 theorems✓ Verifiedextracted
F9AuditRetention.v
purge_by_count correctness, purge_by_age correctness, Purge preserves chain validity
12 theorems✓ Verified
F10AgentLoop.v
Termination in max_tool_iterations, Tool-call/tool-result history shape, Runtime-aligned tool-group sanitization, Compaction split repair prevents orphan tool-result prefixes, Compacted history remains replay-safe for OpenAI replay, Trim and emergency compaction ordering/idempotence
22 theorems✓ Verified
F11SessionIsolation.v
Cross-key non-interference, Same-key reset/create stability, get_or_create and store_message isolation, Compaction-persistence round-trip consistency, replace_session_messages cross-key isolation
18 theorems✓ Verified
F12LandlockPolicy.v
Least-privilege access flags, Path set closure, Policy monotonicity
12 theorems✓ Verified

Module Breakdown

F1: Configuration Validation15/15 verified

Source: coq/theories/Clawq/ConfigProofs.v

Theorem 1.1 (valid_port_bounds) verified

For all p, valid_port p = true implies 1 <= p <= 65535.

Proof: ConfigProofs.v

Theorem 1.2 (default_config_valid) verified

validate_config default_config = true. The built-in defaults always pass validation.

Proof: ConfigProofs.v

Theorem 1.3 (weight_sum_invariant) verified

Provider weights sum to a consistent total; no weight is negative.

Proof: ConfigProofs.v

Theorem 1.4 (valid_temperature_bounds) verified

For all t, valid_temperature t = true implies 0.0 <= t <= 2.0.

Proof: ConfigProofs.v

Theorem 1.5 (secure_by_default) verified

The default security configuration enables sandboxing, rate limiting, and audit logging.

Proof: ConfigProofs.v

F2: Directory Traversal Immunity19/19 verified

Source: coq/theories/Clawq/PathSafety.v — Extracted to OCaml and integrated into tools_builtin.ml

Theorem 2.1 (normalize_no_dotdot) verified

Path normalization eliminates all ’..’ segments. The output of normalize never contains a dotdot component.

Proof: PathSafety.v

Theorem 2.2 (normalize_idempotent) verified

Normalizing an already-normalized path yields the same path: normalize (normalize p) = normalize p.

Proof: PathSafety.v

Theorem 2.3 (is_path_safe_segs_refl) verified

A path whose segments are all safe is itself safe.

Proof: PathSafety.v

Theorem 2.4 (workspace_containment) verified

Any path that passes the safety check is a prefix of the allowed workspace root. No file access escapes the sandbox.

Proof: PathSafety.v

F3: HMAC Audit Chain Integrity7/7 verified

Source: coq/theories/Clawq/AuditChain.v

Theorem 3.1 (verify_chain_append) verified

Appending a correctly-linked entry to a valid chain preserves chain validity.

Proof: AuditChain.v

Theorem 3.2 (build_chain_valid) verified

A chain produced by build_chain always verifies. Construction implies validity.

Proof: AuditChain.v

Theorem 3.3 (suffix_monotonicity) verified

Any suffix of a valid chain is itself valid. Truncation preserves integrity.

Proof: AuditChain.v

Theorem 3.4 (last_sig_app) verified

The last signature of a concatenated chain equals the last signature of the appended segment.

Proof: AuditChain.v

F4: Token Bucket Rate Limiter10/10 verified

Source: coq/theories/Clawq/RateLimiter.v

Theorem 4.1 (bounded_refill) verified

After refill, tokens never exceed the bucket capacity. The bucket cannot overflow.

Proof: RateLimiter.v

Theorem 4.2 (monotone_refill) verified

Refill is monotonically non-decreasing: tokens after refill >= tokens before refill.

Proof: RateLimiter.v

Theorem 4.3 (consume_semantics) verified

Properties P1–P6: consume decrements by exactly the requested amount when sufficient tokens exist; returns the bucket unchanged when insufficient; the allowed/denied decision is consistent with the token count.

Proof: RateLimiter.v

F5: Extended Configuration Validation15/15 verified

Source: coq/theories/Clawq/Config.v + ConfigProofs.v

Theorem 5.1 (valid_port) verified

Port validation is both sound and complete with respect to the 1–65535 range.

Proof: ConfigProofs.v

Theorem 5.2 (valid_temperature) verified

Temperature validation enforces the 0.0–2.0 range with correct boundary behavior.

Proof: ConfigProofs.v

Theorem 5.3 (validate_config_full) verified

Full configuration validation (P8–P15) checks all fields: port, temperature, max_tokens, history depth, provider weights, security flags, and rate limits.

Proof: ConfigProofs.v

F6: Shell Safety9/9 verified

Source: coq/theories/Clawq/QuoteParsing.v + ShellSafety.v — Extracted to OCaml

Theorem 6.1 (tokenizer_correctness) verified

split_command_words produces a list of tokens with no embedded shell metacharacters when the input passes is_shell_safe.

Proof: QuoteParsing.v

Theorem 6.2 (blacklist_completeness) verified

The character blacklist (;&|${">"}{"<"}!) is complete for the threat model: no injection is possible via unlisted characters in a POSIX shell without expansion.

Proof: ShellSafety.v

Theorem 6.3 (allowlist_monotonicity) verified

is_allowed cmd allowlist = true implies is_allowed cmd (allowlist ++ extra) = true. Extending the allowlist never revokes existing permissions.

Proof: ShellSafety.v

F7: Encryption Correctness25/25 verified

Source: coq/theories/Clawq/SecretStore.v — Spec-only (crypto axiomatized)

Theorem 7.1 (decrypt_secret_plaintext_passthrough) verified

Plaintext values (no $ENC: prefix) pass through decrypt unchanged.

Proof: SecretStore.v

Theorem 7.2 (resolve_secret_completeness) verified

resolve_secret handles all three cases: encrypted ($ENC:), environment variable ($VAR), and plaintext — with correct behavior for missing master key and decrypt failure.

Proof: SecretStore.v

Theorem 7.3 (encrypt_provider_key_avoids_double_encryption) verified

Already-encrypted values ($ENC: prefix) are never re-encrypted. Environment variable references ($VAR) are preserved unchanged.

Proof: SecretStore.v

F8: Channel Authentication20/20 verified

Source: coq/theories/Clawq/ChannelAuth.v — Extracted to OCaml

Theorem 8.1 (is_allowed_correct) verified

Allowlist membership is bidirectional: is_allowed id list = true iff id is in the list or the list is the wildcard [”*”].

Proof: ChannelAuth.v

Theorem 8.2 (timestamp_ok_enforces_window) verified

The replay prevention window enforces a 300-second absolute skew bound. Requests outside this window are rejected.

Proof: ChannelAuth.v

Theorem 8.3 (pairing_active_iff) verified

Telegram pairing is active strictly before expiry and inactive at or after expiry. The biconditional fully characterizes the validity window.

Proof: ChannelAuth.v

F9: Audit Retention Safety12/12 verified

Source: coq/theories/Clawq/AuditRetention.v

Theorem 9.1 (purge_preserves_validity) verified

Combined count-and-age purge preserves chain validity when anchored with the correct inherited prev_sig from the dropped prefix.

Proof: AuditRetention.v

Theorem 9.2 (purge_by_count_suffix) verified

Count-based retention produces a suffix of the original chain. The result length is bounded by the retention count.

Proof: AuditRetention.v

F10: Agent Loop Termination22/22 verified

Source: coq/theories/Clawq/AgentLoop.v

Theorem 10.1 (loop_terminates) verified

The agent turn loop terminates within max_tool_iterations steps. Fuel decreases monotonically.

Proof: AgentLoop.v

Theorem 10.2 (trim_history_idempotent) verified

Trimming an already-trimmed history is a no-op: trim(trim(h)) = trim(h).

Proof: AgentLoop.v

Theorem 10.3 (trim_history_length) verified

After trim_history, the history length is bounded by effective_max_messages.

Proof: AgentLoop.v

Theorem 10.4 (ensure_tool_group_integrity_replay_safe) verified

The runtime-aligned sanitizer removes replay-unsafe tool/result pairs, so every retained tool result and assistant tool call still has a matching counterpart in the source history.

Proof: AgentLoop.v

Theorem 10.5 (adjust_split_for_tool_groups_no_tool_result_prefix) verified

Compaction split repair guarantees the kept suffix does not begin with an orphan tool result after boundary adjustment.

Proof: AgentLoop.v

Theorem 10.6 (compacted_history_replay_safe) verified

After split repair, sanitization, and summary insertion, the compacted history remains replay-safe for OpenAI/Codex tool-call replay.

Proof: AgentLoop.v

F11: Session Isolation18/18 verified

Source: coq/theories/Clawq/SessionIsolation.v

Theorem 11.1 (store_message_isolated) verified

Storing a message to session A does not affect session B’s history (for A != B).

Proof: SessionIsolation.v

Theorem 11.2 (get_or_create_preserves_other) verified

Creating or retrieving session A leaves all other sessions unchanged.

Proof: SessionIsolation.v

Theorem 11.3 (reset_session_idempotent) verified

Resetting a session twice is equivalent to resetting it once.

Proof: SessionIsolation.v

F12: Landlock Sandbox Policy12/12 verified

Source: coq/theories/Clawq/LandlockPolicy.v — all bitwise arithmetic lemmas are fully proved

Theorem 12.1 (least_privilege_invariant) verified

Adding a rule preserves the least-privilege property: no path class receives more access than its minimal permission.

Proof: LandlockPolicy.v

Theorem 12.2 (sandbox_always_returns) verified

Sandbox initialization always returns a result, whether Landlock is available or not. Graceful degradation is guaranteed.

Proof: LandlockPolicy.v

Trust Boundary

Machine-Checked (Coq)
Coq Core
380 theorem/lemma statements
—▸
Extraction
Coq 8.19
—▸
clawq_core.ml
Verified OCaml
Trust Boundary
Unverified Runtime
Runtime (OCaml)
I/O
Network
LLM Providers

Verification Roadmap

All 12 verification phases (F1–F12) are complete, with 380 theorem/lemma statements across 13 Coq files. Both P5 (Formal Verification Expansion) and P6 (Gap Closure) are complete. The next wave targets new verification domains: tool safety, config merge, MCP framing, and SecretStore axiom reduction. See the Formal Verification Roadmap for proof-expansion history and the Formal Verification Audit for the gap analysis.

Gap Closure (P6) — Complete

P6 closed the distance between proved specifications and the shipped runtime across three milestones (14 tasks):

  • Extraction Wiring: Connected extracted Coq functions (validate_config_full, ChannelAuth.is_allowed, path/shell safety) into runtime paths with conformance tests.
  • Audit Chain Extraction: Instantiated the abstract AuditChain functor with concrete SHA256/HMAC-SHA256, extracted and wired verify_chain/make_entry into src/audit.ml.
  • Runtime Invariant Enforcement: Added runtime assertions for RateLimiter bounded-refill, AgentLoop history-length bounds, and conformance test suites exercising all spec-only modules. Discharged all admitted bitwise lemmas in LandlockPolicy.v.

Next Wave: Domain Expansion

Phases F13–F22 are now complete, covering tool invocation safety, config merge semantics, MCP protocol framing, sandbox policy, pair coding protocol, model format parsing, task tree status machine, cron scheduling, and Discord gateway protocol. The next targets are closing the spec-implementation gap: wiring extracted functions into runtime, adding conformance tests, and proving the remaining McpFraming Admitted theorem.

Extraction remains selective and deferred unless a pure kernel is stable and demonstrably worth replacing in shipped OCaml.

FAQ: Should Every Formalization Be Extracted?

No. We aim to extract the right pure kernels, not every formalization by default.

Extract when the Coq artifact is a small pure computation, matches shipped runtime behavior closely, and is actually worth making the runtime’s single source of truth.

Keep a theory spec-only when it mostly models trusted boundaries or effects — like crypto, databases, locking, OS policy, or higher-level control flow — or when the proof model still diverges from the OCaml implementation.

Today the best extraction candidates are things like Cli, Config, PathSafety, QuoteParsing, selective ShellSafety, and eventually some AuditChain or RateLimiter helpers. Modules like SecretStore, SessionIsolation, AgentLoop, and most of AuditRetention are higher-value as proofs first, extraction later.

Priority Matrix

PhaseModuleExtract?Security ROIDifficultyOrder
F13ToolSafety.v24 theoremsVery HighMediumDone
F14ConfigMerge.v12 theoremsHighMediumDone
F15McpFraming.v30 theoremsMedium-HighMediumDone
F16MemoryIsolation.vNot startedMedium-HighMediumPlanned
F17SandboxPolicy.v16 theoremsMediumLowDone
F18PairCoding.v34 theoremsLowMediumDone
F19PmodelParsing.v13 theoremsLowLowDone
F20TaskTree.v20 theoremsLowLowDone
F21SchedulerCron.v20 theoremsLowLowDone
F22DiscordGateway.v20 theoremsMediumMediumDone

Completed Phases (F13–F22)

F13: Tool Invocation Safety — 24 theorems proving authorization requirements by risk level, allowlist/authorization monotonicity, risk ordering (reflexive, transitive, antisymmetric, total), chain safety, and invocation safety composition.

F14: Config Merge Semantics — 12 theorems proving default preservation for missing JSON fields, security field floor enforcement, and single-application of resolve_secret per API key.

F15: MCP Protocol Framing — 30 theorems proving JSON-RPC request/response ID pairing, method dispatch correctness, Content-Length framing length, and response construction. 1 Admitted (parse round-trip needs concrete string_sub axioms).

F17: Sandbox Policy — 16 theorems proving backend detection fallback ordering (Firejail > Bubblewrap > NoSandbox), command wrapping requires both isolation mode and real backend, workspace exclusion from extra paths, and filtered path bounds.

F18: Pair Coding State Machine — 34 theorems proving phase level monotonicity, review round monotonicity, completion requires approval or max rounds, Done is terminal, abort/timeout always succeed, review entry clears approvals, and per-phase transition classification.

F19: Model Format Parsing — 13 theorems proving canonical format always converts, bare format requires default provider, deprecation warning iff non-canonical, and conversion preserves model field.

F20: Task Tree Status Machine — 20 theorems proving terminal states have no outgoing transitions, actionable states can be cancelled, Task_error is stuck, transitions advance status order, no self-transitions, and pending reaches terminal in at most 2 steps.

F21: Cron Field Matching — 20 theorems proving wildcard matches all values, non-wildcard field_matches iff membership, valid field values within range, all-wildcard cron matches any time, and field matching monotonicity.

F22: Discord Gateway Protocol — 20 theorems proving Hello starts heartbeat, READY establishes session, reconnect preserves session for resumption, invalid session (fresh) clears state, heartbeat timeout disconnects, and full connect/reconnect lifecycle correctness.

Remaining Planned

F16: Memory Store Isolation — Spec-only proof of cross-session non-interference for memory store/recall/forget operations. Prove key-scoped search returns only results from the querying session’s namespace.


Data sourced from docs/src/data/formal_verification.yml and generated into docs/src/data/fv-stats.json by make fv-all