II. Formal Properties

Verification Status

A machine-checked account of Clawq’s proven properties.

Last verification refresh: 2026-03-17 · Coq 8.19 · 380 verified theorem/lemma statements across 21 phases

Status note. All 22 verification phases (F0–F22) are verified, with 374 theorem/lemma statements across 27 Coq files and only 1 Admitted proof remaining. Recent additions cover tool invocation safety (F13), config merge (F14), MCP framing (F15), sandbox policy (F17), pair coding (F18), model format parsing (F19), task tree status machine (F20), cron scheduling (F21), and Discord gateway protocol (F22). See Roadmap below.

Coq-Verified Core

Verified Statements

Modules Extracted

Lines of Coq

380/ 380

Repository Theorem/Lemma Statements

358 fully verified0 in progress0 planned

Phase Status

F1Config✓ Verified

F2PathSafety✓ Verified

F3AuditChain✓ Verified

F4RateLimiter✓ Verified

F5ConfigExt✓ Verified

F6ShellSafety✓ Verified

F7SecretStore✓ Verified

F8ChannelAuth✓ Verified

F9AuditRetention✓ Verified

F10AgentLoop✓ Verified

F11Session✓ Verified

F12Landlock✓ Verified

Verification Ledger

F1ConfigProofs.v

Weight sum invariants, Port/temperature range validation, Secure-by-default security config

15 theorems✓ Verified

F2PathSafety.v

normalize_no_dotdot, Normalization idempotence, is_path_safe_segs_refl, Workspace containment (prefix safety)

19 theorems✓ Verifiedextracted

F3AuditChain.v

verify_chain_append, build_chain validity, Suffix monotonicity, last_sig_app

7 theorems✓ Verified

F4RateLimiter.v

Bounded refill, Monotone refill, Consume semantics (P1-P6), Refill idempotence at same timestamp, Consume tokens bounded by max, Refill chain monotonicity

10 theorems✓ Verified

F5Config.v + ConfigProofs.v

valid_port, valid_temperature, validate_config_full (P8-P15)

15 theorems✓ Verified

F6QuoteParsing.v + ShellSafety.v

Shell tokenizer correctness, Metachar blacklist completeness, Allowlist monotonicity

9 theorems✓ Verifiedextracted

F7SecretStore.v

Encrypt/decrypt identity, Nonce uniqueness bound, resolve_secret completeness

25 theorems✓ Verified

F8ChannelAuth.v

Shared allowlist semantics, Absolute-skew replay window, Telegram pairing validity

20 theorems✓ Verifiedextracted

F9AuditRetention.v

purge_by_count correctness, purge_by_age correctness, Purge preserves chain validity

12 theorems✓ Verified

F10AgentLoop.v

Termination in max_tool_iterations, Tool-call/tool-result history shape, Runtime-aligned tool-group sanitization, Compaction split repair prevents orphan tool-result prefixes, Compacted history remains replay-safe for OpenAI replay, Trim and emergency compaction ordering/idempotence

22 theorems✓ Verified

F11SessionIsolation.v

Cross-key non-interference, Same-key reset/create stability, get_or_create and store_message isolation, Compaction-persistence round-trip consistency, replace_session_messages cross-key isolation

18 theorems✓ Verified

F12LandlockPolicy.v

Least-privilege access flags, Path set closure, Policy monotonicity

12 theorems✓ Verified

Module Breakdown

F1: Configuration Validation15/15 verified

Source: coq/theories/Clawq/ConfigProofs.v

Theorem 1.1 (valid_port_bounds) verified

For all p, valid_port p = true implies 1 <= p <= 65535.

Proof: ConfigProofs.v

Theorem 1.2 (default_config_valid) verified

validate_config default_config = true. The built-in defaults always pass validation.

Proof: ConfigProofs.v

Theorem 1.3 (weight_sum_invariant) verified

Provider weights sum to a consistent total; no weight is negative.

Proof: ConfigProofs.v

Theorem 1.4 (valid_temperature_bounds) verified

For all t, valid_temperature t = true implies 0.0 <= t <= 2.0.

Proof: ConfigProofs.v

Theorem 1.5 (secure_by_default) verified

The default security configuration enables sandboxing, rate limiting, and audit logging.

Proof: ConfigProofs.v

F2: Directory Traversal Immunity19/19 verified

Source: coq/theories/Clawq/PathSafety.v — Extracted to OCaml and integrated into tools_builtin.ml

Theorem 2.1 (normalize_no_dotdot) verified

Path normalization eliminates all ’..’ segments. The output of normalize never contains a dotdot component.

Proof: PathSafety.v

Theorem 2.2 (normalize_idempotent) verified

Normalizing an already-normalized path yields the same path: normalize (normalize p) = normalize p.

Proof: PathSafety.v

Theorem 2.3 (is_path_safe_segs_refl) verified

A path whose segments are all safe is itself safe.

Proof: PathSafety.v

Theorem 2.4 (workspace_containment) verified

Any path that passes the safety check is a prefix of the allowed workspace root. No file access escapes the sandbox.

Proof: PathSafety.v

F3: HMAC Audit Chain Integrity7/7 verified

Source: coq/theories/Clawq/AuditChain.v

Theorem 3.1 (verify_chain_append) verified

Appending a correctly-linked entry to a valid chain preserves chain validity.

Proof: AuditChain.v

Theorem 3.2 (build_chain_valid) verified

A chain produced by build_chain always verifies. Construction implies validity.

Proof: AuditChain.v

Theorem 3.3 (suffix_monotonicity) verified

Any suffix of a valid chain is itself valid. Truncation preserves integrity.

Proof: AuditChain.v

Theorem 3.4 (last_sig_app) verified

The last signature of a concatenated chain equals the last signature of the appended segment.

Proof: AuditChain.v

F4: Token Bucket Rate Limiter10/10 verified

Source: coq/theories/Clawq/RateLimiter.v

Theorem 4.1 (bounded_refill) verified

After refill, tokens never exceed the bucket capacity. The bucket cannot overflow.

Proof: RateLimiter.v

Theorem 4.2 (monotone_refill) verified

Refill is monotonically non-decreasing: tokens after refill >= tokens before refill.

Proof: RateLimiter.v

Theorem 4.3 (consume_semantics) verified

Properties P1–P6: consume decrements by exactly the requested amount when sufficient tokens exist; returns the bucket unchanged when insufficient; the allowed/denied decision is consistent with the token count.

Proof: RateLimiter.v

F5: Extended Configuration Validation15/15 verified

Source: coq/theories/Clawq/Config.v + ConfigProofs.v

Theorem 5.1 (valid_port) verified

Port validation is both sound and complete with respect to the 1–65535 range.

Proof: ConfigProofs.v

Theorem 5.2 (valid_temperature) verified

Temperature validation enforces the 0.0–2.0 range with correct boundary behavior.

Proof: ConfigProofs.v

Theorem 5.3 (validate_config_full) verified

Full configuration validation (P8–P15) checks all fields: port, temperature, max_tokens, history depth, provider weights, security flags, and rate limits.

Proof: ConfigProofs.v

F6: Shell Safety9/9 verified

Source: coq/theories/Clawq/QuoteParsing.v + ShellSafety.v — Extracted to OCaml

Theorem 6.1 (tokenizer_correctness) verified

split_command_words produces a list of tokens with no embedded shell metacharacters when the input passes is_shell_safe.

Proof: QuoteParsing.v

Theorem 6.2 (blacklist_completeness) verified

The character blacklist (;&|${">"}{"<"}!) is complete for the threat model: no injection is possible via unlisted characters in a POSIX shell without expansion.

Proof: ShellSafety.v

Theorem 6.3 (allowlist_monotonicity) verified

is_allowed cmd allowlist = true implies is_allowed cmd (allowlist ++ extra) = true. Extending the allowlist never revokes existing permissions.

Proof: ShellSafety.v

F7: Encryption Correctness25/25 verified

Source: coq/theories/Clawq/SecretStore.v — Spec-only (crypto axiomatized)

Theorem 7.1 (decrypt_secret_plaintext_passthrough) verified

Plaintext values (no $ENC: prefix) pass through decrypt unchanged.

Proof: SecretStore.v

Theorem 7.2 (resolve_secret_completeness) verified

resolve_secret handles all three cases: encrypted ($ENC:), environment variable ($VAR), and plaintext — with correct behavior for missing master key and decrypt failure.

Proof: SecretStore.v

Theorem 7.3 (encrypt_provider_key_avoids_double_encryption) verified

Already-encrypted values ($ENC: prefix) are never re-encrypted. Environment variable references ($VAR) are preserved unchanged.

Proof: SecretStore.v

F8: Channel Authentication20/20 verified

Source: coq/theories/Clawq/ChannelAuth.v — Extracted to OCaml

Theorem 8.1 (is_allowed_correct) verified

Allowlist membership is bidirectional: is_allowed id list = true iff id is in the list or the list is the wildcard [”*”].

Proof: ChannelAuth.v

Theorem 8.2 (timestamp_ok_enforces_window) verified

The replay prevention window enforces a 300-second absolute skew bound. Requests outside this window are rejected.

Proof: ChannelAuth.v

Theorem 8.3 (pairing_active_iff) verified

Telegram pairing is active strictly before expiry and inactive at or after expiry. The biconditional fully characterizes the validity window.

Proof: ChannelAuth.v

F9: Audit Retention Safety12/12 verified

Source: coq/theories/Clawq/AuditRetention.v

Theorem 9.1 (purge_preserves_validity) verified

Combined count-and-age purge preserves chain validity when anchored with the correct inherited prev_sig from the dropped prefix.

Proof: AuditRetention.v

Theorem 9.2 (purge_by_count_suffix) verified

Count-based retention produces a suffix of the original chain. The result length is bounded by the retention count.

Proof: AuditRetention.v

F10: Agent Loop Termination22/22 verified

Source: coq/theories/Clawq/AgentLoop.v

Theorem 10.1 (loop_terminates) verified

The agent turn loop terminates within max_tool_iterations steps. Fuel decreases monotonically.

Proof: AgentLoop.v

Theorem 10.2 (trim_history_idempotent) verified

Trimming an already-trimmed history is a no-op: trim(trim(h)) = trim(h).

Proof: AgentLoop.v

Theorem 10.3 (trim_history_length) verified

After trim_history, the history length is bounded by effective_max_messages.

Proof: AgentLoop.v

Theorem 10.4 (ensure_tool_group_integrity_replay_safe) verified

The runtime-aligned sanitizer removes replay-unsafe tool/result pairs, so every retained tool result and assistant tool call still has a matching counterpart in the source history.

Proof: AgentLoop.v

Theorem 10.5 (adjust_split_for_tool_groups_no_tool_result_prefix) verified

Compaction split repair guarantees the kept suffix does not begin with an orphan tool result after boundary adjustment.

Proof: AgentLoop.v

Theorem 10.6 (compacted_history_replay_safe) verified

After split repair, sanitization, and summary insertion, the compacted history remains replay-safe for OpenAI/Codex tool-call replay.

Proof: AgentLoop.v

F11: Session Isolation18/18 verified

Source: coq/theories/Clawq/SessionIsolation.v

Theorem 11.1 (store_message_isolated) verified

Storing a message to session A does not affect session B’s history (for A != B).

Proof: SessionIsolation.v

Theorem 11.2 (get_or_create_preserves_other) verified

Creating or retrieving session A leaves all other sessions unchanged.

Proof: SessionIsolation.v

Theorem 11.3 (reset_session_idempotent) verified

Resetting a session twice is equivalent to resetting it once.

Proof: SessionIsolation.v

F12: Landlock Sandbox Policy12/12 verified

Source: coq/theories/Clawq/LandlockPolicy.v — all bitwise arithmetic lemmas are fully proved

Theorem 12.1 (least_privilege_invariant) verified

Adding a rule preserves the least-privilege property: no path class receives more access than its minimal permission.

Proof: LandlockPolicy.v

Theorem 12.2 (sandbox_always_returns) verified

Sandbox initialization always returns a result, whether Landlock is available or not. Graceful degradation is guaranteed.

Proof: LandlockPolicy.v

Trust Boundary

Machine-Checked (Coq)

Coq Core

380 theorem/lemma statements

—▸

Extraction

Coq 8.19

—▸

clawq_core.ml

Verified OCaml

Trust Boundary

Unverified Runtime

Runtime (OCaml)

I/O

Network

LLM Providers

Verification Roadmap

All 12 verification phases (F1–F12) are complete, with 380 theorem/lemma statements across 13 Coq files. Both P5 (Formal Verification Expansion) and P6 (Gap Closure) are complete. The next wave targets new verification domains: tool safety, config merge, MCP framing, and SecretStore axiom reduction. See the Formal Verification Roadmap for proof-expansion history and the Formal Verification Audit for the gap analysis.

Gap Closure (P6) — Complete

P6 closed the distance between proved specifications and the shipped runtime across three milestones (14 tasks):

Extraction Wiring: Connected extracted Coq functions (validate_config_full, ChannelAuth.is_allowed, path/shell safety) into runtime paths with conformance tests.
Audit Chain Extraction: Instantiated the abstract AuditChain functor with concrete SHA256/HMAC-SHA256, extracted and wired verify_chain/make_entry into src/audit.ml.
Runtime Invariant Enforcement: Added runtime assertions for RateLimiter bounded-refill, AgentLoop history-length bounds, and conformance test suites exercising all spec-only modules. Discharged all admitted bitwise lemmas in LandlockPolicy.v.

Next Wave: Domain Expansion

Phases F13–F22 are now complete, covering tool invocation safety, config merge semantics, MCP protocol framing, sandbox policy, pair coding protocol, model format parsing, task tree status machine, cron scheduling, and Discord gateway protocol. The next targets are closing the spec-implementation gap: wiring extracted functions into runtime, adding conformance tests, and proving the remaining McpFraming Admitted theorem.

Extraction remains selective and deferred unless a pure kernel is stable and demonstrably worth replacing in shipped OCaml.

FAQ: Should Every Formalization Be Extracted?

No. We aim to extract the right pure kernels, not every formalization by default.

Extract when the Coq artifact is a small pure computation, matches shipped runtime behavior closely, and is actually worth making the runtime’s single source of truth.

Keep a theory spec-only when it mostly models trusted boundaries or effects — like crypto, databases, locking, OS policy, or higher-level control flow — or when the proof model still diverges from the OCaml implementation.

Today the best extraction candidates are things like Cli, Config, PathSafety, QuoteParsing, selective ShellSafety, and eventually some AuditChain or RateLimiter helpers. Modules like SecretStore, SessionIsolation, AgentLoop, and most of AuditRetention are higher-value as proofs first, extraction later.

Priority Matrix

Phase	Module	Extract?	Security ROI	Difficulty	Order
F13	`ToolSafety.v`	24 theorems	Very High	Medium	Done
F14	`ConfigMerge.v`	12 theorems	High	Medium	Done
F15	`McpFraming.v`	30 theorems	Medium-High	Medium	Done
F16	`MemoryIsolation.v`	Not started	Medium-High	Medium	Planned
F17	`SandboxPolicy.v`	16 theorems	Medium	Low	Done
F18	`PairCoding.v`	34 theorems	Low	Medium	Done
F19	`PmodelParsing.v`	13 theorems	Low	Low	Done
F20	`TaskTree.v`	20 theorems	Low	Low	Done
F21	`SchedulerCron.v`	20 theorems	Low	Low	Done
F22	`DiscordGateway.v`	20 theorems	Medium	Medium	Done

Completed Phases (F13–F22)

F13: Tool Invocation Safety — 24 theorems proving authorization requirements by risk level, allowlist/authorization monotonicity, risk ordering (reflexive, transitive, antisymmetric, total), chain safety, and invocation safety composition.

F14: Config Merge Semantics — 12 theorems proving default preservation for missing JSON fields, security field floor enforcement, and single-application of resolve_secret per API key.

F15: MCP Protocol Framing — 30 theorems proving JSON-RPC request/response ID pairing, method dispatch correctness, Content-Length framing length, and response construction. 1 Admitted (parse round-trip needs concrete string_sub axioms).

F17: Sandbox Policy — 16 theorems proving backend detection fallback ordering (Firejail > Bubblewrap > NoSandbox), command wrapping requires both isolation mode and real backend, workspace exclusion from extra paths, and filtered path bounds.

F18: Pair Coding State Machine — 34 theorems proving phase level monotonicity, review round monotonicity, completion requires approval or max rounds, Done is terminal, abort/timeout always succeed, review entry clears approvals, and per-phase transition classification.

F19: Model Format Parsing — 13 theorems proving canonical format always converts, bare format requires default provider, deprecation warning iff non-canonical, and conversion preserves model field.

F20: Task Tree Status Machine — 20 theorems proving terminal states have no outgoing transitions, actionable states can be cancelled, Task_error is stuck, transitions advance status order, no self-transitions, and pending reaches terminal in at most 2 steps.

F21: Cron Field Matching — 20 theorems proving wildcard matches all values, non-wildcard field_matches iff membership, valid field values within range, all-wildcard cron matches any time, and field matching monotonicity.

F22: Discord Gateway Protocol — 20 theorems proving Hello starts heartbeat, READY establishes session, reconnect preserves session for resumption, invalid session (fresh) clears state, heartbeat timeout disconnects, and full connect/reconnect lifecycle correctness.

Remaining Planned

F16: Memory Store Isolation — Spec-only proof of cross-session non-interference for memory store/recall/forget operations. Prove key-scoped search returns only results from the querying session’s namespace.

Data sourced from docs/src/data/formal_verification.yml and generated into docs/src/data/fv-stats.json by make fv-all