Verification Status

A machine-checked account of Clawq’s proven properties.

Last verification refresh: 2026-03-07 · Coq 8.19 · 153 theorem/lemma statements tracked across public phases F1–F12

Planning note. The current proof-expansion plan lives in Formal Verification Roadmap - 2026-03-07.

FV

Coq-Verified Core

0
0
0

Statements Tracked

0

Modules Extracted

0
0
0
0

Lines of Coq

78/ 163

Tracked Theorem/Lemma Statements

71 fully verified82 in progress85 planned

Phase Status

F1Config✓ Verified
F2PathSafety✓ Verified
F3AuditChain✓ Verified
F4RateLimiter✓ Verified
F5ConfigExt✓ Verified
F6ShellSafety✓ Verified
F7SecretStore⚡ In Progress
F8ChannelAuth⚡ In Progress
F9AuditRetention⚡ In Progress
F10AgentLoop⚡ In Progress
F11Session⚡ In Progress
F12Landlock⚡ In Progress

Verification Ledger

F1ConfigProofs.v
Weight sum invariants, port/temperature validation, secure-by-default
15 theorems✓ Verified
F2PathSafety.v
normalize_no_dotdot, idempotence, workspace containment
19 theorems✓ Verifiedextracted
F3AuditChain.v
verify_chain_append, build_chain validity, suffix monotonicity
7 theorems✓ Verified
F4RateLimiter.v
Bounded refill, monotone refill, consume semantics (P1–P6)
6 theorems✓ Verified
F5Config.v + ConfigProofs.v
valid_port, valid_temperature, validate_config_full (P8–P15)
15 theorems✓ Verified
F6QuoteParsing.v + ShellSafety.v
Tokenizer correctness, metachar blacklist completeness, allowlist monotonicity
9 theorems✓ Verifiedextracted
F7SecretStore.v
Encrypted/plain/env resolution, missing-key/decrypt-failure modeling, config rewrite preservation
12 statements⚡ In Progress
F8ChannelAuth.v
Shared allowlist semantics, absolute-skew freshness, Telegram pairing validity window
18 statements⚡ In Progress
F9AuditRetention.v
Anchored retention, suffix-preserving age purge assumptions, combined purge validity
12 statements⚡ In Progress
F10AgentLoop.v
Fuel-bounded tool-iteration model, tool-call/tool-result cycle shape, trim ordering/idempotence
10 statements⚡ In Progress
F11SessionIsolation.v
Cross-key isolation plus same-key reset/create idempotence and stability
18 statements⚡ In Progress
F12LandlockPolicy.v
Least-privilege access flags, path-set closure, policy monotonicity (admits remain)
12 statements⚡ In Progress

Module Breakdown

F1: Configuration Validation15/15 verified

Source: coq/theories/Clawq/ConfigProofs.v

Theorem 1.1 (valid_port_bounds) verified

For all p, valid_port p = true implies 1 <= p <= 65535.

Proof: ConfigProofs.v

Theorem 1.2 (default_config_valid) verified

validate_config default_config = true. The built-in defaults always pass validation.

Proof: ConfigProofs.v

Theorem 1.3 (weight_sum_invariant) verified

Provider weights sum to a consistent total; no weight is negative.

Proof: ConfigProofs.v

Theorem 1.4 (valid_temperature_bounds) verified

For all t, valid_temperature t = true implies 0.0 <= t <= 2.0.

Proof: ConfigProofs.v

Theorem 1.5 (secure_by_default) verified

The default security configuration enables sandboxing, rate limiting, and audit logging.

Proof: ConfigProofs.v

F2: Directory Traversal Immunity19/19 verifiedextracted

Source: coq/theories/Clawq/PathSafety.v — Extracted to OCaml and integrated into tools_builtin.ml

Theorem 2.1 (normalize_no_dotdot) verified

Path normalization eliminates all ’..’ segments. The output of normalize never contains a dotdot component.

Proof: PathSafety.v

Theorem 2.2 (normalize_idempotent) verified

Normalizing an already-normalized path yields the same path: normalize (normalize p) = normalize p.

Proof: PathSafety.v

Theorem 2.3 (is_path_safe_segs_refl) verified

A path whose segments are all safe is itself safe.

Proof: PathSafety.v

Theorem 2.4 (workspace_containment) verified

Any path that passes the safety check is a prefix of the allowed workspace root. No file access escapes the sandbox.

Proof: PathSafety.v

F3: HMAC Audit Chain Integrity7/7 verified

Source: coq/theories/Clawq/AuditChain.v

Theorem 3.1 (verify_chain_append) verified

Appending a correctly-linked entry to a valid chain preserves chain validity.

Proof: AuditChain.v

Theorem 3.2 (build_chain_valid) verified

A chain produced by build_chain always verifies. Construction implies validity.

Proof: AuditChain.v

Theorem 3.3 (suffix_monotonicity) verified

Any suffix of a valid chain is itself valid. Truncation preserves integrity.

Proof: AuditChain.v

Theorem 3.4 (last_sig_app) verified

The last signature of a concatenated chain equals the last signature of the appended segment.

Proof: AuditChain.v

F4: Token Bucket Rate Limiter6/6 verified

Source: coq/theories/Clawq/RateLimiter.v

Theorem 4.1 (bounded_refill) verified

After refill, tokens never exceed the bucket capacity. The bucket cannot overflow.

Proof: RateLimiter.v

Theorem 4.2 (monotone_refill) verified

Refill is monotonically non-decreasing: tokens after refill >= tokens before refill.

Proof: RateLimiter.v

Theorem 4.3 (consume_semantics) verified

Properties P1–P6: consume decrements by exactly the requested amount when sufficient tokens exist; returns the bucket unchanged when insufficient; the allowed/denied decision is consistent with the token count.

Proof: RateLimiter.v

F5: Extended Configuration Validation15/15 verified

Source: coq/theories/Clawq/Config.v + ConfigProofs.v

Theorem 5.1 (valid_port) verified

Port validation is both sound and complete with respect to the 1–65535 range.

Proof: ConfigProofs.v

Theorem 5.2 (valid_temperature) verified

Temperature validation enforces the 0.0–2.0 range with correct boundary behavior.

Proof: ConfigProofs.v

Theorem 5.3 (validate_config_full) verified

Full configuration validation (P8–P15) checks all fields: port, temperature, max_tokens, history depth, provider weights, security flags, and rate limits.

Proof: ConfigProofs.v

F6: Shell Safety9/9 verifiedextracted

Source: coq/theories/Clawq/QuoteParsing.v + ShellSafety.v — Extracted to OCaml

Theorem 6.1 (tokenizer_correctness) verified

split_command_words produces a list of tokens with no embedded shell metacharacters when the input passes is_shell_safe.

Proof: QuoteParsing.v

Theorem 6.2 (blacklist_completeness) verified

The character blacklist (;&|${">"}{"<"}!) is complete for the threat model: no injection is possible via unlisted characters in a POSIX shell without expansion.

Proof: ShellSafety.v

Theorem 6.3 (allowlist_monotonicity) verified

is_allowed cmd allowlist = true implies is_allowed cmd (allowlist ++ extra) = true. Extending the allowlist never revokes existing permissions.

Proof: ShellSafety.v

Trust Boundary

Machine-Checked (Coq)
Coq Core
163 theorems
—▸
Extraction
Coq 8.19
—▸
clawq_core.ml
Verified OCaml
Trust Boundary
Unverified Runtime
Runtime (OCaml)
I/O
Network
LLM Providers

Verification Roadmap

The remaining phase work (F7—F12) is prioritized by runtime-risk reduction and model/runtime alignment. Extraction is selective and deferred unless a pure kernel is stable and demonstrably worth replacing in shipped OCaml.

FAQ: Should Every Formalization Be Extracted?

No. We aim to extract the right pure kernels, not every formalization by default.

Extract when the Coq artifact is a small pure computation, matches shipped runtime behavior closely, and is actually worth making the runtime’s single source of truth.

Keep a theory spec-only when it mostly models trusted boundaries or effects — like crypto, databases, locking, OS policy, or higher-level control flow — or when the proof model still diverges from the OCaml implementation.

Today the best extraction candidates are things like Cli, Config, PathSafety, QuoteParsing, selective ShellSafety, and eventually some AuditChain or RateLimiter helpers. Modules like SecretStore, SessionIsolation, AgentLoop, and most of AuditRetention are higher-value as proofs first, extraction later.

Priority Matrix

PhaseModuleExtract?Security ROIDifficultyOrder
F7SecretStore.vNo (spec)HighMedium1st
F8ChannelAuth.vCandidate later (allowlist kernels only)HighLow2nd
F9AuditRetention.vCandidate later (pure purge helpers only)Medium-HighLow3rd
F10AgentLoop.vNo (spec)MediumLow4th
F11SessionIsolation.vNo (spec)Medium-HighMedium5th
F12LandlockPolicy.vNo (spec)MediumMedium-High6th

Planned Phases

F7: Encryption Correctness — Spec-only proofs for secret_store.ml. Prove encrypt/decrypt is identity (modelling AES-GCM as abstract parameters), nonce uniqueness birthday-bound, and resolve_secret completeness across encrypted, env-var, and plaintext cases.

F8: Channel Authentication — Keep formal/runtime alignment current for shared allowlist, replay/freshness semantics, and Telegram pairing validity. Treat extraction as optional follow-through once kernels are stable.

F9: Audit Retention Safety — Maintain anchored-retention correctness under runtime assumptions (suffix-preserving age purge + count purge). Extraction remains optional and limited to pure helper kernels.

F10: Agent Loop Termination — Spec-only proof that the agent turn loop stays within max_tool_iterations, preserves the tool-call/tool-result history shape, and keeps newest-first ordering under trim_history and emergency compaction. Models the loop as a Coq Fixpoint with fuel.

F11: Session Isolation — Spec-only proof using Coq FMap for cross-key non-interference plus same-key reset/create stability and idempotence; runtime lock behavior remains modeled with explicit assumptions and review notes.

F12: Landlock Sandbox Policy — Spec-only proof of least-privilege access flags, path set closure under extra_paths extension, and policy monotonicity. Assumes C FFI stubs implement Landlock semantics correctly.


Data sourced from the F1–F12 slice of docs/src/data/formal_verification.yml (with F0 tracked separately as internal CLI work) · Theorem counts checked by make update-fv