Architecture

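Clawq follows a Coq-first architecture: core logic such as command parsing, dispatch, and configuration validation is written in Coq, where its properties are machine-checked; it is then extracted to OCaml and wrapped by a runtime that provides I/O, networking, and channel integrations.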

Build Pipeline

flowchart TD
    COQ["coq/theories/Clawq/*.v Coq source theories"]
    EXTRACT["coqc + Extraction scripts/extract.sh"]
    EXTRACTED["src/extracted/clawq_core.ml · .mli Generated OCaml · tracked in git"]
    LIB_EXT["clawq_extracted src/extracted/dune"]
    LIB_CORE["clawq_runtime_core config · agent · session · provider memory · tools · audit · rate-limiter"]
    LIB_INT["clawq_runtime_integrations channels · daemon · gateway · MCP WS · tunnels · HTTP server"]
    EXE["clawq executable main.ml + cmdliner"]
    COQ --> EXTRACT
    EXTRACT --> EXTRACTED
    EXTRACTED --> LIB_EXT
    LIB_EXT --> LIB_CORE
    LIB_CORE --> LIB_INT
    LIB_CORE --> EXE
    LIB_INT --> EXE

Coq theories → extracted OCaml → runtime libraries → clawq executable

The extracted OCaml is tracked in git, so the project builds without Coq installed. Run make extract to regenerate it from the Coq sources, and make extract-check to verify that the tracked files still match what extraction produces.

Coq Theories

File	Role
`Interfaces.v`	7 record-based interface definitions: Provider, Channel, Tool, Memory, RuntimeAdapter, Tunnel, Security
`Config.v`	Configuration records (GatewayConfig, MemoryConfig, SecurityConfig, ClawqConfig) with defaults and validation
`Cli.v`	Command ADT, `parse_command`, `dispatch`, and `usage` string for 18 commands
`Extract.v`	Extraction directives: type mappings and function list
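As a rough illustration of what the Cli.v row describes, the OCaml below sketches a command ADT with a total `parse_command` and a pure `dispatch`, in the general shape that extraction produces. The constructors (`Help`, `Version`, `Agent`, `Unknown`), the usage text, and the `(exit code, output)` return type are hypothetical; the real Cli.v defines 18 commands, and whether Coq strings appear as native OCaml strings depends on the type mappings in Extract.v (this sketch assumes they do).

```ocaml
(* Hypothetical subset of an extracted command ADT; the real Cli.v
   defines 18 commands and these constructor names are illustrative. *)
type command =
  | Help
  | Version
  | Agent of string  (* hypothetical: free-form prompt text *)
  | Unknown of string

(* Total parser: every argument list maps to some command, mirroring
   the requirement that Coq functions be total. *)
let parse_command (args : string list) : command =
  match args with
  | [] | "help" :: _ -> Help
  | [ "version" ] -> Version
  | "agent" :: rest -> Agent (String.concat " " rest)
  | other :: _ -> Unknown other

let usage = "usage: clawq <help | version | agent PROMPT...>"

(* Gallina code has no side effects, so this dispatch returns an exit
   code and the text to print; the OCaml runtime does the printing. *)
let dispatch (c : command) : int * string =
  match c with
  | Help -> (0, usage)
  | Version -> (0, "clawq (sketch)")
  | Agent prompt -> (0, "agent: " ^ prompt)
  | Unknown name -> (1, "unknown command: " ^ name ^ "\n" ^ usage)
```

Keeping dispatch pure is what lets a bridge module (here, command_bridge.ml) decide how to perform I/O and handle runtime-only commands that the Coq side does not model.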

Extracted OCaml

File	Role
`clawq_core.ml`	Auto-generated OCaml from Coq. Contains command parsing, dispatch, and config validation.
`clawq_core.mli`	Auto-generated interface. Both files are tracked in git.

Runtime Module Map

Module	Role
`main.ml`	Entry point; Cmdliner CLI argument parsing
`command_bridge.ml`	Bridges CLI to extracted Coq dispatch; handles runtime-only commands
`runtime_config.ml`	Configuration loading, validation, and defaults for all subsystems
`config_loader.ml`	File-based config loading: reads JSON, merges with env vars
`agent.ml`	Agent loop: prompt assembly, provider calls, tool dispatch, conversation management
`session.ml`	Session lifecycle: create, resume, persist conversation state
`provider.ml`	LLM provider abstraction: streaming text/thinking/tool events, model selection
`memory.ml`	Key-value memory backend: SQLite-backed store/recall/forget with namespace support
`vector.ml`	Local vector index: embedding storage, cosine similarity, hybrid FTS+vector merge
`tool.ml`	Tool type definitions and invocation framework with risk-level enforcement
`tool_registry.ml`	Dynamic tool registration: register, lookup, list by name or category
`tools_builtin.ml`	Built-in tool implementations: file I/O, shell exec, web fetch, search
`mcp_server.ml`	MCP server: exposes tools over JSON-RPC with configurable filtering
`skills.ml`	Skill loader: discovers and loads skill definitions from the filesystem
`http_server.ml`	HTTP server: gateway, SSE chat streaming, pairing, slash-command metadata, and web UI endpoints
`ui_server.ml`	Web UI asset manager: embedded bundle extraction, dev-mode disk serving, and version hashing
`http_client.ml`	HTTP client: shared Cohttp-lwt client for provider and API calls
`telegram.ml`	Telegram channel: bot API integration via long polling
`discord.ml`	Discord channel: WS gateway mode, REST rate limit buckets, reconnect loop
`discord_gateway.ml`	Discord gateway protocol state machine (Hello, Identify, Resume, Heartbeat, Dispatch)
`slack.ml`	Slack Events API channel with HMAC verification
`slack_socket.ml`	Slack Socket Mode: WSS-based event receiving via app_token
`ws_client.ml`	Shared TLS WebSocket client (httpun-ws, gluten, tls-lwt, ca-certs)
`daemon.ml`	Supervisor for the gateway, Telegram, Discord, and Slack Socket Mode; also handles signals, periodic cleanup, the config file watcher (10s mtime poll + SIGHUP), and the error-correction (EC) process lifecycle
`error_watcher.ml`	Error correction shared types, helpers, and EC process lifecycle management
`ec_process.ml`	EC process entry point: scans daemon logs, sessions, and background tasks for errors and correlates them
`ec_diagnosis.ml`	Multi-model diagnosis pipeline: parallel LLM queries, voting, planning, fix spawning
`service.ml`	Service orchestrator: starts/stops subsystems (server, tunnel, scheduler)
`scheduler.ml`	Scheduled tasks: cron-like recurring job execution
`audit.ml`	Audit logging: append-only log of tool invocations and security events
`secret_store.ml`	Secret encryption at rest: AES-256-GCM, PBKDF2 key derivation, `$ENC:` prefix format
`migrate.ml`	Database migrations: versioned schema upgrades for SQLite stores
`resilience.ml`	Reliability policies: timeout, retry (exponential backoff), fallback
`runtime_native.ml`	Native runtime adapter: wraps daemon/service for start/stop/status/health
`runtime_docker.ml`	Docker runtime adapter: manages clawq in containers via docker CLI
`tunnel_manager.ml`	Unified tunnel lifecycle manager: apply/stop/restart, live reconfiguration, daemon hooks
`tunnel_cloudflare.ml`	Cloudflare tunnel: manages cloudflared process, extracts assigned URL
`rate_limiter.ml`	Token-bucket rate limiter with per-IP, per-session, and per-chat limits
`landlock.ml`	Linux Landlock sandboxing via C FFI bindings
`bg_shell.ml`	Background shell job registry and lifecycle (created when `shell_exec` is interrupted)
`tools_bg_shell.ml`	Background shell tools: `bg_shell_status`, `bg_shell_wait`, `bg_shell_result`
`slash_commands.ml`	Slash command parsing, dispatch, and per-connector rendering
`format_adapter.ml`	Connector-safe formatting dispatch (markdown tables, code blocks, HTML)
`table_format.ml`	Markdown and plaintext table rendering
`setup_common.ml`	Shared TUI helpers for setup wizards (prompts, ANSI, box drawing)
`config_wizard_model.ml`	Config wizard state machine types
`config_wizard_tui.ml`	Config wizard terminal I/O loop
`config_wizard_update.ml`	Config wizard state transitions
`model_discovery.ml`	API-based model catalog discovery (12h TTL cache)
`request_stats.ml`	Per-turn token usage and cost persistence
`cost_tracker.ml`	Model pricing table and cost calculation
`runner_framework.ml`	Common runner session tracking framework (session ID strategies, command generation)
`background_task.ml`	Background coding task lifecycle and recovery
`pmodel.ml`	Canonical `provider:model` format parsing (strict + flexible), deprecation warnings
`stt.ml`	Speech-to-text: audio transcription via Whisper-compatible API
`voice_transcription.ml`	Shared voice message handling: audio detection, music heuristic, validation, transcribe-with-progress
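The resilience.ml row lists timeout, retry with exponential backoff, and fallback. Below is a minimal synchronous sketch of the last two, assuming nothing about resilience.ml's actual API: the function names, the doubling factor, and the delay cap are illustrative. Timeout is left out because it needs a concurrency library such as Lwt.

```ocaml
(* Delay before retry number [n] (0-based): base * 2^n, capped at max_delay. *)
let backoff_delay ~base ~max_delay n =
  Float.min max_delay (base *. (2. ** float_of_int n))

(* Call [f attempt] until it returns Ok or [max_attempts] calls have been
   made (at least one call always happens). [sleep] is injected so the
   sketch runs without Unix or Lwt. Returns the first Ok or the last Error. *)
let retry ~max_attempts ~base ~max_delay ~sleep f =
  let rec go attempt =
    match f attempt with
    | Ok _ as ok -> ok
    | Error _ as err when attempt + 1 >= max_attempts -> err
    | Error _ ->
        sleep (backoff_delay ~base ~max_delay attempt);
        go (attempt + 1)
  in
  go 0

(* Fallback: use the secondary only if the primary fails. *)
let with_fallback primary secondary =
  match primary () with
  | Ok _ as ok -> ok
  | Error _ -> secondary ()
```

Capping the delay keeps a long outage from producing multi-minute waits, and composing `retry` inside `with_fallback` gives "retry the preferred provider, then switch" behavior.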
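For the rate_limiter.ml row, here is a small token-bucket sketch. The record layout, the explicit `now` argument, and keying buckets by strings such as `"ip:..."` or `"session:..."` are assumptions made for illustration; the real module's scoping of IP, session, and chat limits may differ.

```ocaml
(* A bucket holds up to [capacity] tokens and refills continuously at
   [rate] tokens per second; each allowed request spends one token. *)
type bucket = {
  capacity : float;
  rate : float;
  mutable tokens : float;
  mutable last : float;  (* time of the last refill, in seconds *)
}

let create ~capacity ~rate ~now = { capacity; rate; tokens = capacity; last = now }

(* Refill for the elapsed time, then try to take one token. *)
let allow b ~now =
  let elapsed = Float.max 0. (now -. b.last) in
  b.tokens <- Float.min b.capacity (b.tokens +. (elapsed *. b.rate));
  b.last <- now;
  if b.tokens >= 1. then (
    b.tokens <- b.tokens -. 1.;
    true)
  else false

(* One bucket per key; [capacity] and [rate] only apply when a key's
   bucket is first created. *)
let buckets : (string, bucket) Hashtbl.t = Hashtbl.create 64

let allow_key ~capacity ~rate ~now key =
  let b =
    match Hashtbl.find_opt buckets key with
    | Some b -> b
    | None ->
        let b = create ~capacity ~rate ~now in
        Hashtbl.add buckets key b;
        b
  in
  allow b ~now
```

Passing `now` explicitly keeps the sketch deterministic; a real limiter would read a monotonic clock and evict idle keys so the table does not grow without bound.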
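The pmodel.ml row mentions strict and flexible parsing of the canonical `provider:model` format with deprecation warnings. A sketch of how that split could work, under stated assumptions: the legacy `provider/model` spelling accepted by the flexible parser and the warning text are hypothetical, and the example model ids are illustrative only.

```ocaml
type pmodel = { provider : string; model : string }

(* Split on the first occurrence of [c] only, so model ids that contain
   the separator themselves (e.g. a ":tag" suffix) stay intact. *)
let split_first (c : char) (s : string) : (string * string) option =
  match String.index_opt s c with
  | Some i -> Some (String.sub s 0 i, String.sub s (i + 1) (String.length s - i - 1))
  | None -> None

(* Strict: only the canonical "provider:model" form, both parts non-empty. *)
let parse_strict (s : string) : (pmodel, string) result =
  match split_first ':' s with
  | Some (p, m) when p <> "" && m <> "" -> Ok { provider = p; model = m }
  | _ -> Error (Printf.sprintf "expected provider:model, got %S" s)

(* Flexible: try strict first; otherwise accept a hypothetical legacy
   "provider/model" form and return a deprecation warning with it. *)
let parse_flexible (s : string) : (pmodel * string option, string) result =
  match parse_strict s with
  | Ok pm -> Ok (pm, None)
  | Error msg -> (
      match split_first '/' s with
      | Some (p, m) when p <> "" && m <> "" ->
          Ok
            ( { provider = p; model = m },
              Some (Printf.sprintf "deprecated: %S, use %s:%s" s p m) )
      | _ -> Error msg)
```

Trying the strict form first means an id like `router:vendor/model` is read as provider `router` with model `vendor/model`, rather than being misparsed by the legacy rule.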

Dune Libraries

Library	Key Modules	Dependencies
`clawq_extracted`	`clawq_core`	None (unwrapped; `-w -39` silences the unused-`rec` warnings common in extracted code)
`clawq_runtime_core`	Config, agent, session, provider, memory, tools, audit, security, rate limiter, scheduler	yojson, sqlite3, lwt, cohttp-lwt-unix, mirage-crypto, digestif, clawq_extracted
`clawq_runtime_integrations`	HTTP server, telegram, discord, slack, daemon, MCP, WS client, tunnels, runtimes	clawq_runtime_core + httpun-ws-lwt-unix, gluten-lwt-unix, ca-certs
`clawq` (executable)	`main`	clawq_runtime_core, clawq_runtime_integrations, cmdliner

The libraries use (wrapped false), so their modules are accessible directly (e.g., Clawq_core.dispatch rather than Clawq_extracted.Clawq_core.dispatch).

Interface Inventory

From Interfaces.v, these 7 records define the contract surface:

Interface	Fields	Purpose
`Provider`	name, complete, health	LLM provider abstraction
`Channel`	name, start, stop, send	Communication channel (web, Telegram, etc.)
`Tool`	name, invoke, risk_level	Agent tool with risk classification
`Memory`	store, recall, forget	Key-value memory backend
`RuntimeAdapter`	name, start, stop	Runtime lifecycle management
`Tunnel`	name, start, status	Network tunnel (e.g., Cloudflare)
`Security`	workspace_only, audit_enabled, encrypt_secrets	Security policy flags
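Because these interfaces are Coq records, they surface on the OCaml side as plain records of values and functions. The sketch below mirrors three of them using the field names from the table; every field type, the three-level `risk_level`, and the "refuse above a ceiling" rule in `invoke_checked` are assumptions for illustration, not the definitions in Interfaces.v or tool.ml.

```ocaml
(* Hypothetical risk levels; Interfaces.v may define a different set. *)
type risk_level = Low | Medium | High

(* Field names follow the inventory table; field types are assumptions. *)
type provider = {
  name : string;
  complete : string -> string;  (* prompt -> completion text *)
  health : unit -> bool;
}

type tool = {
  name : string;
  invoke : string -> string;  (* serialized args -> serialized result *)
  risk_level : risk_level;
}

type security = {
  workspace_only : bool;
  audit_enabled : bool;
  encrypt_secrets : bool;
}

let rank = function Low -> 0 | Medium -> 1 | High -> 2

(* Hypothetical enforcement rule: refuse, rather than invoke, any tool
   whose risk level is above the configured ceiling. *)
let invoke_checked ~(max_risk : risk_level) (t : tool) (args : string) :
    (string, string) result =
  if rank t.risk_level > rank max_risk then
    Error (Printf.sprintf "tool %s blocked: risk above limit" t.name)
  else Ok (t.invoke args)
```

Records of closures keep each implementation (a provider, a channel, a tool) a first-class value that can be registered and swapped at runtime without functors.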

Dependency Direction

flowchart TD
    IFACE["Interfaces.v no deps"]
    CONF["Config.v String · List · Bool"]
    CLI["Cli.v String · List"]
    EXT["Extract.v Cli · Config · ExtrOcaml*"]
    IFACE --> CONF
    CONF --> CLI
    CLI --> EXT

Coq theory dependency order

flowchart TD
    A["clawq_extracted no deps"]
    B["clawq_runtime_core core runtime"]
    C["clawq_runtime_integrations channels · gateway · daemon"]
    D["clawq cmdliner"]
    A --> B --> C --> D
    B --> D

OCaml library dependency order

Runtime Split

The runtime is split into core and integrations to support a minimal build (clawq-min) that excludes network dependencies:

Optional integrations stay out of clawq_runtime_core
Network/server features belong in clawq_runtime_integrations
Integration-only commands return “disabled in minimal build” messages in command_bridge_min.ml
New dependencies are evaluated for core vs integration placement before linking
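One way an integration-only command can be stubbed in a minimal bridge is sketched below. The command names in `integration_only`, the exit code, and the `core` callback are hypothetical; only the "disabled in minimal build" wording comes from the description of command_bridge_min.ml.

```ocaml
(* Hypothetical list of commands that need clawq_runtime_integrations. *)
let integration_only = [ "daemon"; "gateway"; "tunnel"; "mcp" ]

let disabled_message cmd =
  Printf.sprintf "%s: disabled in minimal build (clawq-min)" cmd

(* Integration-only commands short-circuit with a message; everything
   else is handed to [core], which needs only clawq_runtime_core. *)
let run_min ~(core : string -> int * string) (cmd : string) : int * string =
  if List.mem cmd integration_only then (1, disabled_message cmd)
  else core cmd
```

Because the stub never references integration modules, the minimal executable links without pulling in their network dependencies.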

Web UI Surface

The browser chat UI is embedded into the daemon and served from the gateway root.

GET / serves index.html plus versioned chat.js and chat.css assets.
POST /chat/stream emits SSE events for reply deltas, thinking deltas, tool starts, streamed tool output, tool results, and final completion.
GET /commands exposes slash-command metadata for autocomplete.
GET /ui-version returns the current UI bundle version so the client can offer a reload banner.
POST /pair exchanges a 6-digit OTP for a bearer token when gateway pairing is enabled.
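POST /chat/stream uses the standard Server-Sent Events wire format: each event is a block of `field: value` lines terminated by a blank line, and a payload that contains newlines needs one `data:` line per line. The sketch below frames the event kinds listed above; the event names and the assumption that payloads are plain strings (for example, pre-encoded JSON) are hypothetical, not the names http_server.ml actually emits.

```ocaml
(* Hypothetical event kinds mirroring those listed for /chat/stream. *)
type chat_event =
  | Reply_delta of string
  | Thinking_delta of string
  | Tool_start of string
  | Tool_output of string
  | Tool_result of string
  | Done

let name = function
  | Reply_delta _ -> "reply_delta"
  | Thinking_delta _ -> "thinking_delta"
  | Tool_start _ -> "tool_start"
  | Tool_output _ -> "tool_output"
  | Tool_result _ -> "tool_result"
  | Done -> "done"

let payload = function
  | Reply_delta s | Thinking_delta s | Tool_start s | Tool_output s | Tool_result s -> s
  | Done -> ""

(* SSE framing: an "event:" line, one "data:" line per payload line
   (a raw newline would otherwise end the field early), then a blank
   line that tells the browser to dispatch the event. *)
let to_sse ev =
  let data_lines =
    String.split_on_char '\n' (payload ev) |> List.map (fun l -> "data: " ^ l)
  in
  String.concat "\n" (("event: " ^ name ev) :: data_lines) ^ "\n\n"
```

On the client, an EventSource-style parser joins multiple `data:` lines back together with newlines, so streamed tool output survives intact.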

In normal mode the daemon extracts embedded assets into ~/.clawq/ui/ when the bundle hash changes. If ~/.clawq/ui/DEV exists, the daemon switches to disk-backed dev mode and recomputes the UI version from local files instead of overwriting them.
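The startup decision described above can be sketched as a pure function. How the daemon records the hash of the installed copy is not described here, so `installed_hash` is an assumption, as are the `ui_mode` constructors; `Digest` (MD5 in the OCaml stdlib) is used only to keep the sketch self-contained, not because clawq uses it.

```ocaml
type ui_mode =
  | Dev of string         (* DEV marker present: version from files on disk *)
  | Extract of string     (* bundle hash changed: rewrite ~/.clawq/ui/ *)
  | Up_to_date of string  (* installed assets already match the bundle *)

(* Inputs are passed in for testability: whether ~/.clawq/ui/DEV exists,
   the embedded bundle hash, the hash recorded for the installed copy
   (hypothetical), and a thunk that hashes the local files. *)
let decide ~dev_marker ~bundle_hash ~installed_hash ~hash_disk =
  if dev_marker then Dev (hash_disk ())
  else if installed_hash = Some bundle_hash then Up_to_date bundle_hash
  else Extract bundle_hash

(* Order-independent version string over (path, contents) pairs. *)
let version_of_contents (files : (string * string) list) : string =
  files
  |> List.sort compare
  |> List.map (fun (path, body) -> path ^ ":" ^ Digest.to_hex (Digest.string body))
  |> String.concat "\n"
  |> Digest.string
  |> Digest.to_hex
```

Checking the DEV marker first is what keeps local edits from being overwritten: in dev mode the embedded bundle is never compared or extracted.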