Collection Metadata Corruption After Unclean Shutdown
Severity: critical
Chroma's SQLite metadata store is vulnerable to corruption if the process terminates abnormally (crash, SIGKILL, power loss) during a write transaction. Corruption manifests as collection initialization failures, missing collections, inconsistent vector counts, or foreign key constraint violations. This is more likely in embedded deployments without WAL mode enabled.
Collection operations fail after an unclean process termination (OOM kill, crash, forced shutdown). Errors include 'database disk image is malformed', 'no such table', and foreign key violations; collections may also appear empty despite holding data before the shutdown. `PRAGMA integrity_check` reports errors instead of `ok`.
1. Detect: Run the SQLite integrity check on startup: `sqlite3 chroma.db 'PRAGMA integrity_check;'`. A healthy database returns the single row `ok`. Also scan the logs for database error patterns.
2. Prevent: Enable SQLite WAL (Write-Ahead Logging) mode, which provides better crash resilience: `PRAGMA journal_mode=WAL;`. Shut down gracefully: handle SIGTERM and let in-flight writes finish, rather than stopping the process with SIGKILL, which cannot be caught. Take regular backups.
3. Recover: If corruption is detected, restore from the latest backup. If no backup exists, attempt SQLite recovery: `sqlite3 chroma.db '.recover' | sqlite3 recovered.db`, then validate the recovered data.
4. Re-initialize: If recovery fails, delete the corrupted database and rebuild the collections from source data.
5. Monitor: Track unclean shutdowns (OOM kills, crashes), integrity check results on startup, and backup recency. Alert on any detected corruption.
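The detect, prevent, and backup steps above can be sketched with Python's standard `sqlite3` module. This is a minimal illustration, not Chroma's own tooling: the function names are hypothetical, the path `chroma.db` is taken from the commands above, and it assumes the Chroma process is stopped while the file is inspected. Whether Chroma sets or overrides the journal mode itself is not stated here, so verify the returned mode rather than assuming it.

```python
import sqlite3


def check_integrity(db_path: str) -> list[str]:
    """Return the problems reported by PRAGMA integrity_check (empty if healthy)."""
    try:
        conn = sqlite3.connect(db_path)
        try:
            rows = conn.execute("PRAGMA integrity_check;").fetchall()
        finally:
            conn.close()
    except sqlite3.DatabaseError as exc:
        # Severe corruption can make the check itself raise,
        # e.g. "database disk image is malformed".
        return [str(exc)]
    messages = [row[0] for row in rows]
    # A healthy database yields exactly one row containing "ok".
    return [] if messages == ["ok"] else messages


def enable_wal(db_path: str) -> str:
    """Request WAL mode and return the journal mode actually in effect."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("PRAGMA journal_mode=WAL;").fetchone()[0]
    finally:
        conn.close()


def backup_database(db_path: str, backup_path: str) -> None:
    """Copy the database using SQLite's online backup API."""
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(backup_path)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```

A startup hook could call `check_integrity("chroma.db")` and refuse to start, or raise an alert, if the returned list is non-empty. Checking a backup with the same function before relying on it avoids restoring from a copy that is itself damaged.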