Claude Parser Architecture

System Overview

Claude Parser is a disaster recovery and analysis tool for Claude Code conversations. It treats every Claude API call as a "git commit" with a UUID, enabling Git-like navigation through conversation history.

Core Design Principles

LNCA (LLM-Native Composable Architecture)

<80 LOC per file: Every module stays under 80 lines for optimal LLM context
100% Framework Delegation: No custom loops, error handling, or data processing
Single Source of Truth: One file per feature, no duplication
Pydantic Schema Normalization: All JSONL variations handled by Pydantic models
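
The line limit can be checked mechanically. The following is a minimal sketch, not part of the project: `files_over_limit` is a hypothetical helper, and it assumes every physical line counts (blank lines and comments included). Because the module listing in this document includes an 80-line file, the sketch treats exactly 80 lines as acceptable and reports only files above that.

```python
from pathlib import Path

MAX_LOC = 80  # limit taken from the LNCA principles above


def files_over_limit(root: str, max_loc: int = MAX_LOC) -> dict[str, int]:
    """Return {relative path: line count} for .py files with more than max_loc lines.

    Assumption: every physical line counts, including blanks and comments.
    """
    base = Path(root)
    counts = {
        path.relative_to(base).as_posix(): len(
            path.read_text(encoding="utf-8").splitlines()
        )
        for path in sorted(base.rglob("*.py"))
    }
    return {name: n for name, n in counts.items() if n > max_loc}
```

Calling `files_over_limit("claude_parser")` from the repository root would return an empty dict when every module complies.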

Architecture Layers

┌─────────────────────────────────────────┐
│          CLI Layer (cg commands)        │
├─────────────────────────────────────────┤
│         Python API (SDK functions)      │
├─────────────────────────────────────────┤
│     Core Modules (Business Logic)       │
├─────────────────────────────────────────┤
│    Storage Layer (DuckDB + JSONL)       │
└─────────────────────────────────────────┘

Module Organization

`/claude_parser/cli/`

Git-like command interface, split for LOC compliance:

- cg.py - Main orchestrator (34 lines)
- cg_basic.py - Status, log commands (67 lines)
- cg_advanced.py - Find, blame commands (69 lines)
- cg_reflog.py - Reflog, show commands (80 lines)
- cg_restore.py - Checkout, restore commands (61 lines)
- cg_reset.py - Reset commands (63 lines)

`/claude_parser/loaders/`

Data loading with framework delegation:

- session.py - Session loading via JSON (52 lines)
- discovery.py - File discovery via pathlib (73 lines)

`/claude_parser/queries/`

DuckDB queries, one file per feature:

- session_queries.py - Session SQL queries
- find_queries.py - Find operations
- blame_queries.py - Blame operations
- reflog_queries.py - Reflog history
- schema_models.py - Pydantic normalization

`/claude_parser/operations/`

File operations, split by responsibility:

- core.py - Re-exports for compatibility (20 lines)
- file_ops.py - File operations (42 lines)
- diff_ops.py - Diff generation (35 lines)
- restore_ops.py - Restoration logic (79 lines)

`/claude_parser/navigation/`

Message navigation and timeline:

- core.py - Message filtering
- timeline.py - UUID time travel
- checkpoint.py - Recovery point detection

`/claude_parser/tokens/`

Token counting and billing:

- core.py - Token operations
- context.py - Context window calculation
- billing.py - Cost estimation

`/claude_parser/analytics/`

Session analysis:

- core.py - Core analytics
- tools.py - Tool usage analysis
- projects.py - Project context analysis

`/claude_parser/hooks/`

Claude Code hook system:

- request.py - Hook request model
- aggregator.py - Response aggregation
- executor.py - Hook execution
- api.py - Hook API functions
- handlers.py - Pre/post tool handlers

`/claude_parser/storage/`

Data persistence:

- engine.py - DuckDB engine (ONLY raw SQL execution)

Data Flow

1. JSONL Loading

JSONL File → json.loads() → Pydantic Model → Dict
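
The loading pipeline above can be sketched with the standard library alone. This is an illustration, not the project's loader: a dataclass stands in for the Pydantic model so the snippet has no third-party dependency (it shapes records but does not validate types), and `Message`, `to_dict`, and `load_messages` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, List, Optional, Union


@dataclass
class Message:
    """Stand-in for the Pydantic model; the fields mirror NormalizedMessage."""
    uuid: Optional[str] = None
    type: Optional[str] = None
    timestamp: Optional[str] = None
    toolUseResult: Optional[Union[str, Dict[str, Any]]] = None


FIELD_NAMES = tuple(f.name for f in fields(Message))


def to_dict(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Shape one decoded record into the model, then return a plain dict.

    Keys the model does not declare are dropped; missing keys become None.
    """
    return asdict(Message(**{name: raw.get(name) for name in FIELD_NAMES}))


def load_messages(path: str) -> List[Dict[str, Any]]:
    """JSONL file -> json.loads -> model -> dict, one record per non-blank line."""
    with open(path, encoding="utf-8") as handle:
        return [to_dict(json.loads(line)) for line in handle if line.strip()]
```

Every returned dict has the same four keys, so downstream code can index them without checking for presence first.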

2. Query Execution

User Command → CLI Parser → Query Module → DuckDB → Results

3. Schema Normalization

Raw JSONL → NormalizedMessage → Consistent Interface

Key Design Patterns

Framework Delegation

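from pathlib import Path

# GOOD - iteration and filtering are delegated to built-ins;
# filter(None, ...) drops sessions that failed to load (falsy results)
sessions = list(filter(None, (
    load_session(str(Path(path).expanduser()))
    for path in paths
)))

# BAD - hand-written loop doing the same filtering manually
sessions = []
for path in paths:
    session = load_session(str(Path(path).expanduser()))
    if session:
        sessions.append(session)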

Pydantic Schema Normalization

import json
from typing import Any, Dict, Optional, Union

from pydantic import BaseModel


class NormalizedMessage(BaseModel):
    """Handles all JSONL schema variations"""
    uuid: Optional[str] = None
    type: Optional[str] = None
    timestamp: Optional[str] = None
    toolUseResult: Optional[Union[str, Dict[str, Any]]] = None

    @property
    def normalized_result(self) -> Dict[str, Any]:
        """Return toolUseResult as a dict, decoding it first if stored as a string"""
        if isinstance(self.toolUseResult, str):
            # Assumes the string is a JSON-encoded object;
            # invalid JSON raises json.JSONDecodeError.
            return json.loads(self.toolUseResult)
        return self.toolUseResult or {}

Single Source of Truth

One storage engine: storage/engine.py
One session loader: loaders/session.py
One discovery module: loaders/discovery.py
Each query type in its own file

Storage Architecture

JSONL Structure

{
  "uuid": "abc123def456",
  "type": "toolUseResult",
  "timestamp": "2024-01-15T10:30:00Z",
  "toolUseResult": {
    "type": "Write",
    "filePath": "/path/to/file.py",
    "content": "file content here"
  }
}

DuckDB Integration

Direct SQL queries on JSONL files
No intermediate database
Schema inference from JSON structure
Efficient for large conversation files
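
A minimal sketch of querying a JSONL file in place, assuming DuckDB's `read_json_auto` table function and the field names from the JSONL example in this document. `build_type_query` and `messages_of_type` are hypothetical helpers for illustration; they are not taken from `storage/engine.py`.

```python
from typing import Any, List, Tuple


def build_type_query(jsonl_path: str) -> str:
    """Build SQL that reads a JSONL file directly and filters by message type."""
    # Double any single quotes so the path is a valid SQL string literal.
    literal = jsonl_path.replace("'", "''")
    # Column names are double-quoted because they collide with SQL type names.
    return (
        'SELECT "uuid", "type", "timestamp" '
        f"FROM read_json_auto('{literal}') "
        'WHERE "type" = ? '
        'ORDER BY "timestamp"'
    )


def messages_of_type(jsonl_path: str, message_type: str) -> List[Tuple[Any, ...]]:
    """Run the query on an in-memory connection; no database file is created."""
    import duckdb  # imported lazily so the query builder works without it

    con = duckdb.connect()  # in-memory database
    try:
        return con.execute(build_type_query(jsonl_path), [message_type]).fetchall()
    finally:
        con.close()
```

The message type is passed as a bound parameter rather than interpolated, which keeps user input out of the SQL text; only the file path is embedded, and it is escaped first.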

UUID System

Every Claude message gets a UUID, enabling:

- Git-like commits: Each UUID is like a commit SHA
- Time travel: Jump to any point in conversation
- Partial matching: Use first 6-8 chars like git
- Cross-session search: Find operations across all conversations
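
Partial matching can be sketched as a prefix lookup that refuses ambiguous input, similar to how git rejects a short SHA that matches several commits. `resolve_uuid` and `AmbiguousPrefix` are hypothetical names, and the in-memory list of UUIDs is an assumption made to keep the example self-contained.

```python
from typing import Iterable


class AmbiguousPrefix(ValueError):
    """Raised when a prefix matches more than one UUID."""


def resolve_uuid(prefix: str, uuids: Iterable[str]) -> str:
    """Return the single UUID that starts with `prefix`."""
    # A set removes duplicates, so a repeated UUID is not reported as ambiguous.
    matches = sorted({u for u in uuids if u.startswith(prefix)})
    if not matches:
        raise KeyError(f"no message matches {prefix!r}")
    if len(matches) > 1:
        raise AmbiguousPrefix(f"{prefix!r} matches {len(matches)} messages")
    return matches[0]
```

With UUIDs `abc123def456` and `abc999000111`, the prefix `abc123` resolves to the first one, while `abc` raises `AmbiguousPrefix`.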

Error Handling

All errors handled by frameworks:

- Typer: CLI validation and error messages
- Pydantic: Data validation and normalization
- DuckDB: SQL execution and file handling
- pathlib: File system operations

Performance Characteristics

Startup: ~1-2 seconds (DuckDB initialization)
Query time: <1 second for most operations
Memory: Streaming processing, no full file loading
Scalability: Handles GB-sized JSONL files efficiently

Extension Points

Adding New Commands

Create new file in /cli/ (must be <80 LOC)
Import in cg.py orchestrator
Use existing query modules or create new ones

Adding New Queries

Create file in /queries/ for your feature
Use DuckDB SQL with JSONL
Return plain dicts/lists

Adding New Operations

Create focused module in /operations/
Use framework delegation (no custom loops)
Re-export in core.py if needed

Testing Approach

Real data: Use actual JSONL files
No mocks: Test with real Claude sessions
Framework validation: Let Pydantic/Typer handle validation
Integration focus: Test complete workflows

Dependencies

Core dependencies (all framework delegation):

- typer: CLI framework
- rich: Terminal formatting
- duckdb: JSONL querying
- pydantic: Schema normalization
- pathlib: File operations (Python standard library, no separate install)

Security Considerations

Read-only by default: Most operations don't modify files
Explicit restore: File restoration requires explicit commands
No credentials: Works with local JSONL files only
Path validation: File paths are normalized and resolved through pathlib
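
Resolving a path is not the same as restricting it. One way to keep restores inside a chosen root is sketched below, under the assumption that restoration should never write outside a project directory. `safe_restore_target` is a hypothetical helper, not part of `restore_ops.py`, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


def safe_restore_target(root: str, requested: str) -> Path:
    """Resolve `requested` under `root` and reject paths that escape it."""
    root_path = Path(root).resolve()
    # Joining with an absolute `requested` discards root_path, so absolute
    # paths outside the root are caught by the same check as "../" traversal.
    target = (root_path / requested).resolve()
    if not target.is_relative_to(root_path):
        raise ValueError(f"{requested!r} escapes {root_path}")
    return target
```

A request such as `src/app.py` resolves to a path under the root, while `../outside.py` raises `ValueError`.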