Evidence-Based Debugging
Systematic debugging workflow combining Chain-of-Thought reasoning, grounding directives, and evidence requirements to prevent hallucination and force code retrieval.
The Prompt
```
$ERROR_DESCRIPTION
```
Use code research to analyze the error above.
INVESTIGATE:
1. Read relevant source files and trace the code path
2. Examine error messages, stack traces, and logs
3. Identify the specific location of the failure
4. Understand the surrounding architecture and data flow
ANALYZE:
5. Compare expected vs actual behavior
6. Identify the root cause of the failure
7. Determine if related issues exist elsewhere
EXPLAIN:
Provide your root cause analysis with evidence:
- File paths and line numbers (`src/auth/jwt.ts:45-67`)
- Actual values from code (`port: 8080`)
- Specific identifiers (`validateJWT()`)
- Exact error messages
Then propose a fix.
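Below is a minimal sketch of how the template might be assembled programmatically before being sent to an agent. `DEBUG_PROMPT_TEMPLATE`, `buildDebugPrompt()`, and the sample error output are hypothetical illustrations, not part of the pattern itself.

```typescript
// Build the fence string without embedding literal triple backticks,
// so this sketch nests cleanly inside documentation.
const FENCE = "`".repeat(3);

// Abbreviated version of the prompt above; the full INVESTIGATE/ANALYZE/EXPLAIN
// steps would be included verbatim in practice.
const DEBUG_PROMPT_TEMPLATE = [
  FENCE,
  "$ERROR_DESCRIPTION",
  FENCE,
  "Use code research to analyze the error above.",
  "",
  "INVESTIGATE:",
  "1. Read relevant source files and trace the code path",
  "(remaining INVESTIGATE / ANALYZE / EXPLAIN steps as written above)",
].join("\n");

function buildDebugPrompt(errorDescription: string): string {
  // Substitute the raw error output into the fenced block so its formatting is preserved.
  return DEBUG_PROMPT_TEMPLATE.replace("$ERROR_DESCRIPTION", errorDescription);
}

// Usage: always pass the complete, untruncated stack trace or log excerpt.
// The error text below is illustrative only.
const prompt = buildDebugPrompt(
  "TypeError: Cannot read properties of undefined (reading 'sub')\n" +
    "    at validateJWT (src/auth/jwt.ts:52:31)"
);
console.log(prompt);
```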
Overview
Why evidence requirements prevent hallucination: "Provide evidence (file paths, line numbers, actual values)" is an explicit grounding directive; an agent cannot supply that evidence without retrieving it from the codebase. Without evidence requirements, agents produce pattern completion from training data ("probably a database timeout") rather than analysis. Requiring evidence forces codebase reading, execution tracing, and concrete citations. The numbered INVESTIGATE/ANALYZE/EXPLAIN steps implement Chain-of-Thought, forcing sequential analysis: the agent cannot jump to a "root cause" without first examining the error context. "Use code research" is an explicit retrieval directive that prevents the agent from falling back on training patterns. The fenced code block preserves the error's formatting and prevents the LLM from interpreting failure messages as instructions. Good evidence includes file paths with line numbers, actual variable and configuration values, specific function names, and complete stack traces, not vague assertions.
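As a rough illustration of what screening for concrete evidence can look like in tooling, here is a sketch of a heuristic check for file:line citations. `checkEvidence()` and its regexes are assumptions for this example, not part of any library, and the test strings are illustrative.

```typescript
interface EvidenceCheck {
  fileLineCitations: string[]; // e.g. "src/auth/jwt.ts:45-67"
  identifiers: string[];       // e.g. "`validateJWT()`"
  hasEvidence: boolean;
}

function checkEvidence(analysis: string): EvidenceCheck {
  // A path followed by a line number or range, e.g. src/auth/jwt.ts:45-67
  const fileLineCitations = analysis.match(/[\w./-]+\.\w+:\d+(?:-\d+)?/g) ?? [];
  // Backticked call-style identifiers, e.g. `validateJWT()`
  const identifiers = analysis.match(/`[\w.]+\(\)`/g) ?? [];
  return {
    fileLineCitations,
    identifiers,
    hasEvidence: fileLineCitations.length > 0,
  };
}

// Vague pattern completion fails the check; a grounded claim passes.
console.log(checkEvidence("Probably a database timeout.").hasEvidence); // false
console.log(
  checkEvidence(
    "Token parsing fails in `validateJWT()` at src/auth/jwt.ts:52 because the clock tolerance is 0."
  ).hasEvidence
); // true
```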
When to use (fresh context required): Production errors with stack traces or logs, unexpected behavior in specific scenarios, silent failures that require tracing a code path, performance bottlenecks that need profiling analysis, and architectural issues spanning multiple files. Critical: run this in a separate conversation from the implementation work so the analysis is unbiased. The diagnostic pattern prevents the "cycle of self-deception" in which an agent defends its own implementation; a fresh context yields objective analysis free of prior assumptions. Always provide complete error output, since truncated logs prevent accurate diagnosis. Challenge explanations that don't fit observed behavior: "You said X causes the timeout, but the logs show the connection was established. Explain this discrepancy with evidence." Reject guesses without citations: "Show me the file and line number where this occurs."
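The fresh-context requirement can be enforced mechanically by never reusing the implementation thread. A minimal sketch, assuming a generic chat-style agent interface; `ChatMessage` and `sendToAgent()` are hypothetical placeholders, not a real SDK:

```typescript
type ChatMessage = { role: "user" | "assistant"; content: string };

async function sendToAgent(messages: ChatMessage[]): Promise<string> {
  // Placeholder: wire this to whichever agent or LLM client you actually use.
  throw new Error("not implemented");
}

async function diagnoseInFreshContext(debugPrompt: string): Promise<string> {
  // Start from an empty history: the diagnostic run never sees the
  // implementation conversation, so it cannot defend its own earlier code.
  const conversation: ChatMessage[] = [{ role: "user", content: debugPrompt }];
  const analysis = await sendToAgent(conversation);

  // Push back when the explanation lacks citations or contradicts the logs.
  conversation.push(
    { role: "assistant", content: analysis },
    {
      role: "user",
      content:
        "Show me the file and line number where this occurs. " +
        "If your explanation does not match the logs, explain the discrepancy with evidence.",
    }
  );
  return sendToAgent(conversation);
}
```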
Prerequisites: Code research capabilities (deep codebase exploration via multi-hop semantic search, query expansion, and iterative follow-ups), file system access for reading implementation and configuration files, and complete error messages, stack traces, and logs (not truncated output); optionally, file paths or function names if known. Verify every cited file path and line number, since agents can hallucinate locations, and apply engineering judgment to the reasoning itself: LLMs complete patterns rather than apply logic. The pattern adapts to other diagnostics by swapping the evidence requirements (see the sketch below): performance issues (metrics, thresholds, profiling data), security vulnerabilities (attack vectors, trust boundaries, configuration gaps), deployment failures (infrastructure logs, expected vs actual state), and integration issues (API contracts, data flow, boundary errors).
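A sketch of how the EXPLAIN evidence list might be swapped per diagnostic type. The category names follow the list above; the exact wording of each requirement is illustrative, not prescribed by the pattern.

```typescript
const EVIDENCE_REQUIREMENTS = {
  performance: ["metrics and thresholds", "profiling data", "hot paths with file:line citations"],
  security: ["attack vectors", "trust boundaries", "configuration gaps with file:line citations"],
  deployment: ["infrastructure logs", "expected vs actual state", "actual configuration values"],
  integration: ["API contracts", "data flow across the boundary", "exact error payloads"],
};

function evidenceSection(kind: keyof typeof EVIDENCE_REQUIREMENTS): string {
  // Render the chosen requirements as the bullet list in the EXPLAIN section.
  const bullets = EVIDENCE_REQUIREMENTS[kind].map((item) => `- ${item}`).join("\n");
  return `Provide your root cause analysis with evidence:\n${bullets}\nThen propose a fix.`;
}

console.log(evidenceSection("security"));
```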
Related Lessons
- Lesson 4: Prompting 101 - Chain-of-Thought, constraints, structured reasoning
- Lesson 5: Grounding - Grounding directives, RAG, forcing retrieval
- Lesson 7: Planning & Execution - Evidence requirements, challenging agent logic
- Lesson 10: Debugging - Closed-loop workflow, reproduction scripts, evidence-based approach