Test Failure Diagnosis
Systematic two-phase workflow (DIAGNOSE, then DETERMINE) for debugging test failures, combining Chain-of-Thought reasoning, grounding directives, and evidence requirements for objective analysis.
The Prompt
```
$FAILURE_DESCRIPTION
```
Use the code research to analyze the test failure above.
DIAGNOSE:
1. Examine the test code and its assertions.
2. Understand and clearly explain the intention and reasoning of the test: what is it testing?
3. Compare against the implementation code being tested.
4. Identify the root cause of the failure.
DETERMINE:
Is this an outdated test that needs updating, or a real bug in the implementation?
Provide your conclusion with evidence.
Overview
Why systematic diagnosis prevents hallucination:
- The fenced code block preserves error formatting and prevents the LLM from interpreting failure messages as instructions.
- "Use the code research" is an explicit grounding directive: it forces a codebase search instead of hallucination from training patterns.
- The numbered DIAGNOSE steps implement Chain-of-Thought, forcing sequential analysis; the agent cannot jump to "root cause" without first examining the test's intent.
- "Understand the intention" (step 2) ensures the agent articulates WHY the test exists, not just WHAT it does, which is critical for distinguishing bugs from outdated requirements.
- The DETERMINE step poses a binary decision, constraining output to "bug vs. outdated test" instead of open-ended conclusions.
- "Provide evidence" requires file paths and line numbers: concrete proof, not vague assertions.
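Assembling the prompt is mechanical but worth getting right: the raw failure output must land inside the fence unmodified. Here is a minimal sketch in Python, assuming the failure text arrives as a plain string; the names FENCE, PROMPT_BODY, and build_diagnosis_prompt are illustrative, not part of the pattern:

```python
# Build the diagnosis prompt, keeping the raw failure output inside a
# fenced block so the model treats it as data rather than instructions.
FENCE = "```"  # built at runtime so this snippet nests cleanly in docs

PROMPT_BODY = """Use the code research to analyze the test failure above.
DIAGNOSE:
1. Examine the test code and its assertions.
2. Understand and clearly explain the intention and reasoning of the test: what is it testing?
3. Compare against the implementation code being tested.
4. Identify the root cause of the failure.
DETERMINE:
Is this an outdated test that needs updating, or a real bug in the implementation?
Provide your conclusion with evidence."""

def build_diagnosis_prompt(failure_description: str) -> str:
    """Substitute raw failure output (stack trace, assertion errors, logs)
    for $FAILURE_DESCRIPTION, preserving its formatting verbatim."""
    return f"{FENCE}\n{failure_description.rstrip()}\n{FENCE}\n{PROMPT_BODY}"
```

For example, build_diagnosis_prompt(captured_pytest_output) reproduces the prompt shown above, with the captured output standing in for $FAILURE_DESCRIPTION.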
When to use (fresh-context requirement): Apply this prompt to test failures during refactoring (to determine whether tests need updating or the code has bugs), to CI/CD pipeline failures (for systematic root-cause analysis), and after implementing new features (to analyze failures in existing suites). Critical: run it in a conversation separate from the implementation work. This prevents the "cycle of self-deception" in which agents defend their own implementation; a fresh context yields objective analysis free of prior assumptions (see the sketch below). Include full stack traces and error messages, since truncated output prevents accurate diagnosis. And without the grounding directive, agents hallucinate from training patterns instead of reading your actual codebase.
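A sketch of the fresh-context rule, using the OpenAI Python SDK as one possible client; the SDK choice and the model name are assumptions, not part of the pattern. The essential detail is that the messages list contains only the diagnosis prompt, with no implementation history for the agent to defend:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def diagnose_in_fresh_context(prompt: str) -> str:
    """Run the diagnosis prompt in a brand-new conversation."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: any capable code model works here
        # No prior turns from the implementation session, so the agent
        # has no stake in defending its own code.
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```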
Prerequisites: code research capabilities (deep codebase exploration via multi-hop semantic search, query expansion, and iterative follow-ups), access to both the test files and the implementation code, the test failure output (stack traces, assertion errors, logs), and file paths to the relevant files. The pattern also adapts to other diagnostics: performance issues (metrics, thresholds, bottlenecks), security vulnerabilities (attack vectors, boundaries, gaps), and deployment failures (logs, expected flow, configuration mismatches); a templated sketch follows.
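One way to template that adaptation, sketched under the assumption that each domain keeps the same examine/explain/compare/identify shape; the step wordings and names here are illustrative paraphrases, not fixed by the pattern:

```python
# Per-domain DIAGNOSE steps; the fenced evidence, grounding directive,
# and DETERMINE evidence requirement stay the same across domains.
DIAGNOSE_STEPS = {
    "test failure": [
        "Examine the test code and its assertions.",
        "Understand and clearly explain the intention of the test.",
        "Compare against the implementation code being tested.",
        "Identify the root cause of the failure.",
    ],
    "performance issue": [
        "Examine the relevant metrics and thresholds.",
        "Explain the expected performance characteristics.",
        "Compare against the measured behavior.",
        "Identify the bottleneck.",
    ],
    "deployment failure": [
        "Examine the deployment logs.",
        "Explain the expected deployment flow.",
        "Compare against the actual configuration.",
        "Identify the configuration mismatch.",
    ],
}

def build_diagnostic_prompt(kind: str, evidence: str) -> str:
    """Assemble a domain-specific variant of the diagnosis prompt."""
    fence = "```"
    steps = "\n".join(f"{i}. {s}" for i, s in enumerate(DIAGNOSE_STEPS[kind], 1))
    return (
        f"{fence}\n{evidence.rstrip()}\n{fence}\n"
        f"Use the code research to analyze the {kind} above.\n"
        f"DIAGNOSE:\n{steps}\n"
        "DETERMINE:\nProvide your conclusion with evidence."
    )
```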
Related Lessons
- Lesson 3: High-Level Methodology - Four-phase workflow (Research > Plan > Execute > Validate)
- Lesson 4: Prompting 101 - Chain-of-Thought, constraints, structured format
- Lesson 7: Planning & Execution - Evidence requirements, grounding techniques
- Lesson 8: Tests as Guardrails - Three-context workflow, test failure diagnosis