
Lesson 10: Debugging with AI Agents

Debugging with AI agents isn't about describing symptoms and hoping for solutions. It's about requiring evidence at every step. The core principle: never accept a fix without reproducible proof it works.

Always Require Evidence

The fundamental shift in debugging with AI is moving from "what do you think is wrong?" to "prove the bug exists, then prove your fix works." AI agents excel at pattern recognition and systematic investigation when given concrete data, but fail spectacularly when forced to speculate.

Anti-pattern: Describing a bug and asking the agent to fix it blindly.

Production pattern: Provide reproduction steps, give the agent access to diagnostic tools, and require before/after evidence.

Evidence-Based Debugging

A systematic debugging workflow that combines Chain-of-Thought reasoning, grounding directives, and evidence requirements to prevent hallucination and force real code retrieval.

The Prompt

```
$ERROR_DESCRIPTION
```

Use code research to analyze the error above.

INVESTIGATE:

1. Read relevant source files and trace the code path
2. Examine error messages, stack traces, and logs
3. Identify the specific location of the failure
4. Understand the surrounding architecture and data flow

ANALYZE:

5. Compare expected vs actual behavior
6. Identify the root cause of the failure
7. Determine if related issues exist elsewhere

EXPLAIN:

Provide your root cause analysis with evidence:
- File paths and line numbers (`src/auth/jwt.ts:45-67`)
- Actual values from code (`port: 8080`)
- Specific identifiers (`validateJWT()`)
- Exact error messages

Then propose a fix.

Code Inspection: Understanding Before Fixing

Before diving into logs or reproduction, have the agent explain the architecture and execution flow. Use conversational analysis to identify mismatches between your mental model and actual system behavior. Ask the agent to trace request paths, explain data flow, and identify potential failure points based on code structure.

This isn't about having the agent read every line of code. Use semantic code search and research tools to find relevant components, then focus the conversation on critical paths. For example: "Trace the authentication flow from API request to database query. Where could a race condition occur?" The agent's explanation often reveals edge cases or assumptions you missed.

Log Analysis: AI's Superpower

AI agents excel at exactly the messy logs humans struggle with. Multi-line stack traces scattered across thousands of entries? Inconsistent formats from different services? Raw debug output without structured fields? That's where AI has the biggest advantage—processing chaos humans can't parse manually.

What takes senior engineers days of manual correlation happens in minutes. AI spots patterns across log formats: cascading errors in microservices with different logging styles, timing patterns indicating race conditions buried in verbose output, specific user cohorts experiencing failures across fragmented logs. The messier the logs, the more AI's pattern recognition outpaces human capability.

Give agents access in whatever form works: paste grep output, pipe script output, upload raw log files, or grant direct CLI access to log aggregators. AI doesn't need JSON with correlation IDs to be effective—it parses whatever you have. That said, structured logs (consistent timestamps, request IDs, JSON formatting) are good engineering practice and make both human and AI analysis easier. But don't wait for perfect logging infrastructure before leveraging AI—its strength is working with what you already have.
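
For instance, a tiny extraction script can pull every line tied to a failing request out of mixed-format log files before handing them to the agent. This is a minimal sketch, assuming plain-text logs under a `./logs` directory and a request ID embedded somewhere in each relevant line; adapt the paths and matching to your setup.

```typescript
// extract-request-logs.ts: collect every log line mentioning a failing request ID
// so the full, mixed-format trace can be pasted into the agent's context.
// Usage (assuming tsx): npx tsx extract-request-logs.ts req-7f3a9c
import { readFileSync, readdirSync } from "node:fs";
import { join } from "node:path";

const requestId = process.argv[2];
if (!requestId) {
  console.error("Usage: npx tsx extract-request-logs.ts <request-id>");
  process.exit(1);
}

const logDir = "./logs"; // assumed location of the raw log files
for (const file of readdirSync(logDir).filter((f) => f.endsWith(".log"))) {
  const lines = readFileSync(join(logDir, file), "utf8").split("\n");
  lines.forEach((line, i) => {
    if (line.includes(requestId)) {
      // Prefix with file and line number so the agent can cite exact locations.
      console.log(`${file}:${i + 1}: ${line}`);
    }
  });
}
```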

When you do control logging, add targeted diagnostic statements preemptively when investigating bugs. Fifteen minutes writing specific log output beats hours of speculation. The agent can guide what to log based on its hypothesis—then analyze the new output immediately.

This insight transforms the debugging economics: AI makes it trivial to add diagnostic logs at dozens of strategic points—far more volume than humans would ever instrument manually—because the agent can generate and place them in minutes. Once the bug is verified fixed, the same agent systematically removes all temporary diagnostic statements, restoring code hygiene and baseline logging practices. What would be prohibitively tedious instrumentation for humans (add logs, analyze, remove logs) becomes a routine part of AI-assisted investigation, shifting debugging from "minimal instrumentation" to "evidence-rich exploration."
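
One convention that keeps removal trivial is tagging every temporary statement with a marker the agent can grep for. A minimal sketch, using a hypothetical `validateJWT` and an invented `[DIAG-TEMP]` marker (the `jsonwebtoken` usage reflects the lesson's JWT example, not your actual code):

```typescript
import jwt from "jsonwebtoken";

// Hypothetical validation path instrumented during an investigation. Every
// temporary statement carries the [DIAG-TEMP] marker so that, once the fix is
// verified, a single grep for "DIAG-TEMP" finds them and the agent strips them out.
export function validateJWT(token: string, publicKey: string) {
  console.debug("[DIAG-TEMP] validateJWT called, token length:", token.length);

  // Enforce RS256 explicitly; only the console.debug lines are temporary.
  const claims = jwt.verify(token, publicKey, { algorithms: ["RS256"] });

  console.debug("[DIAG-TEMP] verify succeeded, claims:", JSON.stringify(claims));
  return claims;
}
```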

Reproduction Scripts: Code is Cheap

When code inspection and log analysis aren't sufficient—when you need bulletproof evidence or must reproduce complex state/timing conditions—reproduction scripts become invaluable. This is where AI agents' massive code generation capabilities shine: environments that take humans hours to set up (K8s, Docker configs, database snapshots, mock services, state initialization) take AI minutes to generate.

Reproduction scripts eliminate ambiguity and create verifiable test cases. They capture full context: database state, external API responses, configuration, and user inputs. The agent generates comprehensive reproduction environments trivially, turning what would be tedious manual work into a simple prompt. Ask the agent to simulate the exact conditions where the bug occurs, and it will produce the scaffolding on demand.

For complex systems, use Docker to create isolated reproduction environments. Snapshot production database state, configure services with production-like settings, and write a script that triggers the bug reliably. Once you have reliable reproduction, the agent can iterate on fixes and verify each attempt.
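
A reproduction script can be as small as one request plus one assertion. The sketch below follows the lesson's JWT example and assumes a service running locally on port 3000 with a hypothetical `/api/profile` endpoint; the agent would generate the equivalent for your actual bug.

```typescript
// repro-forged-jwt.ts: a token with a tampered payload must be rejected with 401.
// Run with: npx tsx repro-forged-jwt.ts (assumes the service is already up, e.g. via docker compose).
const BASE_URL = process.env.REPRO_BASE_URL ?? "http://localhost:3000";

// Structurally valid JWT whose signature cannot match its payload.
const forgedToken = [
  Buffer.from(JSON.stringify({ alg: "RS256", typ: "JWT" })).toString("base64url"),
  Buffer.from(JSON.stringify({ sub: "attacker", admin: true })).toString("base64url"),
  "invalid-signature",
].join(".");

const res = await fetch(`${BASE_URL}/api/profile`, {
  headers: { Authorization: `Bearer ${forgedToken}` },
});

if (res.status === 401) {
  console.log("PASS: forged token rejected with 401");
} else {
  console.error(`FAIL: expected 401, got ${res.status} (bug reproduced)`);
  process.exitCode = 1;
}
```

Because the script exits non-zero while the bug persists, it doubles as the verification step once a fix is applied.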

Closing the Loop: Place Agents Inside Failing Environments

With good grounding, agents can always explore your codebase and research online issues—that's what Lesson 5 teaches. But closing the loop means the agent can test its fixes and verify its reasoning actually works. Without environment access, the agent proposes solutions it can't validate. With closed-loop access, it applies fixes, re-runs reproduction, and proves they work—or iterates on new hypotheses when they don't.

The difference: An open-loop agent researches your code and online issues, then reports: "The bug is likely missing RS256 signature verification at jwt.ts:67—try adding algorithm validation." A closed-loop agent does the same research, then applies that fix, re-runs the failing request, observes it now returns 401 correctly, and reports: "Fixed and verified—RS256 validation added at jwt.ts:67, reproduction now passes."

The Closed-Loop Debugging Workflow

1. BUILD - Create a reproducible environment (Docker, scripts, database snapshots) that reliably triggers the bug

2. REPRODUCE - Verify the bug manifests consistently with concrete evidence (logs, status codes, error output)

3. PLACE - Give the agent tool access WITHIN the environment—not just code, but runtime execution capabilities

CLI Agents for Closed-Loop Debugging

This is where CLI agents (Claude Code, Codex, Copilot CLI) shine over IDE assistants. CLI agents can run anywhere you have shell access: inside Docker containers, on remote servers, in CI/CD pipelines, on problematic production instances. IDE agents are tied to your local development machine.

4. INVESTIGATE - Agent leverages grounding techniques to form hypotheses by correlating:

  • Runtime behavior: Execute diagnostic commands, inspect responses, analyze logs
  • Codebase: Use ChunkHound's code research for comprehensive investigation with architectural context and cross-module relationships. For smaller codebases, agentic search (Grep, Read) works well. See Lesson 5 for scale guidance.
  • Known issues: Research error patterns, CVEs, and similar bugs using tools like ArguSeek

5. VERIFY - Agent applies the fix, re-runs reproduction, and confirms the bug is resolved—or forms a new hypothesis and iterates

This workflow transforms debugging from "research and guess" to "research, fix, test, and prove"—a closed feedback loop where the environment validates or refutes the agent's reasoning.
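
A thin harness can give the agent that loop as a single command: rebuild the environment, re-run the reproduction, report pass or fail. This is a sketch under the same assumptions as the reproduction script above (a docker-compose.yml for the service, the repro script in the working directory); a real harness would also wait on a health check before hitting the service.

```typescript
// verify-fix.ts: one command the agent runs after each fix attempt.
import { execSync } from "node:child_process";

function run(cmd: string): void {
  console.log(`$ ${cmd}`);
  execSync(cmd, { stdio: "inherit" });
}

try {
  // BUILD: recreate the environment from source.
  run("docker compose up -d --build");
  // REPRODUCE / VERIFY: the repro script exits non-zero while the bug persists.
  run("npx tsx repro-forged-jwt.ts");
  console.log("VERIFIED: reproduction passes against the rebuilt environment");
} catch {
  console.error("NOT FIXED: build or reproduction failed; form a new hypothesis and iterate");
  process.exitCode = 1;
} finally {
  // Clean up containers and volumes between attempts.
  run("docker compose down -v");
}
```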

Remote Diagnosis: Scripts Over Access

When you can't reproduce bugs locally or access the failing environment—customer deployments, edge infrastructure, locked-down production—you face limited information and no iteration cycle. This is where AI agents' probabilistic reasoning becomes a feature, not a limitation. Combined with their code generation capabilities, agents turn remote diagnosis from "send me logs and wait" into an active investigation workflow.

The workflow follows the research-first pattern from Lesson 5: ground yourself in the codebase (understand the architecture around the failing component using code research) and in known issues (search for similar problems in the ecosystem). With this context, the agent generates ranked hypotheses based on evidence—not generic patterns. Then it produces targeted diagnostic scripts that collect evidence for each hypothesis: configuration states, version mismatches, timing data, whatever's needed to validate or refute each theory.

The key is trading developer time for compute time. Writing a comprehensive diagnostic script takes humans days but takes agents 30 minutes. More importantly, agents generate thorough diagnostics trivially—scripts that check dozens of potential issues, cross-reference configuration, and output structured data. Send the script to the customer, load the output into the agent's context, and it correlates evidence with hypotheses to identify the root cause. What would be tedious manual work becomes a simple prompt.
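
The shape of such a script: one check per hypothesis, secrets redacted, everything emitted as a single structured blob the customer can send back. A minimal sketch; the env var names, file paths, and suspected packages are placeholders the agent would tailor to the actual investigation.

```typescript
// collect-diagnostics.ts: sent to the customer when the failing environment
// cannot be accessed directly. Gathers evidence for each hypothesis and prints
// one JSON report to return for analysis.
import { execSync } from "node:child_process";
import { existsSync, readFileSync } from "node:fs";

// Run a command and capture its output without letting a failure abort collection.
function tryRun(cmd: string): string {
  try {
    return execSync(cmd, { encoding: "utf8" }).trim();
  } catch (err) {
    return `ERROR: ${(err as Error).message}`;
  }
}

const report = {
  collectedAt: new Date().toISOString(),
  // Hypothesis 1: runtime/dependency version mismatch.
  nodeVersion: process.version,
  dependencyVersions: tryRun("npm ls jsonwebtoken express --depth=0"),
  // Hypothesis 2: misconfiguration. Record presence only, never secret values.
  configFlags: {
    jwtPublicKeySet: Boolean(process.env.JWT_PUBLIC_KEY),
    nodeEnv: process.env.NODE_ENV ?? "(unset)",
  },
  // Hypothesis 3: errors already captured in the application log.
  recentErrors: existsSync("./logs/app.log")
    ? readFileSync("./logs/app.log", "utf8")
        .split("\n")
        .filter((l) => l.includes("ERROR"))
        .slice(-50)
    : [],
};

console.log(JSON.stringify(report, null, 2));
```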

Key Takeaways

  • Evidence over speculation - Never accept fixes without reproducible proof they work
  • Code inspection first - Understand architecture and execution flow before diving into fixes
  • Log analysis is AI's superpower - Process thousands of log lines to spot patterns, correlations, and timing issues humans miss
  • Code is cheap, write it liberally - Reproduction scripts, diagnostic tools, and verification workflows are trivial for AI to generate
  • Closed-loop debugging - Place agents inside failing environments with the BUILD → REPRODUCE → PLACE → INVESTIGATE → VERIFY workflow
  • CLI agents access any environment - Unlike IDE assistants, CLI agents work in Docker, remote servers, CI/CD, and production instances
  • Remote diagnosis requires scripts - Generate comprehensive diagnostic scripts when direct access isn't possible

Debugging with AI agents is about building diagnostic environments where evidence is abundant and verification is systematic. The agent is your tireless investigator—give it the tools and demand proof.