ChunkHound

Modern RAG for your codebase - Semantic and Regex Search via MCP

What it does

LLMs like Claude and GPT don’t know your codebase - they only know what they were trained on. Every time they help you code, they need to search your files to understand your project’s specific patterns and terminology.

ChunkHound integrates with AI assistants via the Model Context Protocol (MCP) to give them two ways to explore your code:

Semantic search - Finds code by meaning, so when the AI looks for “user authentication” it also finds your validateLogin() and checkCredentials() functions
Regex search - Pattern matching for precise code structures

Traditional search was built for humans who know what they’re looking for. But AI assistants start with zero knowledge about your codebase. Semantic search bridges this gap by understanding that “database timeout” and “SQL connection lost” are related concepts, even though they share no keywords.
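The difference can be sketched in a few lines of Python. This is an illustration only, not ChunkHound's implementation: the three-dimensional vectors below are hand-picked stand-ins for what an embedding model would produce, and the helper names are hypothetical. Real embeddings come from the configured provider and have far more dimensions.

```python
import math
import re

# Hypothetical, hand-assigned "embeddings" for illustration only.
# A real embedding model would compute these vectors from the text.
TOY_EMBEDDINGS = {
    "database timeout": [0.9, 0.8, 0.1],
    "SQL connection lost": [0.8, 0.9, 0.2],
    "render login button": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def keyword_match(query, text):
    """True if any whole word of the query appears in the text."""
    return any(
        re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE)
        for word in query.split()
    )


def rank_by_meaning(query):
    """Sort the other phrases by similarity to the query, best first."""
    query_vector = TOY_EMBEDDINGS[query]
    others = [phrase for phrase in TOY_EMBEDDINGS if phrase != query]
    return sorted(
        others,
        key=lambda phrase: cosine_similarity(query_vector, TOY_EMBEDDINGS[phrase]),
        reverse=True,
    )


# Keyword search finds nothing in common between the two phrases...
print(keyword_match("database timeout", "SQL connection lost"))
# ...while vector similarity ranks the related phrase first.
print(rank_by_meaning("database timeout"))
```

With these toy vectors, the keyword check finds no overlap between the two phrases, while the similarity ranking puts "SQL connection lost" ahead of the unrelated phrase.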

Supported Languages

ChunkHound supports 22 languages with structured parsing:

Programming (via Tree-sitter): Python, JavaScript, TypeScript, JSX, TSX, Java, Kotlin, Groovy, C, C++, C#, Go, Rust, Bash, MATLAB, Makefile
Configuration (via Tree-sitter): JSON, YAML, TOML, Markdown
Text-based (custom parsers): Text files, PDF

Requirements

Python 3.10+
uv package manager
API key for semantic search (optional)

Installation

# Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install ChunkHound
uv tool install chunkhound

Configuration

ChunkHound works without configuration for regex search. For semantic search, create .chunkhound.json in your project root:

Recommended (VoyageAI): fastest, most accurate, and cost-effective

{
  "embedding": {
    "provider": "voyageai",
    "api_key": "pa-your-voyage-key"
  }
}

Get API key from VoyageAI Console | Documentation

OpenAI - best for: wide compatibility and ecosystem

{
  "embedding": {
    "provider": "openai",
    "api_key": "sk-your-openai-key"
  }
}

Get API key from OpenAI Platform | Embeddings Guide

Ollama (local) - best for: privacy and offline use

# Download the embedding model (requires a running Ollama server)
ollama pull nomic-embed-text

Ollama Documentation | API Reference

{
  "embedding": {
    "provider": "openai",
    "base_url": "http://localhost:11434/v1",
    "model": "nomic-embed-text"
  }
}

No API key required - completely local. The same settings work with any OpenAI-compatible endpoint.
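A quick sanity check on .chunkhound.json can catch typos before the assistant starts. The sketch below is not part of ChunkHound: check_embedding_config is a hypothetical helper, and the rules it applies (a provider is required, plus either an api_key or a base_url) are inferred only from the three examples above, not from ChunkHound's actual schema.

```python
import json
from pathlib import Path


def check_embedding_config(path=".chunkhound.json"):
    """Return a list of problems found in the config; an empty list means none."""
    config_file = Path(path)
    if not config_file.exists():
        return [f"{path} not found (regex search still works without it)"]

    try:
        config = json.loads(config_file.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]

    embedding = config.get("embedding") if isinstance(config, dict) else None
    if not isinstance(embedding, dict):
        return ['missing "embedding" object']

    problems = []
    if not embedding.get("provider"):
        problems.append('"embedding.provider" is missing')
    # Assumed rule: hosted providers need a key, local endpoints need a base_url.
    if not embedding.get("api_key") and not embedding.get("base_url"):
        problems.append('set either "embedding.api_key" or "embedding.base_url"')
    return problems


if __name__ == "__main__":
    for problem in check_embedding_config():
        print(problem)
```

Run it from the project root; no output means the file matched the assumed shape.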

IDE Setup

Configure ChunkHound as an MCP server in your AI assistant:

Claude Code - add to ~/.claude.json:

{
  "mcpServers": {
    "chunkhound": {
      "command": "chunkhound",
      "args": ["mcp"]
    }
  }
}

VS Code - add to .vscode/mcp.json:

{
  "servers": {
    "chunkhound": {
      "type": "stdio",
      "command": "chunkhound",
      "args": ["mcp", "/path/to/project"]
    }
  }
}

Cursor - add to .cursor/mcp.json:

{
  "mcpServers": {
    "chunkhound": {
      "command": "chunkhound",
      "args": ["mcp", "/path/to/project"]
    }
  }
}
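If one of these files already lists other MCP servers, pasting a snippet over it would drop them. A minimal sketch of a merge helper follows; add_chunkhound_server is a hypothetical function, not a ChunkHound command. It reuses the top-level keys and arguments from the snippets above, and it adds "type": "stdio" only when the top-level key is "servers", mirroring the .vscode/mcp.json example.

```python
import json
from pathlib import Path


def add_chunkhound_server(config_path, top_key, project_path=None):
    """Add or update the chunkhound entry, keeping any other servers intact."""
    path = Path(config_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}

    # Pass the project path as a second argument when one is given.
    args = ["mcp"] if project_path is None else ["mcp", str(project_path)]
    entry = {"command": "chunkhound", "args": args}
    if top_key == "servers":
        entry = {"type": "stdio", **entry}

    config.setdefault(top_key, {})["chunkhound"] = entry

    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    add_chunkhound_server(".cursor/mcp.json", "mcpServers", "/path/to/project")
```

The helper rewrites the file with two-space indentation, so comments or custom formatting in an existing file are not preserved.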

Initial Indexing

# Index your codebase (respects .gitignore automatically)
cd /path/to/project && chunkhound index

Documentation

Tutorial - Learn ChunkHound in 5 minutes
Configuration - Complete configuration reference
Under the Hood - Technical deep dive into cAST algorithm and architecture