
Large Repos + Memory-Aware Agents (Go)

Learn strategies for scaling AI agent memory to handle massive codebases with millions of lines. Chunking, hierarchical indexing, and selective retrieval in Go.

CodeMem Team

The Large Codebase Challenge

Your AI coding agent works beautifully on small projects. It remembers context, understands patterns, and provides relevant suggestions. Then you point it at a monorepo with 2 million lines of code across 15,000 files—and everything falls apart.

The problem isn't the AI model. It's the memory architecture. Most memory systems weren't designed for repositories where indexing everything would consume gigabytes of vector storage and retrieval latency would kill the developer experience. This guide shows you how to build memory-aware agents that scale gracefully to massive codebases using Go.

Understanding Memory at Scale

Before diving into code, let's understand the math. A typical code file generates 1-5 memory chunks after processing. For a repo with 15,000 files:

  • Mid-range estimate (3 chunks per file): 45,000 memory vectors
  • Vector size: 1536 dimensions × 4 bytes ≈ 6KB per vector
  • Raw storage: ~270MB just for vectors
  • With metadata: 500MB+ total memory footprint

That's manageable—but retrieval becomes the bottleneck. Searching 45,000 vectors for every keystroke? Your agent becomes unusable. The solution: hierarchical memory with selective loading.

Hierarchical Memory Architecture

Structure memory in layers, from broad to specific:

package memory

type HierarchicalStore struct {
    // L1: Repository-level summaries (always loaded)
    repoSummaries map[string]*RepoSummary
    
    // L2: Module/package summaries (loaded on demand)
    moduleSummaries *LRUCache[string, *ModuleSummary]
    
    // L3: File-level memories (loaded when file is active)
    fileMemories *LRUCache[string, []Memory]
    
    // L4: Symbol-level details (loaded for current context)
    symbolIndex *LRUCache[string, *SymbolMemory]
    
    maxL2Size int // Max modules in memory
    maxL3Size int // Max files in memory
}

func NewHierarchicalStore(config ScaleConfig) *HierarchicalStore {
    return &HierarchicalStore{
        repoSummaries:   make(map[string]*RepoSummary),
        moduleSummaries: NewLRUCache[string, *ModuleSummary](config.MaxModules),
        fileMemories:    NewLRUCache[string, []Memory](config.MaxFiles),
        symbolIndex:     NewLRUCache[string, *SymbolMemory](config.MaxSymbols),
        maxL2Size:       config.MaxModules,
        maxL3Size:       config.MaxFiles,
    }
}

Smart Chunking for Code

Generic text chunking destroys code semantics. Use AST-aware chunking that respects code boundaries:

type CodeChunker struct {
    maxChunkSize   int
    overlapLines   int
    preserveBlocks bool
}

func (c *CodeChunker) ChunkGoFile(path string, content []byte) ([]CodeChunk, error) {
    fset := token.NewFileSet()
    f, err := parser.ParseFile(fset, path, content, parser.ParseComments)
    if err != nil {
        // Fallback to line-based chunking for unparseable files
        return c.chunkByLines(content), nil
    }
    
    var chunks []CodeChunk
    
    // Create chunk per top-level declaration
    for _, decl := range f.Decls {
        switch d := decl.(type) {
        case *ast.FuncDecl:
            chunk := c.extractFuncChunk(fset, content, d)
            chunks = append(chunks, chunk)
            
        case *ast.GenDecl:
            // Group related type/const/var declarations
            if c.shouldChunkDecl(d) {
                chunk := c.extractDeclChunk(fset, content, d)
                chunks = append(chunks, chunk)
            }
        }
    }
    
    // Add package-level summary chunk
    summary := c.generateFileSummary(f, chunks)
    chunks = append([]CodeChunk{summary}, chunks...)
    
    return chunks, nil
}

type CodeChunk struct {
    ID         string
    FilePath   string
    ChunkType  string // "function", "type", "summary"
    SymbolName string
    Content    string
    StartLine  int
    EndLine    int
    Imports    []string
    References []string
}

Context-Aware Loading

Don't load everything. Load what matters for the current task:

type ContextLoader struct {
    store      *HierarchicalStore
    vectorDB   VectorStore
    depGraph   *DependencyGraph
}

func (l *ContextLoader) LoadForFile(ctx context.Context, filePath string) (*MemoryContext, error) {
    mc := &MemoryContext{
        ActiveFile: filePath,
    }
    
    // 1. Always include repo summary
    repoID := l.extractRepoID(filePath)
    mc.RepoSummary = l.store.repoSummaries[repoID]
    
    // 2. Load module summary for current package
    modulePath := l.extractModulePath(filePath)
    moduleSummary, err := l.loadModuleSummary(ctx, modulePath)
    if err == nil {
        mc.ModuleSummary = moduleSummary
    }
    
    // 3. Load file memories
    fileMemories, err := l.loadFileMemories(ctx, filePath)
    if err == nil {
        mc.FileMemories = fileMemories
    }
    
    // 4. Load direct dependencies (1 level deep)
    deps := l.depGraph.DirectDependencies(filePath)
    for _, dep := range deps[:min(len(deps), 5)] { // Limit to 5 deps
        depMemories, _ := l.loadFileMemories(ctx, dep)
        mc.DependencyMemories = append(mc.DependencyMemories, depMemories...)
    }
    
    return mc, nil
}

// LoadForQuery expands context based on semantic search
func (l *ContextLoader) LoadForQuery(ctx context.Context, query string, activeFile string) (*MemoryContext, error) {
    // Start with file context
    mc, err := l.LoadForFile(ctx, activeFile)
    if err != nil {
        return nil, err
    }
    
    // Semantic search across repo (with file-level pre-filtering)
    relevantFiles, err := l.vectorDB.SearchFileSummaries(ctx, query, 10)
    if err != nil {
        return mc, nil // Return what we have
    }
    
    // Load top 3 most relevant file memories
    for _, rf := range relevantFiles[:min(len(relevantFiles), 3)] {
        if rf.FilePath != activeFile {
            memories, _ := l.loadFileMemories(ctx, rf.FilePath)
            mc.RelevantMemories = append(mc.RelevantMemories, memories...)
        }
    }
    
    return mc, nil
}

Incremental Indexing

Re-indexing millions of lines on every change is impractical. Use incremental updates:

type IncrementalIndexer struct {
    store    *HierarchicalStore
    vectorDB VectorStore
    hasher   *ContentHasher
}

func (idx *IncrementalIndexer) UpdateChanged(ctx context.Context, changes []FileChange) error {
    var toReindex []string
    var toDelete []string
    
    for _, change := range changes {
        switch change.Type {
        case ChangeModified:
            // Check if content actually changed (ignore whitespace-only)
            oldHash := idx.hasher.GetHash(change.Path)
            newHash := idx.hasher.ComputeHash(change.NewContent)
            if oldHash != newHash {
                toReindex = append(toReindex, change.Path)
            }
            
        case ChangeAdded:
            toReindex = append(toReindex, change.Path)
            
        case ChangeDeleted:
            toDelete = append(toDelete, change.Path)
        }
    }
    
    // Batch delete old vectors
    if len(toDelete) > 0 {
        if err := idx.vectorDB.DeleteByFiles(ctx, toDelete); err != nil {
            return fmt.Errorf("delete failed: %w", err)
        }
    }
    
    // Batch reindex changed files
    if len(toReindex) > 0 {
        chunks := idx.chunkFiles(toReindex)
        if err := idx.vectorDB.UpsertBatch(ctx, chunks); err != nil {
            return fmt.Errorf("upsert failed: %w", err)
        }
    }
    
    // Update module summaries for affected packages
    affectedModules := idx.findAffectedModules(toReindex)
    for _, mod := range affectedModules {
        idx.regenerateModuleSummary(ctx, mod)
    }
    
    return nil
}

Performance Benchmarks

With these techniques, here's what you can expect:

  • Initial indexing: ~1 minute per 100K lines (parallelized)
  • Incremental update: <100ms for typical file changes
  • Context loading: <50ms for file + dependencies
  • Query retrieval: <200ms including semantic search
  • Memory footprint: ~50MB active (LRU caches), full index on disk

Key Takeaways

Scaling memory for large codebases requires architectural thinking:

  1. Hierarchical storage — Summaries at repo/module level, details at file/symbol level
  2. AST-aware chunking — Respect code boundaries, don't split functions
  3. Context-aware loading — Load based on current file, dependencies, and query
  4. Incremental indexing — Only update what changed
  5. LRU caching — Keep hot paths fast, evict stale context

Scale Your AI Agents with CodeMem

CodeMem's memory infrastructure is built for scale. Hierarchical indexing, incremental updates, and intelligent retrieval work out of the box—so you can focus on building features, not infrastructure. Handle repos of any size without sacrificing performance.

Start Building Free →