
Memory for Multi-Agent Systems (Go)

Learn how to coordinate multiple AI agents with shared memory, avoid conflicts, and implement effective synchronization strategies in Go.

CodeMem Team

The Multi-Agent Memory Challenge

As AI systems grow more sophisticated, single-agent architectures are giving way to multi-agent systems where specialized agents collaborate on complex tasks. A code review agent works alongside a documentation agent, while a testing agent validates their outputs. But here's the catch: they all need to share memory.

Without proper coordination, you'll face race conditions, conflicting updates, and corrupted state. Agent A reads a memory, Agent B modifies it, and Agent A writes back stale data—classic concurrency problems, now in AI form. This article shows you how to build robust shared memory for multi-agent systems in Go.

Shared Memory Architecture

The foundation of multi-agent memory is a centralized store with proper access control. Each agent should have an identity, and every memory operation should be traceable:

package multiagent

type Agent struct {
    ID       string
    Role     string   // "code-review", "docs", "testing"
    Scopes   []string // memory namespaces this agent can access
}

type SharedMemory struct {
    ID        string
    Namespace string
    Content   string
    Version   int64     // Optimistic locking
    UpdatedBy string    // Agent ID
    UpdatedAt time.Time
    Metadata  map[string]any
}

type MemoryStore interface {
    Get(ctx context.Context, id string) (*SharedMemory, error)
    Put(ctx context.Context, mem *SharedMemory, agentID string) error
    List(ctx context.Context, namespace string) ([]SharedMemory, error)
    Lock(ctx context.Context, id string, agentID string, ttl time.Duration) error
    Unlock(ctx context.Context, id string, agentID string) error
}

Conflict Resolution Strategies

When multiple agents modify the same memory, you need a clear conflict resolution strategy. Here are the three main approaches:

  • Last-Write-Wins (LWW) — Simple but can lose data. Use for non-critical, frequently updated memories
  • Optimistic Locking — Version-based. Reject writes if version changed. Recommended for most use cases
  • Pessimistic Locking — Explicit locks before modification. Use for critical, long-running operations

Implementing Optimistic Locking

Optimistic locking is the sweet spot for most multi-agent systems—low overhead, no deadlocks, and clear conflict detection:

var ErrConflict = errors.New("memory version conflict")

func (s *Store) Put(ctx context.Context, mem *SharedMemory, agentID string) error {
    s.mu.Lock()
    defer s.mu.Unlock()
    
    existing, exists := s.memories[mem.ID]
    if exists && existing.Version != mem.Version {
        return fmt.Errorf("%w: expected version %d, got %d", 
            ErrConflict, mem.Version, existing.Version)
    }
    
    // Update metadata
    mem.Version++
    mem.UpdatedBy = agentID
    mem.UpdatedAt = time.Now()
    
    s.memories[mem.ID] = mem
    return nil
}

// Agent-side retry logic
func (a *Agent) UpdateMemoryWithRetry(ctx context.Context, id string, updateFn func(*SharedMemory)) error {
    for retries := 0; retries < 3; retries++ {
        mem, err := a.store.Get(ctx, id)
        if err != nil {
            return err
        }
        
        updateFn(mem) // Apply the agent's changes
        
        err = a.store.Put(ctx, mem, a.ID)
        if errors.Is(err, ErrConflict) {
            time.Sleep(time.Duration(retries+1) * 100 * time.Millisecond) // Back off: 100ms, 200ms, 300ms
            continue // Retry with fresh data
        }
        return err
    }
    return fmt.Errorf("failed after 3 retries: conflict persists")
}

Namespace Isolation

Not every agent should access every memory. Use namespaces to isolate concerns and reduce conflict surface:

type NamespacePolicy struct {
    Namespace string
    Readers   []string // Agent roles allowed to read
    Writers   []string // Agent roles allowed to write
}

var defaultPolicies = []NamespacePolicy{
    {Namespace: "project:config",    Readers: []string{"*"},         Writers: []string{"admin"}},
    {Namespace: "code:review",       Readers: []string{"*"},         Writers: []string{"code-review", "lead"}},
    {Namespace: "docs:generated",    Readers: []string{"*"},         Writers: []string{"docs"}},
    {Namespace: "test:results",      Readers: []string{"*"},         Writers: []string{"testing"}},
    {Namespace: "agent:private:*",   Readers: []string{"self"},      Writers: []string{"self"}},
}

func (s *Store) CheckAccess(agent Agent, namespace string, write bool) bool {
    for _, policy := range s.policies {
        if matchNamespace(policy.Namespace, namespace) {
            allowed := policy.Readers
            if write {
                allowed = policy.Writers
            }
            // Note: the "self" role in agent:private:* policies needs an
            // extra check that the namespace suffix matches the agent's ID.
            return containsRole(allowed, agent.Role) || containsRole(allowed, "*")
        }
    }
    return false // Deny by default
}

Event-Driven Coordination

For complex workflows, agents need to react to each other's memory updates. Implement a pub/sub layer on top of your memory store:

type MemoryEvent struct {
    Type      string // "created", "updated", "deleted"
    MemoryID  string
    Namespace string
    AgentID   string
    Timestamp time.Time
}

type EventBus struct {
    subscribers map[string][]chan MemoryEvent
    mu          sync.RWMutex
}

func (b *EventBus) Subscribe(namespace string) <-chan MemoryEvent {
    b.mu.Lock()
    defer b.mu.Unlock()
    
    ch := make(chan MemoryEvent, 100)
    b.subscribers[namespace] = append(b.subscribers[namespace], ch)
    return ch
}

// In your agent
func (a *DocsAgent) Run(ctx context.Context) {
    events := a.eventBus.Subscribe("code:review")
    
    for {
        select {
        case <-ctx.Done():
            return
        case evt := <-events:
            if evt.Type == "created" {
                // New code review memory - generate docs
                a.handleNewReview(ctx, evt.MemoryID)
            }
        }
    }
}
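The subscribe side above needs a matching publish. A minimal sketch — the non-blocking send is an assumption (dropping events when a subscriber's buffer fills); you may prefer to block or log instead:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type MemoryEvent struct {
	Type      string // "created", "updated", "deleted"
	MemoryID  string
	Namespace string
	AgentID   string
	Timestamp time.Time
}

type EventBus struct {
	subscribers map[string][]chan MemoryEvent
	mu          sync.RWMutex
}

func NewEventBus() *EventBus {
	return &EventBus{subscribers: make(map[string][]chan MemoryEvent)}
}

func (b *EventBus) Subscribe(namespace string) <-chan MemoryEvent {
	b.mu.Lock()
	defer b.mu.Unlock()
	ch := make(chan MemoryEvent, 100)
	b.subscribers[namespace] = append(b.subscribers[namespace], ch)
	return ch
}

// Publish delivers an event to every subscriber of its namespace.
// The send is non-blocking: if a subscriber's buffer is full, the
// event is dropped rather than stalling the publisher.
func (b *EventBus) Publish(evt MemoryEvent) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	for _, ch := range b.subscribers[evt.Namespace] {
		select {
		case ch <- evt:
		default: // subscriber too slow; drop
		}
	}
}

func main() {
	bus := NewEventBus()
	events := bus.Subscribe("code:review")
	bus.Publish(MemoryEvent{Type: "created", MemoryID: "mem-1", Namespace: "code:review", Timestamp: time.Now()})
	evt := <-events
	fmt.Println(evt.Type, evt.MemoryID) // created mem-1
}
```

Call `Publish` from the store's `Put` (after the write commits) so agents never observe an event for state they can't yet read.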

Preventing Deadlocks

If you use pessimistic locks, always follow these rules to prevent deadlocks:

  1. Lock ordering — Always acquire locks in a consistent order (e.g., alphabetically by memory ID)
  2. Timeouts — Every lock must have a TTL. Stale locks from crashed agents will auto-expire
  3. Single-lock preference — Design operations to require only one lock when possible
  4. Heartbeats — For long operations, extend the lock TTL periodically

func (a *Agent) WithLock(ctx context.Context, id string, fn func() error) error {
    // Acquire lock with 30s TTL
    if err := a.store.Lock(ctx, id, a.ID, 30*time.Second); err != nil {
        return fmt.Errorf("failed to acquire lock: %w", err)
    }
    defer a.store.Unlock(ctx, id, a.ID)
    
    // Start heartbeat for long operations
    done := make(chan struct{})
    go func() {
        ticker := time.NewTicker(10 * time.Second)
        defer ticker.Stop()
        for {
            select {
            case <-done:
                return
            case <-ticker.C:
                a.store.Lock(ctx, id, a.ID, 30*time.Second) // Extend TTL (errors ignored here; consider aborting fn if extension fails)
            }
        }
    }()
    
    err := fn()
    close(done)
    return err
}

Key Takeaways

Building reliable shared memory for multi-agent systems requires:

  1. Central, versioned storage — Every memory has an owner and version
  2. Optimistic locking by default — Detect conflicts, retry with fresh data
  3. Namespace isolation — Limit access to reduce conflict surface
  4. Event-driven coordination — Agents react to changes, not poll
  5. Defensive locking — Timeouts, ordering, and heartbeats prevent deadlocks

Power Your Multi-Agent Systems with CodeMem

CodeMem provides built-in support for multi-agent coordination—versioned memories, namespace isolation, and conflict resolution out of the box. Stop reinventing infrastructure and focus on your agent logic. Start free today.

Get Started Free →