Memory-First Prompting Patterns (Python)
Practical patterns for building prompts that leverage persistent memory to create more contextual, consistent AI agents.
The Shift to Memory-First Prompting
Traditional prompting treats each request as isolated—you craft a prompt, get a response, and move on. But AI agents that remember context, preferences, and past decisions behave fundamentally differently. They need prompts designed to leverage that memory.
Memory-first prompting is a paradigm shift: instead of stuffing context into every prompt, you design prompts that query, inject, and update memory systematically. The result? Smaller prompts, better context, and agents that actually learn from interactions.
Pattern 1: Contextual Retrieval Injection
Before generating a response, retrieve relevant memories and inject them into the system prompt. This ensures the LLM has access to context without manual repetition.
```python
from codemem import MemoryClient

client = MemoryClient()

async def build_prompt(user_message: str, project: str) -> str:
    # Retrieve relevant memories based on the user's message
    memories = await client.search(
        query=user_message,
        project=project,
        limit=5
    )

    # Format memories for injection
    memory_context = "\n".join([
        f"- [{m.type}] {m.content}"
        for m in memories
    ])

    system_prompt = f"""You are a coding assistant with persistent memory.

## Relevant Context from Memory:
{memory_context}

Use this context to provide consistent, personalized responses.
If you make a new decision, state it clearly so it can be remembered."""
    return system_prompt
```

The key insight: retrieve memories semantically based on the user's current query, not a fixed set. This keeps prompts focused and relevant.
Pattern 2: Preference-Aware Instructions
User preferences should shape how the agent responds—not just what it responds with. This pattern queries preferences at prompt-build time and adjusts instructions accordingly.
```python
async def get_style_instructions(user_id: str) -> str:
    preferences = await client.list(
        user_id=user_id,
        type="preference",
        tags=["coding_style"]
    )

    instructions = []
    for pref in preferences:
        if pref.metadata.get("key") == "language":
            instructions.append(f"Prefer {pref.content} for examples")
        elif pref.metadata.get("key") == "verbosity":
            level = pref.content
            if level == "concise":
                instructions.append("Keep explanations brief and direct")
            else:
                instructions.append("Provide detailed explanations")
        elif pref.metadata.get("key") == "indentation":
            instructions.append(f"Use {pref.content} indentation in code")

    return "\n".join(instructions) if instructions else "Use sensible defaults"
```

Pattern 3: Decision Chain Prompting
Past decisions create constraints for future ones. This pattern retrieves related decisions and explicitly includes them as constraints, preventing contradictory choices.
```python
async def get_architectural_constraints(project: str) -> str:
    decisions = await client.list(
        project=project,
        type="decision",
        tags=["architecture"]
    )

    if not decisions:
        return "No prior architectural decisions. You may propose new ones."

    constraints = []
    for d in decisions:
        constraints.append(
            f"• {d.content} (Reason: {d.metadata.get('reasoning', 'N/A')})"
        )

    return f"""## Established Architectural Decisions
These are binding unless explicitly reconsidered:

{chr(10).join(constraints)}

If you need to deviate, explain why and propose updating the decision."""
```

This pattern is powerful for maintaining consistency across long-running projects where decisions compound over time.
Pattern 4: Memory-Triggered Actions
Some memories should trigger specific behaviors. This pattern uses memory types and tags to route the agent's behavior dynamically.
```python
async def get_action_triggers(project: str) -> str:
    # Check for active context that might affect behavior
    active_contexts = await client.list(
        project=project,
        type="context",
        metadata={"active": True}
    )

    triggers = []
    for ctx in active_contexts:
        if "debugging" in ctx.tags:
            triggers.append("User is debugging. Prioritize diagnostic output.")
        if "refactoring" in ctx.tags:
            triggers.append("Refactoring in progress. Maintain existing tests.")
        if "learning" in ctx.tags:
            triggers.append("User is learning. Explain concepts thoroughly.")

    return "\n".join(triggers) if triggers else ""
```

Pattern 5: Post-Response Memory Extraction
The response itself often contains new information worth remembering. This pattern extracts facts, decisions, and preferences from the agent's output.
```python
import json

EXTRACTION_PROMPT = """Analyze the following response and extract any:
1. Decisions made (architectural, design, or implementation choices)
2. Facts stated (API endpoints, configs, technical specifications)
3. Preferences expressed by the user

Return as JSON:
{"decisions": [...], "facts": [...], "preferences": [...]}

If none found for a category, return empty array."""

async def extract_and_store(response: str, project: str):
    # `llm` is the same LLM client used to generate the original response
    extraction = await llm.generate(
        f"{EXTRACTION_PROMPT}\n\nResponse:\n{response}"
    )
    data = json.loads(extraction)

    for decision in data.get("decisions", []):
        await client.add(
            content=decision,
            type="decision",
            project=project
        )
    for fact in data.get("facts", []):
        await client.add(
            content=fact,
            type="fact",
            project=project
        )
    for preference in data.get("preferences", []):
        await client.add(
            content=preference,
            type="preference",
            project=project
        )
```

Putting It All Together
A memory-first agent combines these patterns into a cohesive flow:
```python
async def handle_message(user_message: str, user_id: str, project: str):
    # 1. Build context-aware prompt
    relevant_context = await build_prompt(user_message, project)
    style_instructions = await get_style_instructions(user_id)
    constraints = await get_architectural_constraints(project)
    triggers = await get_action_triggers(project)

    system_prompt = f"""{relevant_context}

{style_instructions}

{constraints}

{triggers}"""

    # 2. Generate response
    response = await llm.generate(
        system=system_prompt,
        user=user_message
    )

    # 3. Extract and store new memories
    await extract_and_store(response, project)
    return response
```

Best Practices
- Limit injection size: Cap retrieved memories to avoid prompt bloat—5-10 relevant items is usually enough
- Prioritize recency: Recent memories are often more relevant; weight them accordingly
- Deduplicate: Semantic search can return similar memories; dedupe before injection
- Separate concerns: Keep preferences, decisions, and context in distinct prompt sections
- Make extraction explicit: Ask the LLM to flag new information worth remembering
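The first three practices above can be combined into a single pre-injection step. Here is a minimal sketch: the `Memory` dataclass and `prepare_for_injection` helper are illustrative stand-ins, not part of the CodeMem API. It ranks candidates by an exponential recency score, drops exact duplicates (after normalizing case and whitespace), and caps the count.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Memory:
    content: str
    created_at: datetime

def prepare_for_injection(memories, limit=5, half_life_days=30.0):
    """Recency-weight, deduplicate, and cap memories before injection."""
    now = datetime.now()

    def recency_score(m: Memory) -> float:
        # Exponential decay: a memory exactly half_life_days old scores 0.5
        age_days = (now - m.created_at).total_seconds() / 86400
        return 0.5 ** (age_days / half_life_days)

    # Sort newest-first so deduplication keeps the most recent wording
    ranked = sorted(memories, key=recency_score, reverse=True)

    seen, unique = set(), []
    for m in ranked:
        key = " ".join(m.content.lower().split())  # normalize case/whitespace
        if key not in seen:
            seen.add(key)
            unique.append(m)

    return unique[:limit]
```

Exact-match deduplication is only a floor: if you have embeddings available, you can additionally drop any memory whose similarity to an already-selected one exceeds a threshold.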
Start Building Memory-First Agents
These patterns transform how your agents interact with users. Instead of stateless Q&A, you get assistants that learn, remember, and maintain consistency across sessions.
CodeMem provides the memory infrastructure these patterns need—semantic search, typed memories, project scoping, and automatic expiration. Focus on your prompting logic while we handle the persistence.