Memory-First Prompting Patterns (Python)
Practical patterns for building prompts that leverage persistent memory to create more contextual, consistent AI agents.
The Shift to Memory-First Prompting
Traditional prompting treats each request as isolated—you craft a prompt, get a response, and move on. But AI agents that remember context, preferences, and past decisions behave fundamentally differently. They need prompts designed to leverage that memory.
Memory-first prompting is a paradigm shift: instead of stuffing context into every prompt, you design prompts that query, inject, and update memory systematically. The result? Smaller prompts, better context, and agents that actually learn from interactions.
Pattern 1: Contextual Retrieval Injection
Before generating a response, retrieve relevant memories and inject them into the system prompt. This ensures the LLM has access to context without manual repetition.
```python
from codemem import MemoryClient

client = MemoryClient()

async def build_prompt(user_message: str, project: str) -> str:
    # Retrieve relevant memories based on the user's message
    memories = await client.search(
        query=user_message,
        project=project,
        limit=5
    )

    # Format memories for injection
    memory_context = "\n".join([
        f"- [{m.type}] {m.content}"
        for m in memories
    ])

    system_prompt = f"""You are a coding assistant with persistent memory.

## Relevant Context from Memory:
{memory_context}

Use this context to provide consistent, personalized responses.
If you make a new decision, state it clearly so it can be remembered."""
    return system_prompt
```

The key insight: retrieve memories semantically based on the user's current query, not a fixed set. This keeps prompts focused and relevant.
Pattern 2: Preference-Aware Instructions
User preferences should shape how the agent responds—not just what it responds with. This pattern queries preferences at prompt-build time and adjusts instructions accordingly.
```python
async def get_style_instructions(user_id: str) -> str:
    preferences = await client.list(
        user_id=user_id,
        type="preference",
        tags=["coding_style"]
    )

    instructions = []
    for pref in preferences:
        if pref.metadata.get("key") == "language":
            instructions.append(f"Prefer {pref.content} for examples")
        elif pref.metadata.get("key") == "verbosity":
            level = pref.content
            if level == "concise":
                instructions.append("Keep explanations brief and direct")
            else:
                instructions.append("Provide detailed explanations")
        elif pref.metadata.get("key") == "indentation":
            instructions.append(f"Use {pref.content} indentation in code")

    return "\n".join(instructions) if instructions else "Use sensible defaults"
```

Pattern 3: Decision Chain Prompting
Past decisions create constraints for future ones. This pattern retrieves related decisions and explicitly includes them as constraints, preventing contradictory choices.
```python
async def get_architectural_constraints(project: str) -> str:
    decisions = await client.list(
        project=project,
        type="decision",
        tags=["architecture"]
    )

    if not decisions:
        return "No prior architectural decisions. You may propose new ones."

    constraints = []
    for d in decisions:
        constraints.append(
            f"• {d.content} (Reason: {d.metadata.get('reasoning', 'N/A')})"
        )

    return f"""## Established Architectural Decisions
These are binding unless explicitly reconsidered:

{chr(10).join(constraints)}

If you need to deviate, explain why and propose updating the decision."""
```

This pattern is powerful for maintaining consistency across long-running projects where decisions compound over time.
Pattern 4: Memory-Triggered Actions
Some memories should trigger specific behaviors. This pattern uses memory types and tags to route the agent's behavior dynamically.
```python
async def get_action_triggers(project: str) -> str:
    # Check for active context that might affect behavior
    active_contexts = await client.list(
        project=project,
        type="context",
        metadata={"active": True}
    )

    triggers = []
    for ctx in active_contexts:
        if "debugging" in ctx.tags:
            triggers.append("User is debugging. Prioritize diagnostic output.")
        if "refactoring" in ctx.tags:
            triggers.append("Refactoring in progress. Maintain existing tests.")
        if "learning" in ctx.tags:
            triggers.append("User is learning. Explain concepts thoroughly.")

    return "\n".join(triggers) if triggers else ""
```

Pattern 5: Post-Response Memory Extraction
The response itself often contains new information worth remembering. This pattern extracts facts, decisions, and preferences from the agent's output.
```python
import json

EXTRACTION_PROMPT = """Analyze the following response and extract any:
1. Decisions made (architectural, design, or implementation choices)
2. Facts stated (API endpoints, configs, technical specifications)
3. Preferences expressed by the user

Return as JSON:
{"decisions": [...], "facts": [...], "preferences": [...]}

If none found for a category, return empty array."""

async def extract_and_store(response: str, project: str):
    # `llm` is the same LLM client used to generate the original response
    extraction = await llm.generate(
        f"{EXTRACTION_PROMPT}\n\nResponse:\n{response}"
    )
    data = json.loads(extraction)

    for decision in data.get("decisions", []):
        await client.add(
            content=decision,
            type="decision",
            project=project
        )
    for fact in data.get("facts", []):
        await client.add(
            content=fact,
            type="fact",
            project=project
        )
    for preference in data.get("preferences", []):
        await client.add(
            content=preference,
            type="preference",
            project=project
        )
```

Putting It All Together
A memory-first agent combines these patterns into a cohesive flow:
```python
async def handle_message(user_message: str, user_id: str, project: str):
    # 1. Build context-aware prompt
    relevant_context = await build_prompt(user_message, project)
    style_instructions = await get_style_instructions(user_id)
    constraints = await get_architectural_constraints(project)
    triggers = await get_action_triggers(project)

    system_prompt = f"""{relevant_context}

{style_instructions}

{constraints}

{triggers}"""

    # 2. Generate response
    response = await llm.generate(
        system=system_prompt,
        user=user_message
    )

    # 3. Extract and store new memories
    await extract_and_store(response, project)
    return response
```

Best Practices
- Limit injection size: Cap retrieved memories to avoid prompt bloat—5-10 relevant items is usually enough
- Prioritize recency: Recent memories are often more relevant; weight them accordingly
- Deduplicate: Semantic search can return similar memories; dedupe before injection
- Separate concerns: Keep preferences, decisions, and context in distinct prompt sections
- Make extraction explicit: Ask the LLM to flag new information worth remembering
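The first three practices above can be combined into a single pre-injection step. Here is a minimal sketch: the `Memory` dataclass and `prepare_for_injection` helper are illustrative stand-ins, not part of the CodeMem API. It ranks candidates by an exponential recency score, drops exact duplicates (after normalizing case and whitespace), and caps the count.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Memory:
    content: str
    created_at: datetime

def prepare_for_injection(memories, limit=5, half_life_days=30.0):
    """Recency-weight, deduplicate, and cap memories before injection."""
    now = datetime.now()

    def recency_score(m: Memory) -> float:
        # Exponential decay: a memory exactly half_life_days old scores 0.5
        age_days = (now - m.created_at).total_seconds() / 86400
        return 0.5 ** (age_days / half_life_days)

    # Sort newest-first so deduplication keeps the most recent wording
    ranked = sorted(memories, key=recency_score, reverse=True)

    seen, unique = set(), []
    for m in ranked:
        key = " ".join(m.content.lower().split())  # normalize case/whitespace
        if key not in seen:
            seen.add(key)
            unique.append(m)

    return unique[:limit]
```

Exact-match deduplication is only a floor: if you have embeddings available, you can additionally drop any memory whose similarity to an already-selected one exceeds a threshold.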
Start Building Memory-First Agents
These patterns transform how your agents interact with users. Instead of stateless Q&A, you get assistants that learn, remember, and maintain consistency across sessions.
CodeMem provides the memory infrastructure these patterns need—semantic search, typed memories, project scoping, and automatic expiration. Focus on your prompting logic while we handle the persistence.