# AI Self-Improvement Digest - Reference Guide
## Example Digest Entries
### Example 1: Harness Engineering
**Building Effective Agent Harnesses** — Anthropic Engineering

- **What:** Anthropic's guide on structuring system prompts for reliable agent behavior, including the "think-act-observe" loop pattern.
- **Why it matters for self-improvement:** Shows how to design harnesses that make agents more predictable and debuggable when they fail.
- **Takeaway:** Add explicit "pause and verify" checkpoints before high-stakes actions like spawning sub-agents or making external calls.
- **Relevance:** ⭐⭐⭐⭐⭐
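The "pause and verify" takeaway can be sketched as a small harness checkpoint: a decorator that records the intended action and only lets it run if a verifier callback approves. All names below (`pause_and_verify`, `spawn_subagent`, the intent format) are illustrative assumptions, not from the article.

```python
# Sketch of a "pause and verify" checkpoint: a verifier callback must
# approve the recorded intent before a high-stakes action executes.
from functools import wraps

def pause_and_verify(verifier):
    """Wrap a high-stakes action; run it only if `verifier` approves the intent."""
    def decorate(action):
        @wraps(action)
        def checked(*args, **kwargs):
            intent = f"{action.__name__} args={args} kwargs={kwargs}"
            if not verifier(intent):
                raise PermissionError(f"checkpoint rejected: {intent}")
            return action(*args, **kwargs)
        return checked
    return decorate

# Example policy: refuse sub-agent spawns that carry no task description.
@pause_and_verify(lambda intent: "task" in intent)
def spawn_subagent(**kwargs):
    return f"spawned with {kwargs}"
```

The verifier here is a trivial lambda; in practice it could log the intent, prompt for confirmation, or run a cheap model check.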
### Example 2: Tool Development
**MCP: The USB-C for AI Applications** — Geoff Huntley

- **What:** Deep dive into Model Context Protocol as a standard for tool integration, with patterns for building composable skills.
- **Why it matters for self-improvement:** MCP skills are more portable and composable than ad-hoc integrations.
- **Takeaway:** When building new skills, follow MCP patterns for resource exposure and tool definition.
- **Relevance:** ⭐⭐⭐⭐
### Example 3: Self-Evaluation
**Evaluating Language Model Agents** — Lilian Weng

- **What:** Comprehensive framework for agent evaluation including trajectory analysis, tool use accuracy, and failure mode categorization.
- **Why it matters for self-improvement:** Without evals, you can't know if changes actually improve performance.
- **Takeaway:** Set up a simple regression test: save 5-10 representative tasks and re-run after skill updates.
- **Relevance:** ⭐⭐⭐⭐⭐
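The regression-test takeaway could look like the sketch below: saved tasks are prompt/expected-substring pairs, and a runner reports which ones regressed after a skill update. The task format and the `run_agent` callable are assumptions for illustration.

```python
# Minimal regression harness sketch: re-run saved representative tasks
# after a skill update and collect the ones whose output changed.
def run_regression(tasks, run_agent):
    """tasks: list of {"prompt": ..., "expect": substring}; returns failures."""
    failures = []
    for task in tasks:
        output = run_agent(task["prompt"])
        if task["expect"] not in output:
            failures.append({"prompt": task["prompt"], "got": output})
    return failures

# Usage with a stand-in agent:
tasks = [
    {"prompt": "2+2", "expect": "4"},
    {"prompt": "capital of France", "expect": "Paris"},
]
fake_agent = lambda p: {"2+2": "4", "capital of France": "Paris"}[p]
print(run_regression(tasks, fake_agent))  # → []
```

Substring matching is deliberately loose; exact-match or model-graded checks can be swapped in once the harness exists.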
### Example 4: Multi-Agent Coordination
**Patterns for Multi-Agent Systems** — Simon Willison

- **What:** Practical patterns for agent spawning, result aggregation, and error handling in distributed agent workflows.
- **Why it matters for self-improvement:** Shows when to spawn vs when to handle inline, and how to merge parallel results.
- **Takeaway:** Spawn sub-agents for tasks that need isolation; keep inline for context-dependent reasoning.
- **Relevance:** ⭐⭐⭐⭐
### Example 5: Memory Management
**Context Compaction Strategies** — arXiv

- **What:** Techniques for managing long conversations including summarization, key-value extraction, and selective retention.
- **Why it matters for self-improvement:** Long contexts degrade performance; smart compaction preserves what matters.
- **Takeaway:** Before compaction, extract and save key facts to MEMORY.md; summarize the rest.
- **Relevance:** ⭐⭐⭐⭐
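The extract-then-summarize takeaway can be sketched as follows. The `FACT:` line-tag convention and the crude truncation "summarizer" are assumptions; a real setup would use whatever fact-marking and summarization the harness provides.

```python
# Sketch of pre-compaction fact extraction: lines tagged "FACT:" are
# appended to MEMORY.md before the rest of the transcript is summarized
# (here, crudely truncated as a placeholder for a real summarizer).
def compact(transcript: str, memory_path="MEMORY.md", keep_chars=200):
    lines = transcript.splitlines()
    facts = [ln for ln in lines if ln.startswith("FACT:")]
    rest = [ln for ln in lines if not ln.startswith("FACT:")]
    if facts:
        with open(memory_path, "a") as f:
            f.write("\n".join(facts) + "\n")
    return " ".join(rest)[:keep_chars]
```

The key property is ordering: durable facts are persisted *before* any lossy summarization touches the transcript.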
## Search Queries by Category
Use these queries with `kimi_search` to find relevant content:
### Harness & System Prompts
- "system prompt engineering agent reliability"
- "agent harness design patterns"
- "prompt chaining best practices"
- "few-shot prompting agents"
### Skill & Tool Development
- "MCP server patterns"
- "AI agent tool integration"
- "skill development framework"
- "agent capabilities extension"
### Self-Evaluation
- "agent evaluation metrics"
- "LLM agent testing"
- "agent failure analysis"
- "trajectory evaluation"
### Multi-Agent Coordination
- "multi-agent orchestration"
- "agent spawning patterns"
- "distributed agent systems"
- "agent result aggregation"
### Memory & Context
- "context window management"
- "long conversation memory"
- "RAG for agents"
- "conversation summarization"
### Workflow Automation
- "agent task decomposition"
- "agent error handling"
- "retry patterns agents"
- "agent workflow design"
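The category queries above can be run as a simple gather loop that also handles deduplication against the posted log. This is a sketch only: the callable interface of `kimi_search`, the result shape (dicts with `"url"`/`"title"`), and the log format (a JSON list of entries with a `"url"` field) are all assumptions.

```python
# Sketch: run a subset of the category queries through kimi_search,
# skipping URLs already recorded in memory/ai-digest-posted.json.
import json
from pathlib import Path

QUERIES = {
    "harness": ["system prompt engineering agent reliability",
                "agent harness design patterns"],
    "evals": ["agent evaluation metrics", "LLM agent testing"],
}

def gather(kimi_search, posted_path="memory/ai-digest-posted.json"):
    path = Path(posted_path)
    posted = set()
    if path.exists():
        posted = {entry["url"] for entry in json.loads(path.read_text())}
    fresh = []
    for category, queries in QUERIES.items():
        for query in queries:
            for hit in kimi_search(query):
                if hit["url"] not in posted:
                    fresh.append({"category": category, **hit})
    return fresh
```

Tagging each hit with its category keeps the digest sections easy to assemble downstream.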
## Quality Indicators
**High-signal content (include):**
- Specific techniques with code examples
- Lessons from production systems
- Failure modes and how to avoid them
- Comparative analysis of approaches
- Author has built real agent systems
**Low-signal content (exclude):**
- Pure announcements without technique
- Marketing content
- General AI hype
- Ethics debates without implementation angle
- Surface-level listicles
## Setup Review Examples
### Good Example (specific, grounded, affirmative)
🔧 Setup Review
Based on today's findings:
- Let's add a `memory/experiments.md` file to track harness experiments, since the Anthropic article showed experiment logging improves iteration speed
- Let's update the channel-monitor cron to include a self-check step before responding, based on the "pause and verify" pattern from the Anthropic harness article
No changes needed for multi-agent coordination — our current sub-agent spawning pattern already follows the isolation principle discussed.
### Bad Example (vague, passive)
🔧 Setup Review
Could consider maybe looking into some of the patterns mentioned. Might be worth exploring memory improvements at some point.
## Weekly Review Template
At end of week, review `memory/ai-digest-posted.json` and answer:
1. **Experiments tried:** What did we test this week?
2. **Outcomes:** What worked? What didn't?
3. **Skills evaluated:** Any new skills worth adopting?
4. **Setup changes made:** What did we change based on learnings?
5. **Source quality:** Which sources provided the most value?
6. **Adjustments:** Should we add/remove sources? Change frequency?
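For question 5, the posted log can be tallied mechanically. The log shape assumed here (a JSON list of entries with a `"source"` field) is an illustration; adapt it to however `memory/ai-digest-posted.json` is actually structured.

```python
# Sketch: rank sources by how many items they contributed this week,
# reading from the posted-items log.
import json
from collections import Counter
from pathlib import Path

def source_counts(posted_path="memory/ai-digest-posted.json"):
    entries = json.loads(Path(posted_path).read_text())
    return Counter(e["source"] for e in entries).most_common()
```

The resulting ranking feeds directly into question 6: sources that rarely surface in the counts are candidates for removal.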
## Common Pitfalls to Avoid
1. **Including general news** - Stay focused on self-improvement, not announcements
2. **Vague setup reviews** - Be specific about what to change and why
3. **Skipping deduplication** - Always check `memory/ai-digest-posted.json` first
4. **No experiment suggestion** - Always include one actionable experiment
5. **Ignoring existing setup** - Connect suggestions to current AGENTS.md, TOOLS.md, skills/