# AI Self-Improvement Digest - Reference Guide

## Example Digest Entries

### Example 1: Harness Engineering

**Building Effective Agent Harnesses** — Anthropic Engineering

What: Anthropic's guide on structuring system prompts for reliable agent behavior, including the "think-act-observe" loop pattern.

Why it matters for self-improvement: Shows how to design harnesses that make agents more predictable and debuggable when they fail.

Takeaway: Add explicit "pause and verify" checkpoints before high-stakes actions like spawning sub-agents or making external calls.

Relevance: ⭐⭐⭐⭐⭐

### Example 2: Tool Development

**MCP: The USB-C for AI Applications** — Geoff Huntley

What: Deep dive into Model Context Protocol as a standard for tool integration, with patterns for building composable skills.

Why it matters for self-improvement: MCP skills are more portable and composable than ad-hoc integrations.

Takeaway: When building new skills, follow MCP patterns for resource exposure and tool definition.

Relevance: ⭐⭐⭐⭐

### Example 3: Self-Evaluation

**Evaluating Language Model Agents** — Lilian Weng

What: Comprehensive framework for agent evaluation including trajectory analysis, tool use accuracy, and failure mode categorization.

Why it matters for self-improvement: Without evals, you can't know whether changes actually improve performance.

Takeaway: Set up a simple regression test: save 5-10 representative tasks and re-run them after skill updates.

Relevance: ⭐⭐⭐⭐⭐

### Example 4: Multi-Agent Coordination

**Patterns for Multi-Agent Systems** — Simon Willison

What: Practical patterns for agent spawning, result aggregation, and error handling in distributed agent workflows.

Why it matters for self-improvement: Shows when to spawn a sub-agent versus when to handle a task inline, and how to merge parallel results.

Takeaway: Spawn sub-agents for tasks that need isolation; keep context-dependent reasoning inline.
Relevance: ⭐⭐⭐⭐

### Example 5: Memory Management

**Context Compaction Strategies** — arXiv

What: Techniques for managing long conversations including summarization, key-value extraction, and selective retention.

Why it matters for self-improvement: Long contexts degrade performance; smart compaction preserves what matters.

Takeaway: Before compaction, extract and save key facts to MEMORY.md; summarize the rest.

Relevance: ⭐⭐⭐⭐

## Search Queries by Category

Use these queries with `kimi_search` to find relevant content:

### Harness & System Prompts

- "system prompt engineering agent reliability"
- "agent harness design patterns"
- "prompt chaining best practices"
- "few-shot prompting agents"

### Skill & Tool Development

- "MCP server patterns"
- "AI agent tool integration"
- "skill development framework"
- "agent capabilities extension"

### Self-Evaluation

- "agent evaluation metrics"
- "LLM agent testing"
- "agent failure analysis"
- "trajectory evaluation"

### Multi-Agent Coordination

- "multi-agent orchestration"
- "agent spawning patterns"
- "distributed agent systems"
- "agent result aggregation"

### Memory & Context

- "context window management"
- "long conversation memory"
- "RAG for agents"
- "conversation summarization"

### Workflow Automation

- "agent task decomposition"
- "agent error handling"
- "retry patterns agents"
- "agent workflow design"

## Quality Indicators

**High-signal content (include):**

- Specific techniques with code examples
- Lessons from production systems
- Failure modes and how to avoid them
- Comparative analysis of approaches
- Author has built real agent systems

**Low-signal content (exclude):**

- Pure announcements without technique
- Marketing content
- General AI hype
- Ethics debates without an implementation angle
- Surface-level listicles

## Setup Review Examples

### Good Example (specific, grounded, affirmative)

🔧 Setup Review

Based on today's findings:

- Let's add a `memory/experiments.md` file to track harness experiments,
  since the Anthropic article showed experiment logging improves iteration speed
- Let's update the channel-monitor cron to include a self-check step before responding, based on the "pause and verify" pattern from the Anthropic harness article

No changes needed for multi-agent coordination — our current sub-agent spawning pattern already follows the isolation principle discussed in Simon Willison's post.

### Bad Example (vague, passive)

🔧 Setup Review

Could consider maybe looking into some of the patterns mentioned. Might be worth exploring memory improvements at some point.

## Weekly Review Template

At the end of each week, review `memory/ai-digest-posted.json` and answer:

1. **Experiments tried:** What did we test this week?
2. **Outcomes:** What worked? What didn't?
3. **Skills evaluated:** Any new skills worth adopting?
4. **Setup changes made:** What did we change based on learnings?
5. **Source quality:** Which sources provided the most value?
6. **Adjustments:** Should we add/remove sources? Change frequency?

## Common Pitfalls to Avoid

1. **Including general news** - Stay focused on self-improvement techniques, not announcements
2. **Vague setup reviews** - Be specific about what to change and why
3. **Skipping deduplication** - Always check `memory/ai-digest-posted.json` first
4. **No experiment suggestion** - Always include one actionable experiment
5. **Ignoring existing setup** - Connect suggestions to the current AGENTS.md, TOOLS.md, and skills/
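
## Appendix: Example Snippets

The deduplication check from the pitfalls above can be sketched roughly as follows. This is a minimal sketch that assumes the posted log is a flat JSON list of URLs; the actual schema of `memory/ai-digest-posted.json` may differ, so adapt the load/save helpers to whatever format the digest actually writes.

```python
import json
from pathlib import Path


def load_posted(log: Path) -> set:
    """Return the set of already-posted URLs, or an empty set on first run."""
    if not log.exists():
        return set()
    return set(json.loads(log.read_text()))


def filter_new(candidates: list, log: Path) -> list:
    """Drop any candidate URL already present in the posted log."""
    posted = load_posted(log)
    return [url for url in candidates if url not in posted]


def record_posted(urls: list, log: Path) -> None:
    """Add newly posted URLs to the log, stored as a sorted JSON list."""
    posted = load_posted(log) | set(urls)
    log.parent.mkdir(parents=True, exist_ok=True)
    log.write_text(json.dumps(sorted(posted), indent=2))
```

In the digest flow, call `filter_new` on the candidate links before drafting entries, and `record_posted` after the digest is sent, both against `memory/ai-digest-posted.json`.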
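The regression-test takeaway from Example 3 could look something like this. The task format, the `memory/regression-tasks.json` location, and the substring-match check are all illustrative assumptions; wire `run_regression` to the harness's real agent entry point and whatever pass/fail criterion fits your tasks.

```python
def run_regression(tasks: list, agent) -> list:
    """Re-run saved tasks through `agent` (a callable prompt -> output string)
    and return the tasks whose output no longer contains the expected marker.

    Each task is assumed to look like {"prompt": ..., "expect": ...};
    tasks would typically be loaded from e.g. memory/regression-tasks.json.
    """
    failures = []
    for task in tasks:
        output = agent(task["prompt"])
        if task["expect"] not in output:
            failures.append({"prompt": task["prompt"], "got": output})
    return failures
```

Run this after each skill or harness update; an empty result means no regressions against the saved tasks, and any failures show exactly which representative task broke.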