AI Self-Improvement Digest - Reference Guide

Example Digest Entries

Example 1: Harness Engineering

Building Effective Agent Harnesses — Anthropic Engineering
What: Anthropic's guide on structuring system prompts for reliable agent behavior, including the "think-act-observe" loop pattern.
Why it matters for self-improvement: Shows how to design harnesses that make agents more predictable and debuggable when they fail.
Takeaway: Add explicit "pause and verify" checkpoints before high-stakes actions like spawning sub-agents or making external calls.
Relevance:
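The "pause and verify" takeaway can be sketched as a small gate function that re-checks preconditions before a high-stakes action. The function name and the specific checks below are illustrative assumptions, not taken from the Anthropic article:

```python
def pause_and_verify(action_name, preconditions):
    """Return True only if every named precondition check passes."""
    failures = [name for name, check in preconditions.items() if not check()]
    if failures:
        print(f"BLOCKED {action_name}: failed checks {failures}")
        return False
    print(f"OK {action_name}: all {len(preconditions)} checks passed")
    return True

# Example: gate a sub-agent spawn behind two checks (both hypothetical)
approved = pause_and_verify(
    "spawn_sub_agent",
    {
        "task_is_isolated": lambda: True,   # task needs no shared mutable context
        "budget_remaining": lambda: 3 > 0,  # spawn budget not exhausted
    },
)
```

The same gate can wrap external calls; the point is that the checks are explicit and logged, so a blocked action leaves a debuggable trace.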

Example 2: Tool Development

MCP: The USB-C for AI Applications — Geoff Huntley
What: Deep dive into Model Context Protocol as a standard for tool integration, with patterns for building composable skills.
Why it matters for self-improvement: MCP skills are more portable and composable than ad-hoc integrations.
Takeaway: When building new skills, follow MCP patterns for resource exposure and tool definition.
Relevance:
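For reference, an MCP tool definition follows the shape the protocol specifies: a name, a description, and a JSON Schema for inputs. The sketch below shows that shape as a plain dict; the tool itself (`read_digest`) is a made-up example, not something from the article:

```python
# MCP-shaped tool definition: name, description, and a JSON Schema
# describing the inputs. "read_digest" is a hypothetical tool.
tool_definition = {
    "name": "read_digest",
    "description": "Return the latest digest entry for a category.",
    "inputSchema": {
        "type": "object",
        "properties": {"category": {"type": "string"}},
        "required": ["category"],
    },
}
```

Defining skills in this shape keeps them portable: any MCP-aware client can list and call them without custom glue.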

Example 3: Self-Evaluation

Evaluating Language Model Agents — Lilian Weng
What: Comprehensive framework for agent evaluation including trajectory analysis, tool use accuracy, and failure mode categorization.
Why it matters for self-improvement: Without evals, you can't know if changes actually improve performance.
Takeaway: Set up a simple regression test: save 5-10 representative tasks and re-run after skill updates.
Relevance:
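A minimal version of that regression test can be a list of (prompt, expected-substring) pairs re-run after each skill update. `run_agent` below is a stub standing in for your real agent entry point; the pair format is an assumption, not from the article:

```python
def run_agent(prompt):
    # Stub: replace with your real agent invocation.
    return f"echo: {prompt}"

# Saved representative tasks: (prompt, substring the output must contain)
REGRESSION_TASKS = [
    ("summarize the README", "echo"),
    ("list open issues", "echo"),
]

def run_regression(tasks):
    """Re-run saved tasks and return the prompts that no longer pass."""
    failures = []
    for prompt, expected in tasks:
        output = run_agent(prompt)
        if expected not in output:
            failures.append(prompt)
    return failures

failures = run_regression(REGRESSION_TASKS)
print(f"{len(REGRESSION_TASKS) - len(failures)}/{len(REGRESSION_TASKS)} tasks passed")
```

Substring checks are crude but cheap; they catch regressions in structure and tool use even when exact outputs vary.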

Example 4: Multi-Agent Coordination

Patterns for Multi-Agent Systems — Simon Willison
What: Practical patterns for agent spawning, result aggregation, and error handling in distributed agent workflows.
Why it matters for self-improvement: Shows when to spawn vs when to handle inline, and how to merge parallel results.
Takeaway: Spawn sub-agents for tasks that need isolation; keep inline for context-dependent reasoning.
Relevance:
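The spawn-vs-inline takeaway reduces to a routing decision. The `Task` type and its two flags below are assumptions made for illustration, not the article's terminology:

```python
from dataclasses import dataclass

@dataclass
class Task:
    needs_shared_context: bool  # depends on the ongoing conversation
    isolatable: bool            # can run from a self-contained brief

def route(task):
    """Spawn only when the task is self-contained; otherwise stay inline."""
    if task.isolatable and not task.needs_shared_context:
        return "spawn"
    return "inline"
```

Anything that touches both flags defaults to inline: a task that half-depends on shared context will lose information if handed to an isolated sub-agent.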

Example 5: Memory Management

Context Compaction Strategies — arXiv
What: Techniques for managing long conversations including summarization, key-value extraction, and selective retention.
Why it matters for self-improvement: Long contexts degrade performance; smart compaction preserves what matters.
Takeaway: Before compaction, extract and save key facts to MEMORY.md; summarize the rest.
Relevance:
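The extract-then-summarize step can be sketched as below, assuming messages are dicts with a `key_fact` flag marking durable facts (the tagging scheme is an assumption; MEMORY.md is the file named in the takeaway):

```python
def compact(messages, memory_path="MEMORY.md"):
    """Append durable facts to MEMORY.md; summarize everything else."""
    facts = [m["text"] for m in messages if m.get("key_fact")]
    rest = [m["text"] for m in messages if not m.get("key_fact")]
    with open(memory_path, "a") as f:
        for fact in facts:
            f.write(f"- {fact}\n")
    # A real pass would ask the model for a summary;
    # a count placeholder keeps this sketch runnable.
    return f"[{len(rest)} earlier messages summarized]"
```

The key property: facts survive verbatim in a file outside the context window, so a lossy summary of the rest is safe.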

Search Queries by Category

Use these queries with kimi_search to find relevant content:

Harness & System Prompts

  • "system prompt engineering agent reliability"
  • "agent harness design patterns"
  • "prompt chaining best practices"
  • "few-shot prompting agents"

Skill & Tool Development

  • "MCP server patterns"
  • "AI agent tool integration"
  • "skill development framework"
  • "agent capabilities extension"

Self-Evaluation

  • "agent evaluation metrics"
  • "LLM agent testing"
  • "agent failure analysis"
  • "trajectory evaluation"

Multi-Agent Coordination

  • "multi-agent orchestration"
  • "agent spawning patterns"
  • "distributed agent systems"
  • "agent result aggregation"

Memory & Context

  • "context window management"
  • "long conversation memory"
  • "RAG for agents"
  • "conversation summarization"

Workflow Automation

  • "agent task decomposition"
  • "agent error handling"
  • "retry patterns agents"
  • "agent workflow design"

Quality Indicators

High-signal content (include):

  • Specific techniques with code examples
  • Lessons from production systems
  • Failure modes and how to avoid them
  • Comparative analysis of approaches
  • Author has built real agent systems

Low-signal content (exclude):

  • Pure announcements without technique
  • Marketing content
  • General AI hype
  • Ethics debates without implementation angle
  • Surface-level listicles
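The indicators above can be pre-screened with a crude keyword heuristic before a human or model pass; the word lists below are illustrative guesses, and real triage still needs judgment:

```python
# Rough first-pass triage against the quality indicators above.
LOW_SIGNAL = {"announcing", "launch", "revolutionary", "game-changer"}
HIGH_SIGNAL = {"benchmark", "failure", "pattern", "eval", "production"}

def triage(title):
    """Return "exclude", "review", or "unclear" for a candidate title."""
    words = set(title.lower().split())
    if words & LOW_SIGNAL:
        return "exclude"
    if words & HIGH_SIGNAL:
        return "review"
    return "unclear"
```

Treat "unclear" as review-by-hand, not as a pass: the heuristic only filters the obvious extremes.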

Setup Review Examples

Good Example (specific, grounded, affirmative)

🔧 Setup Review

Based on today's findings:

  • Let's add a memory/experiments.md file to track harness experiments, since the Anthropic article showed experiment logging improves iteration speed
  • Let's update the channel-monitor cron to include a self-check step before responding, based on the "pause and verify" pattern from the Anthropic harness article

No changes needed for multi-agent coordination — our current sub-agent spawning pattern already follows the isolation principle discussed.

Bad Example (vague, passive)

🔧 Setup Review

Could consider maybe looking into some of the patterns mentioned. Might be worth exploring memory improvements at some point.

Weekly Review Template

At end of week, review memory/ai-digest-posted.json and answer:

  1. Experiments tried: What did we test this week?
  2. Outcomes: What worked? What didn't?
  3. Skills evaluated: Any new skills worth adopting?
  4. Setup changes made: What did we change based on learnings?
  5. Source quality: Which sources provided the most value?
  6. Adjustments: Should we add/remove sources? Change frequency?
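Question 5 (source quality) can be answered mechanically from the posted file, assuming each entry records a "source" field; that field is an assumption about the file's schema, so adapt the key to what memory/ai-digest-posted.json actually stores:

```python
import json
from collections import Counter
from pathlib import Path

def source_counts(path="memory/ai-digest-posted.json"):
    """Tally how often each source appeared in the posted entries."""
    p = Path(path)
    entries = json.loads(p.read_text()) if p.exists() else []
    return Counter(e.get("source", "unknown") for e in entries)
```

Sources that keep topping the tally but never lead to setup changes are candidates for removal under question 6.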

Common Pitfalls to Avoid

  1. Including general news - Stay focused on self-improvement, not announcements
  2. Vague setup reviews - Be specific about what to change and why
  3. Skipping deduplication - Always check memory/ai-digest-posted.json first
  4. No experiment suggestion - Always include one actionable experiment
  5. Ignoring existing setup - Connect suggestions to current AGENTS.md, TOOLS.md, skills/