# AI Self-Improvement Digest - Reference Guide
## Example Digest Entries
### Example 1: Harness Engineering
**Building Effective Agent Harnesses** — Anthropic Engineering

- **What:** Anthropic's guide on structuring system prompts for reliable agent behavior, including the "think-act-observe" loop pattern.
- **Why it matters for self-improvement:** Shows how to design harnesses that make agents more predictable and debuggable when they fail.
- **Takeaway:** Add explicit "pause and verify" checkpoints before high-stakes actions like spawning sub-agents or making external calls.
- **Relevance:** ⭐⭐⭐⭐⭐
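The "pause and verify" takeaway can be sketched as a small harness checkpoint: a decorator that records the intended action and only lets it run if a verifier callback approves. All names below (`pause_and_verify`, `spawn_subagent`, the intent format) are illustrative assumptions, not from the article.

```python
# Sketch of a "pause and verify" checkpoint: a verifier callback must
# approve the recorded intent before a high-stakes action executes.
from functools import wraps

def pause_and_verify(verifier):
    """Wrap a high-stakes action; run it only if `verifier` approves the intent."""
    def decorate(action):
        @wraps(action)
        def checked(*args, **kwargs):
            intent = f"{action.__name__} args={args} kwargs={kwargs}"
            if not verifier(intent):
                raise PermissionError(f"checkpoint rejected: {intent}")
            return action(*args, **kwargs)
        return checked
    return decorate

# Example policy: refuse sub-agent spawns that carry no task description.
@pause_and_verify(lambda intent: "task" in intent)
def spawn_subagent(**kwargs):
    return f"spawned with {kwargs}"
```

The verifier here is a trivial lambda; in practice it could log the intent, prompt for confirmation, or run a cheap model check.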
### Example 2: Tool Development
**MCP: The USB-C for AI Applications** — Geoff Huntley

- **What:** Deep dive into Model Context Protocol as a standard for tool integration, with patterns for building composable skills.
- **Why it matters for self-improvement:** MCP skills are more portable and composable than ad-hoc integrations.
- **Takeaway:** When building new skills, follow MCP patterns for resource exposure and tool definition.
- **Relevance:** ⭐⭐⭐⭐
### Example 3: Self-Evaluation
**Evaluating Language Model Agents** — Lilian Weng

- **What:** Comprehensive framework for agent evaluation including trajectory analysis, tool use accuracy, and failure mode categorization.
- **Why it matters for self-improvement:** Without evals, you can't know if changes actually improve performance.
- **Takeaway:** Set up a simple regression test: save 5-10 representative tasks and re-run after skill updates.
- **Relevance:** ⭐⭐⭐⭐⭐
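The regression-test takeaway could look like the sketch below: saved tasks are prompt/expected-substring pairs, and a runner reports which ones regressed after a skill update. The task format and the `run_agent` callable are assumptions for illustration.

```python
# Minimal regression harness sketch: re-run saved representative tasks
# after a skill update and collect the ones whose output changed.
def run_regression(tasks, run_agent):
    """tasks: list of {"prompt": ..., "expect": substring}; returns failures."""
    failures = []
    for task in tasks:
        output = run_agent(task["prompt"])
        if task["expect"] not in output:
            failures.append({"prompt": task["prompt"], "got": output})
    return failures

# Usage with a stand-in agent:
tasks = [
    {"prompt": "2+2", "expect": "4"},
    {"prompt": "capital of France", "expect": "Paris"},
]
fake_agent = lambda p: {"2+2": "4", "capital of France": "Paris"}[p]
print(run_regression(tasks, fake_agent))  # → []
```

Substring matching is deliberately loose; exact-match or model-graded checks can be swapped in once the harness exists.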
### Example 4: Multi-Agent Coordination
**Patterns for Multi-Agent Systems** — Simon Willison

- **What:** Practical patterns for agent spawning, result aggregation, and error handling in distributed agent workflows.
- **Why it matters for self-improvement:** Shows when to spawn vs when to handle inline, and how to merge parallel results.
- **Takeaway:** Spawn sub-agents for tasks that need isolation; keep inline for context-dependent reasoning.
- **Relevance:** ⭐⭐⭐⭐
### Example 5: Memory Management
**Context Compaction Strategies** — arXiv

- **What:** Techniques for managing long conversations including summarization, key-value extraction, and selective retention.
- **Why it matters for self-improvement:** Long contexts degrade performance; smart compaction preserves what matters.
- **Takeaway:** Before compaction, extract and save key facts to MEMORY.md; summarize the rest.
- **Relevance:** ⭐⭐⭐⭐
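The extract-then-summarize takeaway can be sketched as follows. The `FACT:` line-tag convention and the crude truncation "summarizer" are assumptions; a real setup would use whatever fact-marking and summarization the harness provides.

```python
# Sketch of pre-compaction fact extraction: lines tagged "FACT:" are
# appended to MEMORY.md before the rest of the transcript is summarized
# (here, crudely truncated as a placeholder for a real summarizer).
def compact(transcript: str, memory_path="MEMORY.md", keep_chars=200):
    lines = transcript.splitlines()
    facts = [ln for ln in lines if ln.startswith("FACT:")]
    rest = [ln for ln in lines if not ln.startswith("FACT:")]
    if facts:
        with open(memory_path, "a") as f:
            f.write("\n".join(facts) + "\n")
    return " ".join(rest)[:keep_chars]
```

The key property is ordering: durable facts are persisted *before* any lossy summarization touches the transcript.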
## Search Queries by Category
Use these queries with `kimi_search` to find relevant content:
### Harness & System Prompts
- "system prompt engineering agent reliability"
- "agent harness design patterns"
- "prompt chaining best practices"
- "few-shot prompting agents"
### Skill & Tool Development
- "MCP server patterns"
- "AI agent tool integration"
- "skill development framework"
- "agent capabilities extension"
### Self-Evaluation
- "agent evaluation metrics"
- "LLM agent testing"
- "agent failure analysis"
- "trajectory evaluation"
### Multi-Agent Coordination
- "multi-agent orchestration"
- "agent spawning patterns"
- "distributed agent systems"
- "agent result aggregation"
### Memory & Context
- "context window management"
- "long conversation memory"
- "RAG for agents"
- "conversation summarization"
### Workflow Automation
- "agent task decomposition"
- "agent error handling"
- "retry patterns agents"
- "agent workflow design"
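The category queries above can be run as a simple gather loop that also handles deduplication against the posted log. This is a sketch only: the callable interface of `kimi_search`, the result shape (dicts with `"url"`/`"title"`), and the log format (a JSON list of entries with a `"url"` field) are all assumptions.

```python
# Sketch: run a subset of the category queries through kimi_search,
# skipping URLs already recorded in memory/ai-digest-posted.json.
import json
from pathlib import Path

QUERIES = {
    "harness": ["system prompt engineering agent reliability",
                "agent harness design patterns"],
    "evals": ["agent evaluation metrics", "LLM agent testing"],
}

def gather(kimi_search, posted_path="memory/ai-digest-posted.json"):
    path = Path(posted_path)
    posted = set()
    if path.exists():
        posted = {entry["url"] for entry in json.loads(path.read_text())}
    fresh = []
    for category, queries in QUERIES.items():
        for query in queries:
            for hit in kimi_search(query):
                if hit["url"] not in posted:
                    fresh.append({"category": category, **hit})
    return fresh
```

Tagging each hit with its category keeps the digest sections easy to assemble downstream.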
## Quality Indicators
**High-signal content (include):**
- Specific techniques with code examples
- Lessons from production systems
- Failure modes and how to avoid them
- Comparative analysis of approaches
- Author has built real agent systems
**Low-signal content (exclude):**
- Pure announcements without technique
- Marketing content
- General AI hype
- Ethics debates without implementation angle
- Surface-level listicles
## Setup Review Examples
### Good Example (specific, grounded, affirmative)
🔧 Setup Review
Based on today's findings:
- Let's add a `memory/experiments.md` file to track harness experiments, since the Anthropic article showed experiment logging improves iteration speed
- Let's update the channel-monitor cron to include a self-check step before responding, based on the "pause and verify" pattern from the Anthropic harness article
No changes needed for multi-agent coordination — our current sub-agent spawning pattern already follows the isolation principle discussed.
### Bad Example (vague, passive)
🔧 Setup Review
Could consider maybe looking into some of the patterns mentioned. Might be worth exploring memory improvements at some point.
## Weekly Review Template
At end of week, review `memory/ai-digest-posted.json` and answer:
1. **Experiments tried:** What did we test this week?
2. **Outcomes:** What worked? What didn't?
3. **Skills evaluated:** Any new skills worth adopting?
4. **Setup changes made:** What did we change based on learnings?
5. **Source quality:** Which sources provided the most value?
6. **Adjustments:** Should we add/remove sources? Change frequency?
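For question 5, the posted log can be tallied mechanically. The log shape assumed here (a JSON list of entries with a `"source"` field) is an illustration; adapt it to however `memory/ai-digest-posted.json` is actually structured.

```python
# Sketch: rank sources by how many items they contributed this week,
# reading from the posted-items log.
import json
from collections import Counter
from pathlib import Path

def source_counts(posted_path="memory/ai-digest-posted.json"):
    entries = json.loads(Path(posted_path).read_text())
    return Counter(e["source"] for e in entries).most_common()
```

The resulting ranking feeds directly into question 6: sources that rarely surface in the counts are candidates for removal.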
## Common Pitfalls to Avoid
1. **Including general news** - Stay focused on self-improvement, not announcements
2. **Vague setup reviews** - Be specific about what to change and why
3. **Skipping deduplication** - Always check `memory/ai-digest-posted.json` first
4. **No experiment suggestion** - Always include one actionable experiment
5. **Ignoring existing setup** - Connect suggestions to current AGENTS.md, TOOLS.md, skills/