How It Works
The 5-stage pipeline that turns books into verified agent skills.
BookForge uses a 5-stage pipeline to convert non-fiction books into structured, verified agent skills. Each stage has a specific job, and the output of one feeds directly into the next.
Pipeline Overview
┌──────────┐    ┌─────────────┐    ┌────────────┐    ┌──────────┐    ┌──────────┐
│ Extract  │───→│  Decompose  │───→│ Synthesize │───→│  Verify  │───→│ Optimize │
│          │    │             │    │            │    │          │    │          │
│ Book →   │    │ Score →     │    │ Generate   │    │ Test     │    │ Tune     │
│ chapters │    │ skill units │    │ SKILL.md   │    │ output   │    │ triggers │
└──────────┘    └─────────────┘    └────────────┘    └──────────┘    └──────────┘
Stage 1: Extract
Parse the book into structured chapters and sections. The extractor handles PDF, EPUB, and other common formats, producing a normalized representation of the book's content hierarchy.
Input: A book file (PDF, EPUB)
Output: Structured chapter/section tree with text content
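The chapter/section tree is the hand-off between Extract and Decompose. A minimal sketch of such a tree, assuming a simple recursive node type; the `Section` class and `walk` helper are hypothetical names for illustration, not BookForge's actual data model:

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Section:
    """One node in the normalized book hierarchy (hypothetical model)."""
    title: str
    text: str = ""
    children: list["Section"] = field(default_factory=list)


def walk(node: Section, depth: int = 0) -> Iterator[tuple[int, Section]]:
    """Yield (depth, section) pairs in document order, parents before children."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


book = Section("Example Book", children=[
    Section("Chapter 1: Preparation", children=[
        Section("1.1 Know your alternatives", text="..."),
        Section("1.2 Set an anchor", text="..."),
    ]),
    Section("Chapter 2: At the Table", text="..."),
])

# Print an indented outline of the tree.
for depth, section in walk(book):
    print("  " * depth + section.title)
```

A depth-first walk in document order keeps downstream stages simple: the decomposer can score sections one at a time while still knowing where each sits in the book.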
Stage 2: Decompose
Not every section of a book makes a good skill. The decomposer scores each topic from 1 to 5 on six dimensions:
- Skill Density — how much actionable procedure does this section contain?
- Digital Actionability — can an AI agent actually do this, or is it purely physical/social?
- Output Verifiability — can you check whether the skill was applied correctly?
- Trigger Clarity — is it obvious when this skill should activate?
- Reuse Frequency — will this come up often enough to justify a skill?
- Composability — does this work well alongside other skills?
Topics scoring below 3 on any dimension are filtered out. The remaining topics are grouped into skill units — coherent bundles of knowledge that belong together.
Input: Structured chapter/section tree
Output: Scored and grouped skill units (threshold: 3+ on every dimension, total score 18+ out of 30)
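The filtering rule can be sketched in a few lines of Python. The dimension keys, the `ScoredTopic` type, and the `passes` helper are illustrative names rather than BookForge's code, and the grouping into skill units is left out:

```python
from dataclasses import dataclass

# The six rubric dimensions used by the Decompose stage.
DIMENSIONS = (
    "skill_density",
    "digital_actionability",
    "output_verifiability",
    "trigger_clarity",
    "reuse_frequency",
    "composability",
)

MIN_PER_DIMENSION = 3  # any dimension below this filters the topic out
MIN_TOTAL = 18         # total threshold across all six dimensions


@dataclass
class ScoredTopic:
    name: str
    scores: dict[str, int]  # dimension -> score on the 1-5 rubric

    @property
    def total(self) -> int:
        return sum(self.scores[d] for d in DIMENSIONS)


def passes(topic: ScoredTopic) -> bool:
    """Keep a topic only if every dimension is 3+ and the total is 18+."""
    for d in DIMENSIONS:
        score = topic.scores[d]
        if not 1 <= score <= 5:
            raise ValueError(f"{topic.name}: {d}={score} is outside the 1-5 rubric")
    return (
        all(topic.scores[d] >= MIN_PER_DIMENSION for d in DIMENSIONS)
        and topic.total >= MIN_TOTAL
    )


topics = [
    ScoredTopic("Anchoring an offer", dict.fromkeys(DIMENSIONS, 4)),
    ScoredTopic("Reading body language",
                {**dict.fromkeys(DIMENSIONS, 4), "digital_actionability": 1}),
]
kept = [t.name for t in topics if passes(t)]
print(kept)  # ['Anchoring an offer']
```

Because each of the six dimensions is floored at 3, the per-dimension rule alone already guarantees a total of at least 18 (out of a maximum 30). The explicit total check only starts to matter if the two thresholds are ever tuned independently.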
Stage 3: Synthesize
Each skill unit becomes a SKILL.md file. The synthesizer generates:
- Frontmatter with a description tuned for agent triggering
- A structured body with When to Use, Checklist, Process, Key Principles, and Examples
- WHY reasoning for every step — not just what to do, but why it matters
- Scripts for automatable tasks
- Reference files for deep-dive material
The synthesizer generalizes terminology. Author-specific jargon is replaced with domain-standard language so the skill works for anyone, not just readers of that specific book.
Input: Skill units with source content
Output: Complete SKILL.md packages (body, scripts, references)
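A minimal sketch of rendering a skill unit into the frontmatter-plus-body layout listed above. It assumes the common Agent Skills convention of `name` and `description` frontmatter keys and `## ` section headings; `render_skill_md` is a hypothetical helper, and scripts and reference files are out of scope here:

```python
import json

# Body sections listed for the Synthesize stage, in order.
BODY_SECTIONS = ("When to Use", "Checklist", "Process", "Key Principles", "Examples")


def render_skill_md(name: str, description: str, sections: dict[str, str]) -> str:
    """Render YAML frontmatter plus one '## ' heading per required body section."""
    missing = [s for s in BODY_SECTIONS if s not in sections]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    lines = [
        "---",
        f"name: {name}",
        # json.dumps yields a double-quoted, escaped string, which is also valid YAML.
        f"description: {json.dumps(description)}",
        "---",
        "",
    ]
    for title in BODY_SECTIONS:
        lines += [f"## {title}", sections[title].strip(), ""]
    return "\n".join(lines)


doc = render_skill_md(
    "salary-negotiation",
    "Use when the user is preparing for or conducting a salary negotiation.",
    {s: f"({s} content)" for s in BODY_SECTIONS},
)
print(doc)
```

Failing fast on a missing section keeps structurally incomplete skills from ever reaching the Verify stage.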
Stage 4: Verify
Every skill goes through two kinds of testing:
Structural checks — does the SKILL.md conform to the spec? Are all required sections present? Is the body under 500 lines? Do script references resolve?
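The structural checks can be sketched as a small validator. This assumes frontmatter is a leading `---` block, sections are Markdown headings, and scripts are referenced as relative `scripts/...` paths; `structural_check` is a hypothetical helper, not BookForge's verifier:

```python
import re

REQUIRED_SECTIONS = ("When to Use", "Checklist", "Process", "Key Principles", "Examples")
MAX_BODY_LINES = 500  # body must stay under this many lines

# Assumption: frontmatter is a leading '---' ... '---' block.
FRONTMATTER = re.compile(r"\A---\n(.*?)\n---\n(.*)\Z", re.DOTALL)
HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)
# Assumption: scripts are referenced as relative paths such as `scripts/check.py`.
SCRIPT_REF = re.compile(r"\bscripts/[\w./-]*\w")


def structural_check(skill_md: str, package_files: set[str]) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    match = FRONTMATTER.match(skill_md)
    if match is None:
        return ["missing YAML frontmatter"]
    frontmatter, body = match.groups()

    problems = []
    if not re.search(r"^description:", frontmatter, re.MULTILINE):
        problems.append("frontmatter has no description")

    headings = set(HEADING.findall(body))
    for section in REQUIRED_SECTIONS:
        if section not in headings:
            problems.append(f"missing section: {section}")

    n_lines = len(body.splitlines())
    if n_lines >= MAX_BODY_LINES:
        problems.append(f"body has {n_lines} lines; must be under {MAX_BODY_LINES}")

    # Every referenced script must exist in the skill package.
    for ref in sorted(set(SCRIPT_REF.findall(body))):
        if ref not in package_files:
            problems.append(f"unresolved script reference: {ref}")
    return problems


skill = (
    "---\nname: demo\ndescription: Use when testing the validator.\n---\n"
    + "\n".join(f"## {s}\nSome text." for s in REQUIRED_SECTIONS)
    + "\nRun `scripts/check.py` before finishing.\n"
)
print(structural_check(skill, {"scripts/check.py"}))  # []
print(structural_check(skill, set()))
```

Returning a list of problems rather than stopping at the first failure lets one run report everything a regenerated skill needs to fix.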
Functional testing — the skill is tested against real tasks using a with-skill vs without-skill baseline comparison. An agent attempts the same task twice: once with the skill installed, once without. The outputs are compared to measure whether the skill actually improves results.
Input: Generated SKILL.md packages
Output: Verified skills with test results
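The with-skill vs without-skill comparison can be sketched as a paired evaluation. Here `run_agent` and `grade` are hypothetical stand-ins for an agent harness and a scoring function (stubbed so the example runs); how BookForge actually runs agents and grades outputs isn't specified above:

```python
from statistics import mean
from typing import Callable

Agent = Callable[[str, bool], str]    # (task, skill_installed) -> output
Grader = Callable[[str, str], float]  # (task, output) -> score in [0, 1]


def compare(tasks: list[str], run_agent: Agent, grade: Grader) -> dict[str, float]:
    """Run each task with and without the skill and summarize the paired scores."""
    if not tasks:
        raise ValueError("need at least one task")
    with_scores, without_scores = [], []
    for task in tasks:
        with_scores.append(grade(task, run_agent(task, True)))
        without_scores.append(grade(task, run_agent(task, False)))
    wins = sum(w > b for w, b in zip(with_scores, without_scores))
    return {
        "mean_with": mean(with_scores),
        "mean_without": mean(without_scores),
        "uplift": mean(with_scores) - mean(without_scores),
        "win_rate": wins / len(tasks),
    }


# Hypothetical stubs: pretend the skill makes the agent add a checklist.
def fake_agent(task: str, skill_installed: bool) -> str:
    return f"{task}: answer" + (" + checklist" if skill_installed else "")


def fake_grade(task: str, output: str) -> float:
    return 1.0 if "checklist" in output else 0.5


result = compare(["draft an offer", "plan a counteroffer"], fake_agent, fake_grade)
print(result)
```

Scoring the two runs per task side by side (rather than two unrelated batches) keeps task difficulty from masking whether the skill itself helped.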
Stage 5: Optimize
The final stage tunes two things:
Description optimization — the frontmatter description determines when an agent triggers the skill. BookForge runs 20 evaluation queries against each skill's description to measure trigger accuracy: does the skill activate when it should, and stay quiet when it shouldn't?
Frontmatter tuning — model recommendations, context window requirements, and allowed-tools lists are calibrated based on the skill's complexity and tool needs.
Input: Verified skills
Output: Production-ready skills with optimized triggering
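The trigger-accuracy measurement from description optimization amounts to scoring a labeled query set. A minimal sketch, assuming each evaluation query is labeled with whether the skill should fire; the keyword matcher is a hypothetical stand-in for an agent actually reading the description, and the sample uses four queries for brevity rather than 20:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalQuery:
    text: str
    should_trigger: bool  # ground-truth label for this query


def trigger_report(queries: list[EvalQuery], triggers: Callable[[str], bool]) -> dict:
    """Count correct activations, false triggers, and missed triggers."""
    if not queries:
        raise ValueError("need at least one query")
    tp = fp = fn = tn = 0
    for q in queries:
        fired = triggers(q.text)
        if fired and q.should_trigger:
            tp += 1
        elif fired:
            fp += 1
        elif q.should_trigger:
            fn += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / len(queries),
        "false_triggers": fp,   # fired when it should have stayed quiet
        "missed_triggers": fn,  # stayed quiet when it should have fired
    }


# Hypothetical stand-in for description-based triggering.
def keyword_trigger(text: str) -> bool:
    return any(k in text.lower() for k in ("negotiat", "salary", "offer"))


queries = [
    EvalQuery("How should I negotiate my salary?", True),
    EvalQuery("Help me respond to a job offer", True),
    EvalQuery("Summarize this offer letter's legal terms", False),
    EvalQuery("Fix this Python import error", False),
]
report = trigger_report(queries, keyword_trigger)
print(report)  # {'accuracy': 0.75, 'false_triggers': 1, 'missed_triggers': 0}
```

Reporting false and missed triggers separately matters because they call for opposite fixes: a description that fires too eagerly needs narrowing, one that stays silent needs broader trigger phrases.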
Why This Matters
The bottleneck in the agent skills ecosystem is authorship. Today, only developers who deeply understand both a domain and agent tooling can write effective skills. That limits the entire ecosystem to what a small group of people find time to build.
BookForge removes that bottleneck. The world's non-fiction knowledge — negotiation tactics, architecture patterns, management frameworks, scientific methods — is already written down in books. BookForge distills it into a format agents can use directly.
The result: agent capabilities grow at the speed of book processing, not at the speed of manual skill authoring.