# Assessment: Orchestration & Prompts for MCP Content Mining

## Context & Requirements Synthesis

### User Priorities Identified
1. **Two Primary Mining Modes**:
   - **Greenfield content mining** - Starting from scratch with new source material
   - **Diff content mining** - Analyzing changes/updates to existing content
2. **Rough ELO estimation** - The platform handles real-time convergence; agents provide sane initial sorting
3. **Leverage existing patterns** from `@packages/edit-ui/src/components/BulkImportView.vue`
4. **Atomic tool composition** - Agents orchestrate using create_card, update_card, etc.
5. **Flexible guidance** - Prompts suggest strategies without rigid constraints

### Existing Bulk Import Knowledge
From `/agent/mcp/prompt-bulk.md`, the platform already has:
- **Detailed fillIn formatting rules** - `{{answer}}`, `{{multiple|options}}`, `{{correct||distractor1|distractor2}}`
- **ELO calibration guidelines** - 100-500 beginner, 500-1000 early, 1000-1500 intermediate, etc.
- **Tag formatting conventions** - no spaces, dots/hyphens allowed, comma-separated
- **Card separation syntax** - double `---` lines
- **Markdown support** - code blocks, formatting within questions and answers
- **Content best practices** - concise answers, clear questions, code examples (see the sample cards below)
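
To make the format concrete, a hedged sketch of two cards (the question content here is invented, and tag/ELO metadata lines are omitted; `/agent/mcp/prompt-bulk.md` remains the authoritative reference for the full format):

````
Vue components receive external data through {{props}}.

---
---

```python
nums = [x * x for x in range(5)]
```

This Python construct is called a {{list comprehension}}.
````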

## MCP Prompts vs Tools Analysis

### Why Prompts Are Superior for Content Mining

**MCP Prompts provide:**
- ✅ **Expert guidance** without execution constraints
- ✅ **Parameter customization** for different contexts
- ✅ **Agent autonomy** - can adapt, combine, or ignore advice
- ✅ **Composability** - work with the agent's own intelligence
- ✅ **Maintainability** - easy to update expertise without code changes

**Compared to orchestrator tools, which would:**
- ❌ **Lock in rigid workflows** that may not fit all scenarios
- ❌ **Hide decision-making** from agent reasoning
- ❌ **Create debugging complexity** with nested tool calls
- ❌ **Limit creativity** with predefined patterns

### Agent Decision Process for Prompts

**Agent prompt selection workflow** (sketched in code below):
1. **User request analysis** - "Generate quizzes from this Go codebase"
2. **Available prompts discovery** - Lists MCP prompts via the protocol
3. **Context matching** - Identifies this as a greenfield mining scenario
4. **Prompt invocation** - Calls `greenfield_content_mining` with repo context
5. **Guidance integration** - Synthesizes prompt advice with its own reasoning
6. **Tool orchestration** - Uses create_card and tag_card atomically, following the strategy
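
A rough sketch of steps 2-4 from the agent side, using the MCP TypeScript SDK client (method names follow the current SDK, but treat this as an assumption-laden sketch rather than working integration code):

```typescript
import { Client } from '@modelcontextprotocol/sdk/client/index.js';

// Assumes `client` is already connected to the Vue-Skuilder MCP server.
async function fetchMiningGuidance(client: Client) {
  // Step 2: discover available prompts via the protocol.
  const { prompts } = await client.listPrompts();

  // Step 3: a fresh "generate quizzes from this Go codebase" request maps
  // to the greenfield prompt rather than the diff one.
  if (!prompts.some((p) => p.name === 'greenfield_content_mining')) {
    throw new Error('greenfield_content_mining prompt not registered');
  }

  // Step 4: invoke it with repo context; the returned messages are guidance
  // the agent folds into its own reasoning before calling create_card / tag_card.
  return client.getPrompt({
    name: 'greenfield_content_mining',
    arguments: {
      sourceType: 'codebase',
      domain: 'programming',
      targetAudience: 'intermediate',
      scopeConstraint: 'full-project',
    },
  });
}
```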

## Proposed MCP Prompt Architecture

### Core Prompt Templates (4 Essential)

#### 1. `greenfield_content_mining`
**Purpose:** Guide systematic analysis of new source material for comprehensive courseware creation

**Parameters:**
- `sourceType` - "codebase", "documentation", "tutorials", "papers"
- `domain` - "programming", "math", "science", etc.
- `targetAudience` - "beginner", "intermediate", "advanced"
- `scopeConstraint` - "single-file", "module", "full-project"

**Template provides:**
- **Repository structure analysis** - How to identify key concepts vs implementation details
- **Content prioritization** - Core concepts first, edge cases later
- **Knowledge dependency mapping** - Prerequisites and logical sequencing
- **Coverage strategies** - Breadth vs depth decisions
- **ELO estimation guidance** - Initial difficulty assessment based on concept complexity
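
The prompt itself stays strategic, but it can nudge the agent to externalize a plan before touching any tools. A hypothetical sketch of such a plan's shape (these names are illustrative, not platform APIs):

```typescript
// Hypothetical shape only - not a platform API. One way an agent might
// structure its greenfield plan before any create_card calls.
interface MiningPlanEntry {
  concept: string;            // e.g. "goroutine basics"
  prerequisites: string[];    // concepts that should be carded first
  priority: 'core' | 'supporting' | 'edge-case';
  estimatedElo: number;       // rough initial band; the platform refines it later
}

const plan: MiningPlanEntry[] = [
  { concept: 'goroutine basics', prerequisites: [], priority: 'core', estimatedElo: 800 },
  { concept: 'channel select', prerequisites: ['goroutine basics'], priority: 'supporting', estimatedElo: 1400 },
];
```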

#### 2. `diff_content_mining`
**Purpose:** Guide analysis of changes/updates to identify new quiz opportunities

**Parameters:**
- `changeType` - "feature-addition", "refactor", "bug-fix", "docs-update"
- `diffScope` - "lines-changed", "files-affected", "concepts-modified"
- `existingCoverage` - "sparse", "moderate", "comprehensive"

**Template provides:**
- **Change impact analysis** - What concepts are newly introduced vs modified
- **Gap identification** - What existing cards need updates vs new creation
- **Incremental strategies** - How to build on existing content
- **Deprecation handling** - Managing outdated concepts
- **Version tracking** - Linking content to specific code versions
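
As a purely illustrative example of the kind of decision framework the template could teach (not platform behavior), an agent might reduce the create-vs-update choice to something like:

```typescript
// Illustrative heuristic only: choosing an action per concept touched by a diff.
type ChangeType = 'feature-addition' | 'refactor' | 'bug-fix' | 'docs-update';
type Coverage = 'sparse' | 'moderate' | 'comprehensive';

function diffAction(change: ChangeType, coverage: Coverage): 'create' | 'update' | 'review' {
  // New features with thin existing coverage call for fresh cards.
  if (change === 'feature-addition' && coverage !== 'comprehensive') return 'create';
  // Refactors and bug fixes mostly invalidate details in existing cards.
  if (change === 'refactor' || change === 'bug-fix') return 'update';
  // Everything else is worth a review pass before creating anything.
  return 'review';
}
```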

#### 3. `fillIn_question_crafting`
**Purpose:** Guide creation of effective fill-in-the-blank questions using Vue-Skuilder syntax

**Parameters:**
- `contentType` - "code", "theory", "api-usage", "troubleshooting"
- `cognitiveLevel` - "recall", "application", "analysis", "synthesis"
- `answerComplexity` - "single-word", "short-phrase", "code-snippet"

**Template provides:**
- **Format patterns** from bulk import experience:
  - `{{simple_answer}}` for direct recall
  - `{{option1|option2}}` for multiple acceptable answers
  - `{{correct||distractor1|distractor2}}` for multiple choice
- **Code integration best practices** - Balancing context with focused questions
- **Answer entropy management** - Keeping answers matchable while meaningful
- **Distractor selection** - Common mistakes and plausible alternatives
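
Hedged one-liners showing each pattern in use (question content invented for illustration):

```
Recall: Go's {{defer}} statement delays a call until the surrounding function returns.

---
---

Multiple acceptable answers: a Go slice grows via the built-in {{append|append()}} function.

---
---

Multiple choice: `go vet` performs {{static analysis||code formatting|compilation|benchmarking}}.
```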

#### 4. `elo_rough_calibration`
**Purpose:** Guide initial ELO estimation, knowing real-world performance data will refine it

**Parameters:**
- `conceptComplexity` - "fundamental", "applied", "nuanced", "edge-case"
- `prerequisiteDepth` - "none", "basic", "substantial", "advanced"
- `cognitiveLoad` - "recall", "comprehension", "application", "analysis"

**Template provides:**
- **Initial ELO bands** refined from bulk import guidelines:
  - 100-500: Basic vocabulary, simple facts
  - 500-1000: Applied knowledge, straightforward procedures
  - 1000-1500: Integration concepts, moderate problem-solving
  - 1500-2000: Complex applications, multi-step reasoning
  - 2000+: Expert insights, edge cases, advanced synthesis
- **Comparative benchmarking** - How to assess relative to existing content
- **Uncertainty handling** - When to err conservative vs aggressive
- **Refinement expectations** - Understanding the platform will auto-adjust
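
A minimal sketch of how the three parameters could combine into an initial estimate, assuming simple additive weights (the numbers are invented starting points, not the platform's convergence algorithm):

```typescript
// Rough additive heuristic for an initial ELO guess; real-world learner
// performance data converges the true value afterwards.
const complexityBase = { fundamental: 300, applied: 800, nuanced: 1300, 'edge-case': 1800 } as const;
const prereqBonus = { none: 0, basic: 100, substantial: 250, advanced: 400 } as const;
const loadBonus = { recall: 0, comprehension: 100, application: 200, analysis: 350 } as const;

function roughElo(
  complexity: keyof typeof complexityBase,
  prereqs: keyof typeof prereqBonus,
  load: keyof typeof loadBonus
): number {
  return complexityBase[complexity] + prereqBonus[prereqs] + loadBonus[load];
}

// roughElo('applied', 'basic', 'application') === 1100 - squarely in the intermediate band
```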

## Prompt Implementation Strategy

### MCP Technical Integration

**Prompt registration pattern:**
```typescript
this.mcpServer.registerPrompt(
  'greenfield_content_mining',
  {
    title: 'Greenfield Content Mining Strategy',
    description: 'Systematic approach for mining new source material',
    arguments: [
      { name: 'sourceType', description: 'Type of source material being analyzed' },
      { name: 'domain', description: 'Subject domain of the content' },
      { name: 'targetAudience', description: 'Intended learning audience level' },
      { name: 'scopeConstraint', description: 'Scope of analysis (file, module, project)' }
    ]
  },
  async (args) => ({
    messages: [
      {
        role: 'user',
        content: { type: 'text', text: generateGreenfieldContentMiningPrompt(args) }
      }
    ]
  })
);
```
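
The `generateGreenfieldContentMiningPrompt` helper referenced above doesn't exist yet; a minimal sketch, assuming it simply interpolates the arguments into the strategy outline from this assessment:

```typescript
// Sketch only: fold prompt arguments into strategy-guidance text.
// The real template would carry the full methodology listed earlier.
interface GreenfieldArgs {
  sourceType?: string;
  domain?: string;
  targetAudience?: string;
  scopeConstraint?: string;
}

function generateGreenfieldContentMiningPrompt(args: GreenfieldArgs): string {
  return [
    `You are mining ${args.sourceType ?? 'source material'} in the ${args.domain ?? 'general'} domain`,
    `for ${args.targetAudience ?? 'mixed'}-level learners, scoped to ${args.scopeConstraint ?? 'the full project'}.`,
    '',
    'Strategy:',
    '1. Map the structure: separate key concepts from implementation detail.',
    '2. Prioritize core concepts; defer edge cases.',
    '3. Sequence by prerequisite dependencies.',
    '4. Assign a rough initial ELO per concept; the platform refines it from learner data.',
  ].join('\n');
}
```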

### Content Strategy

**Prompts should contain:**
- ✅ **Structured methodologies** - Step-by-step approaches
- ✅ **Decision frameworks** - When to choose different strategies
- ✅ **Quality criteria** - What makes good vs poor questions
- ✅ **Practical examples** - Concrete patterns for different content types
- ✅ **Vue-Skuilder specifics** - Leverage platform features and conventions

**Prompts should NOT contain:**
- ❌ **Rigid scripts** - Agents need flexibility
- ❌ **Implementation details** - Keep focused on strategy
- ❌ **Tool invocation** - Agents decide when to use create_card, etc.
- ❌ **Fixed sequences** - Allow for creative orchestration

## Organizational Benefits

### Separation of Concerns
- **Prompts**: Domain expertise and strategy guidance
- **Agent**: Intelligence, creativity, and orchestration
- **Tools**: Atomic operations (create, update, tag, delete)
- **Platform**: Real-time ELO convergence and performance tracking

### Maintainability
- **Prompt updates** don't require code changes
- **Strategy evolution** can be managed independently
- **Domain expertise** centralized and reusable
- **Testing simplified** - prompts testable in isolation

### Scalability
- **New domains** just need new prompt templates
- **Different agent types** can use the same prompt guidance
- **Cross-platform portability** via the MCP standard
- **Expertise sharing** across agent implementations

## Recommendation

Implement the 4 core prompt templates as Phase 2.2, replacing the complex orchestrator-tool approach. This strikes the right balance of expert guidance and agent autonomy, leveraging existing Vue-Skuilder patterns while maintaining full flexibility for creative content-mining strategies.

The agent can then choose appropriate prompts based on context, synthesize the guidance with its own reasoning, and execute via the atomic tools we've already implemented.