The Three Layers
Today the room sits across three layers. Some are available right now. Some are not. We work every layer we can reach, and we document the wall where the work stops. Advanced builders operate fluently across two enterprise AI tools (genai.mil and Ask Sage), picking the right tool for the right task, and using both as a cross-check on high-stakes outputs.
| Layer | What It Is | Status Today | What You Use It For |
|---|---|---|---|
| 1. Spec | genai.mil and Ask Sage. Both are CUI-authorized enterprise AI tenants. genai.mil is your fast chat layer for single-prompt work. Ask Sage is your multi-file and reasoning layer for architecture and dataset work. | Always available where the tenant grants access. If only genai.mil is available, Ask Sage phases degrade to paste-and-summarize. | Design, decompose, draft, prompt-write, peer-review your own logic. Multi-file refactor and dataset analysis go to Ask Sage. Quick single-shot drafts go to genai.mil. |
| 2. Prototype | Static HTML in a single file. Deployed to GitHub Pages or another free public host that MCEN can reach. | Always available. | Build the tool. Run it offline. Print it. Share the URL. Hand it to a Marine on a phone. |
| 3. Production | Power Apps, Dataverse, real connectors, identity, write-back to systems of record. | Often unavailable. Tenant-dependent. | Where the tool gets 10x better if access is granted. Document the wish today. |
Opening Line for the Room
"Today we run two enterprise AI tools side by side, build a real multi-component tool the unit needs, and clearly mark where we hit the wall. That wall map becomes the case for changing what we have access to."
Closing Line for the Room
"The tool is replaceable. The judgment about which tool to use, and how to verify what it tells you, is what we are actually teaching."
Agenda and Timing
| Time | Module | 201 Skills Applied |
|---|---|---|
| 0:00 to 0:30 | Module 1: Frontier Mapping for Your Domain | Frontier Recognition |
| 0:30 to 1:30 | Module 2: Complex Build (Readiness Dashboard) | All six (Context Assembly, Quality Judgment, Task Decomposition, Iterative Refinement, Workflow Integration, Frontier Recognition) |
| 1:30 to 1:40 | Break | |
| 1:40 to 2:20 | Module 3: Group Debugging | Iterative Refinement, Frontier Recognition, Context Assembly |
| 2:20 to 2:50 | Module 4: Verification Protocols and QA | Quality Judgment, Workflow Integration |
| 2:50 to 3:20 | Module 5: Teaching Methodology and Teach-Back | Context Assembly, Quality Judgment |
| 3:20 to 3:50 | Module 6: Workflow Playbook | Workflow Integration, Task Decomposition |
| 3:50 to 4:00 | Wrap and Buffer | |
| Total Course Time | 3 hours 40 minutes of instruction, a 10-minute break, and a 10-minute wrap and buffer. Schedule a 4-hour block. | |
Contingency Plan: If Ask Sage Is Not Available Today
If your tenant has genai.mil only, the course still runs. Pivot to:
- Run every Ask Sage phase in genai.mil with paste-and-summarize: manually paste the relevant CSV rows and ask genai.mil to reason over them. Surface the cost (token limits, no persistent context across turns) explicitly so students feel the gap.
- Add a Capability Gap Map row: "Ask Sage access is blocked for this unit." That is a real, fixable policy gap. Capture it with beneficiary count and time-savings estimate.
- Substitute a reasoning model on genai.mil where one is available: use whatever reasoning option the tenant exposes for the architecture and verification phases of Module 2.
The pedagogy holds either way. The tool-selection beats teach the same lesson when the gap is felt directly.
Module 1: Frontier Mapping for Your Domain
Duration: 30 minutes
Instructor Prerequisite Check
Before teaching this course, verify you can do all of the following without AI assistance:
- Deploy a single-file static HTML tool to github.io and verify it loads from MCEN.
- Read a fetch error or JavaScript error in browser DevTools and form a targeted fix prompt from it.
- Identify a CSS specificity collision and explain class-based resolution.
- Explain when localStorage persistence breaks (private browsing, quota exceeded, JSON parse failure on schema change).
- Explain when to escalate from genai.mil to Ask Sage (multi-file, reasoning-heavy, dataset analysis).
If there is any item on this list you cannot do, complete Course 3.5 (Platform Training, Locked Tenant Reality) before teaching this course. Students will hit errors that require hands-on experience to diagnose.
Each participant builds a frontier map for their domain. Map the frontier for each tool separately. What genai.mil chat handles well in your domain differs from what Ask Sage with a reasoning model handles well. The selection is part of the discipline. The wrong tool on the wrong task is one of the most common failure modes advanced builders hit, and it is the one most easily mistaken for an AI capability ceiling.
Tool Handles Table
Two enterprise AI tools, two columns. Fill this in for your own domain as the workshop progresses. The rows below are the starting point.
| Task | genai.mil Handles | Ask Sage Handles |
|---|---|---|
| Single-prompt drafts (counseling, awards, memos) | Yes. Fast chat is the right tool. One prompt in, one polished draft out. | Overkill. Works, but the file-ingestion overhead is not justified for one-shot drafts. |
| Multi-file refactor of a static-stack tool | Painful. Context window blows out across iterations. Common to see the model lose track of what changed across 20 turns. | Yes. Upload the file, ask Claude Opus 4.7 (or another reasoning model exposed by the tenant) for a targeted refactor with constraints stated up front. |
| Quick code snippet (regex, date math) | Yes. Bounce a single-line prompt, paste the result, verify. | Works, but slow for what should be a 10-second answer. |
| Dataset analysis with three CSVs (personnel, training, equipment) | Limited. Can do it with paste-and-summarize on small samples. Loses fidelity at scale. | Yes. Upload all three files. Use a reasoning model. The model holds schema across the conversation. |
| Frontier classification of an issue you just hit | Yes. Fast bounce. "Is this a frontier issue, a context issue, or a platform quirk?" | Works, slower turn time. |
| Architecture decision (data model, schema migration plan) | Possible for small scope. Reasoning quality drops on larger trade-off questions. | Yes. Claude Opus 4.7 or a GPT-5 reasoning variant produces the best architecture analysis the locked-tenant world currently exposes. |
| Verification cross-check (run same prompt through both, look for disagreement) | Half of the two-tool cross-check. Use for the fast side. | Half of the two-tool cross-check. Use for the deliberate side. Disagreements are flags. |
Issue Tracking Format
As you encounter specific issues during builds, log them in this format. Over time this becomes your team's institutional knowledge of where AI helps and where it doesn't, and how often the right move is switching tools rather than rewriting the prompt.
| Issue | Platform | Category | Workaround | Date |
|---|---|---|---|---|
| AI inserts a CDN script tag despite "no external dependencies" constraint | static HTML | Frontier (model-specific) | Restate constraint up front. If recurring, escalate to Ask Sage with a reasoning model. | 27 May 2026 |
| Date math breaks across DST boundary in localStorage | static HTML | Context | Tell the AI it is storing UTC ISO strings and only formatting at render time. | 27 May 2026 |
| AI iterates 47 times on multi-file refactor in genai.mil chat | static HTML | Meta-frontier (tool selection) | Switch to Ask Sage with file ingestion plus a reasoning model. Restate the goal once with the file attached. | 27 May 2026 |
Categories: Frontier (AI capability limit on this task), Platform limitation (browser, OS, hosting), Context (AI guessed wrong because prompt was insufficient), Meta-frontier (you were using the wrong tool for the job).
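The DST row in the table above is easier to brief with a concrete picture of the workaround. The following is a minimal sketch, assuming timestamps are written to localStorage as UTC ISO strings and converted to local text only at render time. The function names are illustrative and do not come from the reference build.

```javascript
// Sketch: store UTC ISO strings, do date math in UTC, format only when rendering.
// All function names here are hypothetical, chosen for illustration.

// Value to write into localStorage, e.g. "2026-05-27T08:30:00.000Z".
function toStoredTimestamp(date) {
  return date.toISOString();
}

// Whole days between two stored timestamps. The math runs on UTC milliseconds,
// so a daylight-saving shift in the viewer's timezone cannot change the result.
function wholeDaysBetween(isoStart, isoEnd) {
  const MS_PER_DAY = 24 * 60 * 60 * 1000;
  return Math.floor((Date.parse(isoEnd) - Date.parse(isoStart)) / MS_PER_DAY);
}

// Render-time only: convert to the viewer's locale and timezone for display.
function formatForDisplay(iso) {
  return new Date(iso).toLocaleString();
}
```

The design choice is that nothing stored ever depends on the device's timezone. Only `formatForDisplay` does, and its output is never written back.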
Deliverable: One-page Advanced Frontier Map published on the EDD site. Each student adds at least three rows specific to what bit them during the workshop.
Key Teaching Point
This map is the most valuable artifact of the workshop. The BCG-Harvard study found a 19-percentage-point performance drop when workers applied AI beyond the frontier without knowing it: quality collapses. The frontier map guards against that drop by making the boundary visible. The new column for advanced builders is the meta-frontier: knowing not just where the model breaks, but which tool to reach for in the first place.
Data Handling Reminder
Both genai.mil and Ask Sage are CUI-authorized enterprise tenants. CUI-level unit context (unit name, T&R references, approval chain) is acceptable. PII (real Marine names, SSNs, contact info) must be anonymized unless a PIA authorizes the data on that tool. github.io is world-public the moment the repo is public; no CUI in the file itself or in the repo history.
Module 2: Complex Build, Multi-Component System
Duration: 60 minutes
This is the most complex build in the entire reality-track curriculum. Students will build a single-file static HTML Unit Readiness Dashboard that joins three data sources, computes a readiness percentage, renders hand-rolled SVG bar charts by company, surfaces a Not-Ready table, and generates a plain-text Commander's Snapshot. The build typically takes 7 to 12 sequential prompts, deliberate switching between genai.mil and Ask Sage, and at least one error-recovery cycle. This build is a step up from anything in Course 3.5.
Instructor Note: Mode-Switching and Tool-Switching Are Both the Goal
Course 4 teaches mode-switching (centaur vs cyborg). Course 4.5 layers a second axis on top: tool-switching (genai.mil vs Ask Sage). Assess students on their decision-making process, not on output polish. Watch for students who recognize when to slow down for accuracy-critical work, AND when to escalate from fast chat to a reasoning model. Those are two distinct skills.
Build Target: Static HTML Unit Readiness Dashboard
Three JSON or CSV data sources (personnel, training, equipment) joined in JavaScript, computed readiness percentage, hand-rolled SVG bar chart by company, Not-Ready table, plain-text Commander's Snapshot. Single file. Inline CSS. Inline JavaScript. No external scripts, no CDN. localStorage persistence for the last loaded dataset so a reload does not lose state.
Reference build: builds/readiness-dashboard.html. Open it now. This is the destination. Use it to check your work, not to copy-paste from.
Phase 1: Data Architecture (Centaur, 15 minutes, Ask Sage with reasoning model)
Why centaur: Data schema errors compound. A bad join model means rework everywhere downstream. Slow down. Verify. Why Ask Sage: The reasoning quality on architecture trade-offs is materially higher with a reasoning model than with fast chat. This is a meta-frontier call. The right tool wins the next hour for you.
Upload three sample CSVs to Ask Sage: personnel, training, and equipment. Select Claude Opus 4.7 (or whatever reasoning model the tenant exposes) for the join-model proposal. Ask for an explicit join model, edge cases, and an in-memory data shape.
Instructor Checkpoint
Before students prompt, ask: "Why are we using Ask Sage with a reasoning model for this phase, not genai.mil?" Correct answer: "Because the architecture decision sets up the next hour. A reasoning model holds the schema across the conversation and catches edge cases that fast chat misses. The cost of getting this wrong is rework on every downstream phase."
You are helping me design a Unit Readiness Dashboard. I have three CSV datasets (personnel, training, equipment) I am about to upload. I need you to:
- Propose a join model that produces a readiness percentage per Marine and aggregated by company.
- Identify three edge cases that will break the naive join (orphaned EDIPIs, NULL training records, unassigned equipment).
- Recommend the in-memory data shape after parsing. Plain JS objects, no framework, single-file HTML.
Constraint: the final tool must run 100% offline in a single HTML file with no external scripts or CDN.
Verification Checkpoint: End of Phase 1
Walk the room. Every student should be able to name three edge cases out loud before they proceed. If a student cannot, send them back to Ask Sage with one more prompt: "List five more edge cases I am not thinking of." Most rooms hit orphaned EDIPIs, NULL training records, and one of: duplicate serial numbers on equipment, expired training certifications, or Marines listed in personnel but marked NJP or unfit.
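The edge cases named at this checkpoint can also be checked mechanically before any readiness math runs. This is a sketch under stated assumptions: the field names (`edipi`, `status`, `serial`, `assignedEdipi`) are hypothetical and will differ from the real CSV headers, and the rows are assumed to be already parsed into plain objects.

```javascript
// Sketch: surface join edge cases before computing readiness.
// Field names (edipi, status, serial, assignedEdipi) are assumptions for illustration.
function auditEdgeCases(personnel, training, equipment) {
  const knownEdipis = new Set(personnel.map((p) => p.edipi));
  const trainedEdipis = new Set(training.map((t) => t.edipi));

  // Orphaned EDIPIs: training rows that point at nobody in personnel.
  const orphanedTraining = training.filter((t) => !knownEdipis.has(t.edipi));

  // Marines with no training row at all.
  const missingTraining = personnel.filter((p) => !trainedEdipis.has(p.edipi));

  // Training rows whose status is null, undefined, or empty.
  const nullStatus = training.filter(
    (t) => t.status === null || t.status === undefined || t.status === ""
  );

  // Equipment with no assigned Marine.
  const unassignedEquipment = equipment.filter((e) => !e.assignedEdipi);

  // Serial numbers that appear more than once.
  const seen = new Set();
  const duplicates = new Set();
  for (const e of equipment) {
    if (seen.has(e.serial)) duplicates.add(e.serial);
    seen.add(e.serial);
  }

  return {
    orphanedTraining,
    missingTraining,
    nullStatus,
    unassignedEquipment,
    duplicateSerials: [...duplicates],
  };
}
```

A student who can read this report and say which bucket each bad row fell into has met the checkpoint standard of naming the edge cases out loud.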
Phase 2: Data Ingestion and Joining (Cyborg, 15 minutes)
Why cyborg: Ingestion is trial-and-error. Parse errors, schema drift, edge cases revealed only by running the code.
Student choice: paste-and-parse with genai.mil (Cyborg fast iteration, lower setup cost) OR upload the CSVs to Ask Sage and have it generate the join code with real schema knowledge (more deliberate, higher upfront cost but fewer rewrites). Both are valid. This is the first explicit tool-switch beat of the build, layered on top of the standard cyborg mode-switch beat.
Instructor Teaching Point: The Tool-Switch Beat Inside the Mode-Switch Beat
Name what just happened to the room. Course 4 students switch between centaur and cyborg modes. Course 4.5 students switch BOTH modes AND tools. The two axes are independent. You can be in cyborg mode on genai.mil (fast chat, fast iteration) or in cyborg mode on Ask Sage (fewer turns, larger context per turn). Make the framing explicit.
Write JavaScript that parses a JSON array of personnel records and a JSON array of training records. Join by EDIPI. Output an array of {marine, training_status} objects. Inline JS, no dependencies, single function.
INSTRUCTOR: DELIBERATE FAILURE TARGET. DO NOT WARN STUDENTS.
On Prompt 2, fast chat models commonly return one of the following without prompting for scale:
- An O(n^2) array.find() join that visibly stalls on the sample data when scaled to 500+ records.
- A JSON.parse call without a try/catch that crashes on the first malformed paste.
- A join that silently drops orphans without flagging them, so the readiness percentage is wrong but plausible.
Let students hit it. When hands go up, debrief on "frontier or context?" The honest answer is context. The fix is to tell the AI it is getting an array of 500+ records and ask for an indexed join (build a Map keyed on EDIPI, then iterate once), and to wrap the parse in try/catch with a visible error fallback.
If the AI produces a clean join on the first try (newer or reasoning models do this more often), pivot the debrief: "Why did the AI get it right this time? What did you give it in the prompt that earlier rooms found their AI missed?" The teaching beat becomes prompt quality and context provision rather than AI-failure-recovery. Both versions land the frontier-vs-context lesson.
The join works on 5 sample records but stalls on the 500-record dataset. I think the find() call is the issue. Rewrite this to build a Map keyed on EDIPI from training records first, then iterate personnel once and look up by Map key. Also wrap the JSON.parse in try/catch and surface a visible inline error if parsing fails.
Students who chose the Ask Sage path upload the three CSVs once and ask for the join code with schema already understood. Their iteration count is lower, but each iteration takes longer to set up. Both paths reach the same working join by the end of Phase 2. The point is not which path is faster, but that the student can articulate why they chose theirs.
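For instructors who want a picture of where the Phase 2 recovery prompt should land, here is a minimal sketch of an indexed join with a guarded parse. It is one reasonable shape, not the reference build's code. The function names, the `status` field, and the "NO RECORD" placeholder are assumptions.

```javascript
// Sketch: guarded parse plus a Map-based join. Names and the "NO RECORD"
// placeholder are hypothetical.

// Parse pasted JSON without throwing; the caller shows `error` inline on failure.
function safeParseArray(text) {
  try {
    const value = JSON.parse(text);
    if (!Array.isArray(value)) {
      return { ok: false, rows: [], error: "Expected a JSON array." };
    }
    return { ok: true, rows: value, error: null };
  } catch (err) {
    return { ok: false, rows: [], error: "Could not parse JSON: " + err.message };
  }
}

// Indexed join: one pass to build the Map, one pass over personnel.
// If an EDIPI repeats in training, the last row wins in this sketch.
function joinByEdipi(personnel, training) {
  const trainingByEdipi = new Map();
  for (const t of training) trainingByEdipi.set(t.edipi, t);

  const personnelEdipis = new Set();
  const joined = personnel.map((marine) => {
    personnelEdipis.add(marine.edipi);
    const match = trainingByEdipi.get(marine.edipi);
    return { marine, training_status: match ? match.status : "NO RECORD" };
  });

  // Orphaned training rows are returned, not silently dropped.
  const orphans = training.filter((t) => !personnelEdipis.has(t.edipi));
  return { joined, orphans };
}
```

Each Map lookup is constant time on average, so the join scales linearly with record count instead of quadratically. Returning `orphans` alongside `joined` keeps the third failure mode visible to the student.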
Phase 3: Visualization (Centaur, 20 minutes)
Build the readiness card, SVG bar chart by company, and Not-Ready table.
Build a horizontal bar chart of readiness by company. Company on the left axis, percentage as the bar length, value label on the right. Colors: scarlet for under 75, gold for 75 to 90, ink for above 90.
INSTRUCTOR: SECOND DELIBERATE FAILURE TARGET
Fast chat models commonly return an answer that includes a <script src="..."> pulling Chart.js from a CDN. That violates the no-external-dependencies constraint stated up front. Let the student catch the violation. If they do not, ask: "Where is that bar chart actually coming from? Open DevTools. What does the Network tab show?" If the AI honored the constraint on the first try, ask why: what about the prompt or the model's posture made it remember the constraint when other rooms have seen it forgotten?
No external scripts, no CDN, no Chart.js. Render as inline SVG using vanilla JS. Bars labeled with company name on the left and percentage on the right, scaled to 0 to 100. Use the scarlet/gold/ink palette already in the stylesheet.
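The constraint-restating prompt above should yield something close to the following. This is a hedged sketch of a hand-rolled inline SVG bar chart, not the reference build's code. The hex values stand in for the scarlet/gold/ink palette, which the real stylesheet defines. The layout constants and function names are assumptions, and `pct` is assumed to be a finite number.

```javascript
// Sketch: inline SVG horizontal bar chart, no libraries.
// Hex values are placeholders for the stylesheet's scarlet/gold/ink palette.
const PALETTE = { scarlet: "#c8102e", gold: "#c99700", ink: "#1a1a1a" };

// Scarlet under 75, gold for 75 to 90, ink above 90.
function barColor(pct) {
  if (pct < 75) return PALETTE.scarlet;
  if (pct <= 90) return PALETTE.gold;
  return PALETTE.ink;
}

// Escape text so a company name cannot break the markup.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// rows: [{ company: "Alpha", pct: 82 }, ...]. Returns an SVG markup string.
function renderBarChart(rows) {
  const labelWidth = 120, barMax = 300, valueWidth = 50;
  const rowHeight = 28, barHeight = 18;
  const width = labelWidth + barMax + valueWidth;
  const height = rows.length * rowHeight;

  const parts = rows.map((row, i) => {
    const pct = Math.max(0, Math.min(100, row.pct)); // clamp to the 0 to 100 scale
    const y = i * rowHeight;
    const barLength = (pct / 100) * barMax;
    const textY = y + barHeight - 4;
    return (
      `<text x="0" y="${textY}">${escapeXml(row.company)}</text>` +
      `<rect x="${labelWidth}" y="${y}" width="${barLength}" height="${barHeight}" fill="${barColor(pct)}"></rect>` +
      `<text x="${labelWidth + barLength + 6}" y="${textY}">${pct}%</text>`
    );
  });

  return `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}" role="img">${parts.join("")}</svg>`;
}
```

In the page, the returned string would be assigned to a container's `innerHTML`. The Network tab check from the instructor note still applies: the chart should render with zero outbound requests.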
Key Teaching Beat: Model Capability Difference
In practice, reasoning models (current Claude Opus, GPT-5 reasoning) tend to honor the no-external-dependencies constraint from the first response more often than fast chat models. The contrast is not absolute and depends on prompt phrasing and the current state of each model, but it is reliable enough to make the point. Reasoning models follow stated constraints more rigorously on average. Fast chat is cheaper per turn but pushes more verification onto the human. Neither is wrong. Use both, pick deliberately, and verify the output regardless of which model produced it.
Phase 4: Verification and Brief Generation (10 minutes)
Upload the generated readiness_output.csv and the original personnel_source.csv to Ask Sage. Ask it to find discrepancies. This is a real cross-check, not a spot-check.
I am uploading two files: readiness_output.csv (what my dashboard generated) and personnel_source.csv (the original data). For each Marine in readiness_output.csv flagged as NOT READY, verify the reason by cross-referencing the source. Flag any rows where the source data does not support the NOT READY classification.
Add a "Generate Commander's Snapshot" button. It produces a plain-text brief in a read-only textarea. Format: header with unit and as-of date, overall readiness percentage with delta vs last week from localStorage, breakdown by company, top three NOT READY drivers, and a one-line REQUEST at the end if any company is below 75. Add a Copy button next to it.
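The snapshot format described in that prompt can be sketched as a pure function that returns the text for the read-only textarea. This is an illustration only: the parameter names, the label wording, and the phrasing of the REQUEST line are assumptions, and reading last week's value out of localStorage is left to the caller.

```javascript
// Sketch: build the plain-text Commander's Snapshot.
// Parameter names and label wording are hypothetical.
function buildSnapshot({ unit, asOf, overallPct, lastWeekPct, companies, topDrivers }) {
  const lines = [];
  lines.push(`COMMANDER'S SNAPSHOT: ${unit}`);
  lines.push(`As of: ${asOf}`);

  // The delta is omitted when no prior value was stored (first run).
  let overall = `Overall readiness: ${overallPct.toFixed(1)}%`;
  if (typeof lastWeekPct === "number") {
    const delta = overallPct - lastWeekPct;
    const sign = delta >= 0 ? "+" : "";
    overall += ` (${sign}${delta.toFixed(1)} vs last week)`;
  }
  lines.push(overall);

  lines.push("By company:");
  for (const c of companies) {
    lines.push(`  ${c.company}: ${c.pct.toFixed(1)}%`);
  }

  lines.push("Top NOT READY drivers:");
  topDrivers.slice(0, 3).forEach((driver, i) => {
    lines.push(`  ${i + 1}. ${driver}`);
  });

  // One-line REQUEST only when at least one company is below 75.
  const below = companies.filter((c) => c.pct < 75).map((c) => c.company);
  if (below.length > 0) {
    lines.push(`REQUEST: Command attention on ${below.join(", ")} (below 75%).`);
  }
  return lines.join("\n");
}
```

Keeping the builder free of DOM and storage calls makes it easy to verify by hand: feed it known numbers and read the text.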
Final Verification Standard
Before students declare the build done, run this protocol: "Pick one Marine from the Not-Ready table. Open the source CSV in a text editor. Can you confirm from the source that this Marine should be Not-Ready, and that the reason your dashboard cited matches the source?" If they cannot, the dashboard is not ready. The Ask Sage cross-check in Phase 4 is the cleaner way to do this at scale, but the manual spot-check is the discipline.
Debrief Questions (5 minutes)
- Where did you switch MODES? Why?
- Where did you switch TOOLS? Why?
- Where did the AI fail? Was it a frontier issue, a context issue, or a tool-selection (meta-frontier) issue?
- If you had to build this again, what would you do differently?
- How much time did this take? How long would it have taken without AI?
Key Takeaway
This build required 7+ prompts, at least two mode switches, at least one tool switch, and at least one error-recovery cycle. That is normal for advanced builds. The students who succeeded verified at each phase boundary, picked the right tool for each phase, and knew how to feed errors back into the conversation for refinement.
Module 3: Group Debugging
Duration: 40 minutes
Participants bring actual broken tools. If they don't have any, use the five pre-built scenarios linked below. The five scenarios cover the most common failure modes in static-stack reality-track builds, plus one tool-selection failure unique to the two-tool world advanced builders operate in.
Debugging Clinic Protocol
- Student Presentation (2 minutes): Student explains what the tool should do, what it actually does, and what they have already tried. Format: "Expected behavior. Actual behavior. Steps I have taken."
- Group Diagnosis (3 minutes): Group asks clarifying questions and proposes hypotheses. Instructor guides with questions: "Is this a data issue or a logic issue? Is the problem at input or at output? Have we seen this pattern before? Was the right tool used for this work?"
- Instructor Synthesis (2 minutes): Instructor identifies the root cause category (frontier limitation, insufficient context, incorrect assumption, integration failure, data quality issue, or tool-selection failure) and explains the fix.
- Document the Pattern: Add the failure case to the collective Advanced Frontier Map and, where it points to a missing capability, the Capability Gap Map.
Time allocation: 7 minutes per problem. Aim for 5 problems in 35 minutes, leaving 5 minutes for final synthesis.
Instructor Note: Managing the Session
Keep time strictly. Students will want to fully fix every problem. The goal is not to fix everything; the goal is to build diagnostic patterns. After 3 minutes of group diagnosis, move to synthesis even if the problem is not solved. The value is in recognizing the pattern.
Pre-Built Debugging Scenarios
The full standalone resource with answer keys lives at resources/debugging-scenarios.html. Summaries below.
Scenario 1: Fetch Race / Double-Submit (Counseling Tracker)
A counseling tracker tool double-saves the same entry whenever the user clicks Save twice quickly. Symptom: two identical rows in localStorage, both with the same timestamp. Student attempted fixes: added a confirm dialog, added setTimeout debounce. Neither worked.
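One common guard against this pattern is to make the save idempotent, so a second click with the same entry is a no-op. The sketch below is not taken from the answer key and may differ from it. It assumes each entry gets an `id` when the form opens rather than at click time, and that the stored value is a valid JSON array. `storage` is any object with `getItem` and `setItem`, such as `window.localStorage`.

```javascript
// Sketch: idempotent save keyed on entry.id. Hypothetical names throughout.
// Assumes entry.id is assigned when the form opens, so both clicks carry the same id.
function saveEntryOnce(storage, key, entry) {
  const raw = storage.getItem(key);
  const rows = raw ? JSON.parse(raw) : [];

  // Second click with the same id: nothing to write.
  if (rows.some((row) => row.id === entry.id)) return false;

  rows.push(entry);
  storage.setItem(key, JSON.stringify(rows));
  return true;
}
```

Unlike a debounce, this does not depend on how fast the clicks arrive. The check runs against what is already stored.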
Scenario 2: Stale localStorage After Schema Change (TEEP-like Tracker)
Tool was updated to add a "Confirmed By" field. New entries save correctly. Old entries throw a JS error when displayed because the property does not exist on legacy rows. Student rebuilt the render function three times.
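A typical way to handle legacy rows is to upgrade them at load time so the render function only ever sees the current shape. The following is a sketch under assumptions, not the answer key: the property name `confirmedBy`, the `schemaVersion` field, and the version number are all hypothetical.

```javascript
// Sketch: migrate stored rows on load. Property and function names are assumptions.
const CURRENT_VERSION = 2;

// Upgrade one stored row to the current shape without mutating the input.
function migrateRow(row) {
  const upgraded = { ...row };
  if (upgraded.confirmedBy === undefined) {
    upgraded.confirmedBy = ""; // legacy rows predate the "Confirmed By" field
  }
  upgraded.schemaVersion = CURRENT_VERSION;
  return upgraded;
}

// rawJson is whatever localStorage.getItem returned (a string or null).
// A real tool should surface a visible error on parse failure; this sketch returns [].
function loadRows(rawJson) {
  if (!rawJson) return [];
  let parsed;
  try {
    parsed = JSON.parse(rawJson);
  } catch (err) {
    return [];
  }
  return Array.isArray(parsed) ? parsed.map(migrateRow) : [];
}
```

The point for the debrief is where the fix lives: at the data boundary, once, rather than in three rewrites of the render function.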
Scenario 3: CSS Specificity Collision Plus Timezone Bug (Watch-Bill Highlight Rows)
Watch-bill tool should highlight rows for the current day in gold. The wrong row highlights, and only on Mondays. Two bugs stacked: the gold class is being overridden by a more specific selector, AND the day-of-week comparison uses local time but the rendered dates use UTC.
Scenario 4: Event Delegation on Dynamic Content (List-Based Tool)
Tool renders a list of items with Delete buttons. The first three Delete buttons work. After clicking Add and rendering four more items, only the original three buttons respond. Click handlers were attached at render time and the new items never got them.
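Event delegation, the technique named in this scenario's title, attaches one listener to the list container and decides per click whether a Delete button was hit. Here is a minimal sketch, assuming each Delete button carries hypothetical `data-action="delete"` and `data-id` attributes. The handler name and the wiring shown in the comment are illustrative, not taken from the answer key.

```javascript
// Sketch: one delegated click handler for the whole list.
// The data-action and data-id attributes are assumed markup conventions.
function onListClick(event, deleteItem) {
  // closest() walks up from the clicked node, so clicks on an icon
  // inside the button still resolve to the button.
  const button = event.target.closest("[data-action='delete']");
  if (!button) return false; // click landed somewhere else in the list
  deleteItem(button.dataset.id);
  return true;
}

// Browser wiring, run once at startup (listEl and removeItem are hypothetical):
//   listEl.addEventListener("click", (e) => onListClick(e, removeItem));
```

Because the listener lives on the container, items rendered later are covered without any re-attachment step.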
Scenario 5: Tool-Selection Failure (Meta-Frontier)
Student used genai.mil chat for a complex multi-file refactor of a static-stack tool. After 47 turns, the model has lost track of which files changed and is reintroducing bugs that were fixed earlier in the thread. The correct move was Ask Sage with file ingestion plus a reasoning model from the start. This is not a frontier issue with the AI. It is a meta-frontier issue with the human's tool selection.
Final Synthesis (5 minutes)
After all debugging sessions, lead a group discussion:
- What patterns did we see? (Common answer: most problems were context issues or tool-selection issues, not frontier issues.)
- How many problems were caused by insufficient prompting versus actual AI limitations versus picking the wrong tool?
- What questions should we ask the AI differently next time?
- Which problems belong on the Advanced Frontier Map?
- Which problems point to a missing capability that belongs on the Capability Gap Map?
Key Takeaway
Most debugging comes down to four categories: (1) the AI did not have enough context, (2) the platform has a quirk the AI does not know about, (3) we hit an actual frontier limitation, or (4) we were using the wrong tool for the job. Categories 1, 2, and 4 are fixable by the human. Category 3 goes on the frontier map. The fourth category did not exist for Course 4 students. It does for Course 4.5 students, because they have two tools to pick between.
Module 4: Verification Protocols and QA
Duration: 30 minutes
Build a QA checklist for your domain. For AI-generated output, what must be checked?
QA Checklist
- Source verification. AI fabricates references. Every citation, regulation number, and URL must be independently verified.
- Data accuracy. Numbers, dates, names, and quantities must be checked against source data.
- Logic check. Does the reasoning hold? Are conclusions supported by the premises?
- Format compliance. Does the output match required formats, templates, and standards?
- Domain review. Does this pass the smell test for someone who knows this domain?
The Two-Tool Cross-Check Protocol
For high-stakes outputs (anything that goes to a CO, anything that becomes a system of record, anything that ends up in a fitness report), run the same prompt through both genai.mil and Ask Sage with a reasoning model. Compare the answers. Disagreements are flags.
- When both tools agree on a substantive judgment, your confidence increases (not to certainty, but materially).
- When the tools disagree, treat that as a flag and dig in. The disagreement is data about where the model is uncertain.
- This is not free. It costs you two prompts and a comparison turn. Reserve it for outputs that will be acted on.
- Document any case where the two tools systematically disagree. That is a frontier-map row and probably a Capability Gap Map row.
Exercise: Timed QA Review
Take an AI-generated SOP excerpt with planted errors. Run it through your checklist. Time yourself. Compare your QA time to the time it would take to write the document from scratch. QA usually takes 30 to 50 percent of the from-scratch time; the difference is the time savings.
AI-Generated SOP Excerpt, QA Timed Exercise
Instructions: Time yourself. How long does it take you to find all five issues? Typical completion time: 5 to 10 minutes.
STANDARD OPERATING PROCEDURE
Marine Corps Detachment, 99th Training Group
Subject: Unit Check-In / Check-Out Procedure
Reference: (a) MCO 1000.6B, Individual Records Administration
(b) NAVMC 11800/4 (Rev 03-2025), Check-In/Check-Out Sheet
1. Purpose. To establish standardized procedures for all personnel checking in to and checking out of Marine Corps Detachment, 99th Training Group. All personnel shall complete check-in within 72 hours of reporting aboard.
2. Scope. This SOP applies to all Marines, Sailors, and civilian personnel assigned to or transferring from the Detachment.
3. Procedure, Check-In. Personnel reporting aboard shall complete the following steps in order:
Step 1: Report to the Officer of the Day (OOD) with original orders and ten copies of PCS orders.
Step 2: Obtain a check-in sheet per reference (b).
Step 3: Receive unit orientation brief from S-1 covering unit organization, key personnel, and local policies.
Step 4: Report to assigned section SNCOIC/OIC for introduction and initial task assignment.
5. Report to S-1 for initial in-processing, including service record book review and page 11 entry.
6. Complete remaining check-in sheet signatures (S-3, S-4, Medical, Dental, IPAC) within 48 hours of reporting.
4. Procedure, Check-Out. Personnel transferring from the unit shall initiate check-out procedures no later than 10 working days prior to the date of detachment.
Answer Key: Five Planted Errors
- Fabricated Reference #1: "MCO 1000.6B" is cited as the governing order for check-in procedures. This MCO does not exist. AI-generated regulation numbers must always be independently verified against the official Marine Corps Publications System.
- Fabricated Reference #2: "NAVMC 11800/4 (Rev 03-2025)" is cited as the check-in/check-out form. This form number is fabricated. AI frequently generates plausible-sounding form numbers that do not correspond to real NAVMC forms.
- Data Accuracy Error, Contradictory Timelines: Paragraph 1 states check-in must be completed "within 72 hours of reporting aboard," but Step 6 states remaining signatures must be completed "within 48 hours of reporting." These timelines contradict each other. AI often introduces subtle inconsistencies between sections of longer documents.
- Logic Error, Steps Out of Order: Step 3 has the Marine receiving a "unit orientation brief from S-1," but Step 5 has the Marine reporting "to S-1 for initial in-processing." Logically, you would in-process at S-1 (Step 5) before receiving the orientation brief (Step 3). The S-1 steps are reversed.
- Format Error, Inconsistent Numbering: Steps 1 through 4 use the "Step 1:" format, but the procedure then switches to a bare "5." and "6." format midway through. AI frequently loses formatting consistency in longer documents, especially when generating numbered procedures.
Key Teaching Point
The GDPval study found that human experts averaged 7 hours per task. AI-assisted work with human review was 1.4x faster and 1.6x cheaper, at expert parity on roughly half of the tasks tested. Treat human review as the discipline that captures the value; the two-tool cross-check is one of the cheapest ways to raise the floor on review quality without adding human review time.
Module 5: Teaching Methodology, The 201 Multiplier
Duration: 30 minutes
The Permission Gap
Mollick's research shows that workers are already using AI but hiding it because they worry about how the organization will react. This creates a shadow AI culture where best practices are not shared. Your role as an Advanced Workshop graduate is to formalize AI use, share techniques, and train others. The reality-track version of that role includes teaching tool selection: when to reach for genai.mil, when to reach for Ask Sage, and when to cross-check both.
Discussion: The Apprentice Problem (5 minutes)
Entry-level job postings in AI-exposed roles dropped roughly 35 percent from 2023 to 2025 (Mollick et al. analysis of postings data; verify against the current paper before quoting in a brief). AI is automating the routine tasks that juniors traditionally learned on. If a junior never manually writes a key document because AI generates it, how do they develop the judgment to know when the AI-generated version is wrong? And how do they learn which tool to reach for when the task gets harder?
Protocol for Junior Marines Using AI
- Require review and explanation. Juniors must review AI output and explain WHY it is correct or incorrect.
- Periodically work without AI. Key tasks should periodically be done from scratch to build foundational skills.
- Use AI output as a teaching tool. Give juniors AI-generated products and have them find the problems.
- Rotate QA review. Assign juniors to the QA review step so they develop quality judgment through repeated exposure.
- Teach tool selection deliberately. When a junior asks for help, ask them which tool they reached for first and why. Coach the selection, not just the prompt.
Structured Teach-Back Exercise (20 minutes)
This exercise develops your ability to teach EDD concepts to others. Each student will prepare and deliver a 3-minute teaching segment on one concept from the EDD curriculum.
Step 1: Select a Concept (2 minutes)
Choose one of the following concepts from the curriculum:
- Centaur vs Cyborg modes
- Frontier mapping
- Context-building in prompts
- Iterative refinement
- Verification protocols
- The Jagged Frontier
- Layer separation (Spec / Prototype / Production)
- Tool selection (genai.mil vs Ask Sage)
Step 2: Preparation Using Template (5 minutes)
Use this template to prepare your teaching segment:
Teach-Back Preparation Template
Concept: Which concept are you teaching?
One-Sentence Definition: Define the concept in one clear sentence.
Why It Matters: One sentence explaining why this concept is important.
Real Example from Your Work: Specific example from your actual job where this concept applies.
Common Mistake: One mistake people make when applying this concept.
Key Takeaway: One sentence the audience should remember.
Instructor Note: Template Modeling
Before students prepare, model the template yourself with a completed example. Show them what "good" looks like. Example: "I am teaching Tool selection. Definition: genai.mil is the fast chat layer for single-prompt tasks; Ask Sage is the multi-file and reasoning layer for architecture and dataset tasks. Why it matters: picking the wrong tool burns hours and looks like a frontier issue when it is really a meta-frontier issue. My example: when I refactored a static-stack dashboard across three files, I tried genai.mil first, hit 30 turns of context drift, switched to Ask Sage with the files uploaded, and shipped in two turns. Common mistake: defaulting to whichever tool you opened first. Key takeaway: the selection is part of the discipline, not an implementation detail."
Step 3: Small Group Teaching (10 minutes)
Break into groups of 3 or 4. Each person delivers their 3-minute teaching segment. Rotate until everyone has taught.
Step 4: Peer Evaluation (3 minutes)
After each teaching segment, group members provide feedback using this rubric:
| Criteria | Strong | Needs Improvement |
|---|---|---|
| Clarity of Definition | I could explain this concept to someone else now | I am still unclear on what this concept means |
| Relevance of Example | The example made the concept concrete and believable | The example was generic or did not clearly illustrate the concept |
| Practical Takeaway | I know what to do differently because of this teaching | I understand the concept but do not know how to apply it |
Facilitating the Exercise
Walk the room during small group teaching. Listen for common issues: (1) Students who read from their template instead of teaching conversationally, (2) Examples that are too vague ("I used this on a project" instead of specific details), (3) Students who exceed 3 minutes (this is a teaching skill: brevity). Give real-time coaching. The goal is not perfect teaching; the goal is building awareness of what effective teaching looks like.
Group Debrief (5 minutes)
Reconvene as a full group. Discuss:
- What made a teaching segment effective?
- What was difficult about teaching something you know well?
- How would you adapt this approach to teach a 30-minute Platform Training session?
- Who in your unit could benefit from learning these concepts?
Module 6: Workflow Playbook
Duration: 30 minutes
Each participant produces a one-page playbook for one AI-integrated workflow from their actual job. This is the final deliverable of the Advanced Workshop. The reality-track version adds a "Which tool" column to the template, because tool selection is part of the workflow design.
Completed Example 1: Weekly Training Schedule Publication
Use this filled-in example to show participants what a finished playbook looks like before they create their own.
| Field | Content |
|---|---|
| Task | Weekly training schedule publication for the section |
| Frequency | Weekly. Every Thursday by 1600. |
| Mode | Cyborg (continuous back-and-forth refinement) |
| Which Tool | genai.mil chat |
| Step 1 | Human: Pull next week's events from training calendar, OPORD, and any new taskings (Human Only). |
| Step 2 | AI: Draft the schedule in standard weekly format with times, locations, and uniform requirements (AI generates, Human reviews). |
| Step 3 | Human: Cross-reference against range bookings, vehicle requests, and instructor availability (Human Only). |
| Step 4 | AI: Format conflicts as a decision matrix. "Event A conflicts with Event B at 0900. Options: move A to 1300, move B to Tuesday, or split the section." (AI generates options, Human decides.) |
| Step 5 | Human: Make final decisions on conflicts, add section leader notes (Human Only). |
| Step 6 | AI: Generate the final formatted schedule with all corrections applied, ready for distribution (AI generates, Human approves). |
| Verification Checklist | All events have confirmed locations. All times are in 24-hour format. No double-bookings remain. Uniform for each event is specified. POC listed for each event. |
| Known Frontier Issues | AI sometimes invents room numbers that do not exist on base. AI cannot verify range availability; it must be checked manually. AI occasionally uses 12-hour time format even when told to use 24-hour. |
| Time Savings | Without AI: about 3 hours (gathering info, formatting, resolving conflicts manually). With AI: about 45 minutes (human gathers info, AI formats and generates options). |
| Junior Development | Rotate schedule duty among junior Marines weekly. Require the Marine to review AI output and brief back why each event is scheduled (builds planning judgment). Monthly: have one schedule created entirely without AI to maintain baseline skill. |
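
The mechanical items on the verification checklist above (required fields present, 24-hour times, no double-bookings) can be checked in a few lines before the human review. This is a minimal sketch, not part of the course material. The event fields (`name`, `day`, `start`, `end`, `location`, `uniform`, `poc`), the `HHMM` time format, and the sample data are all assumptions made for illustration, and the sketch treats the section as a single body that cannot attend two overlapping events. It cannot tell whether a room or range actually exists; that check stays with the human.

```python
import re
from itertools import combinations

# 24-hour HHMM with no separator, e.g. "0900" or "1630" (this format is an assumption).
TIME_24H = re.compile(r"^([01]\d|2[0-3])[0-5]\d$")
REQUIRED_FIELDS = ("location", "uniform", "poc")


def check_schedule(events):
    """Return a list of human-readable problems found in a draft schedule."""
    problems = []
    for event in events:
        for field in REQUIRED_FIELDS:
            if not event.get(field, "").strip():
                problems.append(f"{event['name']}: missing {field}")
        for field in ("start", "end"):
            value = event.get(field, "")
            if not TIME_24H.match(value):
                problems.append(f"{event['name']}: {field} {value!r} is not 24-hour HHMM")

    # Only events with valid times can be compared for overlap.
    timed = [e for e in events
             if TIME_24H.match(e.get("start", "")) and TIME_24H.match(e.get("end", ""))]
    for a, b in combinations(timed, 2):
        # Zero-padded HHMM strings compare in clock order.
        if a["day"] == b["day"] and a["start"] < b["end"] and b["start"] < a["end"]:
            problems.append(f"{a['day']}: {a['name']} overlaps {b['name']}")
    return problems


# Hypothetical sample data for illustration only.
draft = [
    {"name": "Rifle range", "day": "Tue", "start": "0900", "end": "1200",
     "location": "Range 1", "uniform": "Utilities", "poc": "Sgt A"},
    {"name": "Vehicle maintenance", "day": "Tue", "start": "1100", "end": "1300",
     "location": "Motor pool", "uniform": "Coveralls", "poc": "Cpl B"},
    {"name": "Section brief", "day": "Wed", "start": "2:00 PM", "end": "1500",
     "location": "Classroom", "uniform": "Utilities", "poc": ""},
]

problems = check_schedule(draft)
for line in problems:
    print(line)
```

An empty list means only that the mechanical checks passed. Range bookings, vehicle requests, and instructor availability still need the Step 3 cross-reference.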
Completed Example 2: Weekly Readiness Rollup
Same template, different tool selection. This is the multi-file dataset workflow that the Module 2 build prepared students for.
| Field | Content |
|---|---|
| Task | Weekly readiness rollup for the company commander's brief |
| Frequency | Weekly. Every Monday by 1000. |
| Mode | Centaur (distinct human phases and AI phases with verification at each boundary) |
| Which Tool | Ask Sage with Claude Opus 4.7 (or another reasoning model the tenant exposes) |
| Step 1 | Human: Export current rosters from MOL: personnel, training currency, equipment status (Human Only). |
| Step 2 | AI: With all three CSVs uploaded to Ask Sage, ask Claude Opus 4.7 to compute deltas against last week's saved snapshot and output the top three drivers of any change in readiness percentage (AI generates, Human reviews). |
| Step 3 | Human: Spot-check three Marines newly flagged NOT READY this week against the source CSVs to verify the AI's attribution (Human Only). |
| Step 4 | AI: Generate a plain-text rollup with overall percentage, delta, top three drivers, and a one-line recommended commander action (AI generates, Human approves). |
| Step 5 | Human: Review, paste into the brief slide, log to the readiness dashboard tool for next week's delta (Human Only). |
| Verification Checklist | Overall percentage matches manual count of READY Marines divided by total. Delta sign and magnitude check against last week's saved snapshot. Top three drivers each trace back to a specific source row. No Marine appears in the rollup who is not in this week's personnel export. |
| Known Frontier Issues | Ask Sage occasionally misattributes a NOT READY classification when a Marine appears in two units (cross-attached). The two-tool cross-check (run the same prompt through genai.mil with paste-and-summarize on a sample) catches this. Reasoning models occasionally over-explain. Constrain the output format up front. |
| Time Savings | Without AI: about 4 hours (manual joins in Excel, three sets of pivot tables, narrative writing). With AI: about 1 hour. Net savings: 3 hours per week per company commander's S-3 shop. |
| Junior Development | Have a junior Marine do Step 3 (the source-data spot-check) every week. That step builds the readiness-data literacy the rest of the workflow assumes. Monthly: have the junior do the full rollup without AI for one company to maintain baseline skill. |
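
Three of the four verification checklist items above are arithmetic or set membership, so they can be re-computed independently of the AI output. This is a minimal sketch under stated assumptions: the row fields (`id`, `status`), the rollup fields (`overall_pct`, `delta`, `named_ids`), the tolerance, and the sample data are all hypothetical and do not reflect any real MOL export layout. Tracing each driver back to a source row is not covered here and remains a human check.

```python
def readiness_pct(rows):
    """Manual count: READY Marines divided by total, as a percentage."""
    if not rows:
        raise ValueError("personnel export is empty")
    ready = sum(1 for row in rows if row["status"] == "READY")
    return round(100 * ready / len(rows), 1)


def verify_rollup(rollup, this_week, last_week_pct, tolerance=0.1):
    """Compare the AI-generated rollup against values recomputed from source rows."""
    failures = []

    expected_pct = readiness_pct(this_week)
    if abs(rollup["overall_pct"] - expected_pct) > tolerance:
        failures.append(
            f"overall: rollup says {rollup['overall_pct']}, manual count says {expected_pct}")

    expected_delta = round(expected_pct - last_week_pct, 1)
    if abs(rollup["delta"] - expected_delta) > tolerance:
        failures.append(
            f"delta: rollup says {rollup['delta']}, snapshot comparison says {expected_delta}")

    # No Marine may appear in the rollup who is not in this week's export.
    known_ids = {row["id"] for row in this_week}
    for marine_id in rollup["named_ids"]:
        if marine_id not in known_ids:
            failures.append(f"{marine_id} is named in the rollup but not in this week's export")
    return failures


# Hypothetical sample data for illustration only.
this_week = [
    {"id": "M001", "status": "READY"},
    {"id": "M002", "status": "READY"},
    {"id": "M003", "status": "READY"},
    {"id": "M004", "status": "NOT READY"},
]
ai_rollup = {"overall_pct": 75.0, "delta": -20.0, "named_ids": ["M004", "M009"]}

failures = verify_rollup(ai_rollup, this_week, last_week_pct=100.0)
for line in failures:
    print(line)
```

In this sample the overall percentage matches the manual count, while the delta and one named Marine do not, which is the kind of partial correctness the spot-check in Step 3 is meant to catch.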
Blank Template: Create Your Own
| Field | Content |
|---|---|
| Task | A specific, recurring task from your job |
| Frequency | How often you perform this task |
| Mode | Centaur or Cyborg |
| Which Tool | genai.mil chat / Ask Sage with reasoning model / both (two-tool cross-check) |
| Steps | Step-by-step process with Human/AI labels for each step |
| Verification Checklist | What must be checked before output is final |
| Known Frontier Issues | Where AI has failed on this task before |
| Time Savings Estimate | Time without AI vs time with AI |
| Junior Development Note | How this workflow preserves skill-building for junior personnel |
Completion Criteria: What a Finished Playbook Looks Like
A completed playbook entry should have: a clear task name, realistic frequency, the correct mode (Centaur or Cyborg), the correct tool (genai.mil / Ask Sage / both), 4 to 8 concrete steps with Human/AI labels, a verification checklist with 3 to 5 items, at least one known frontier issue, and a specific time savings estimate. If your playbook has fewer than 4 steps or no verification checklist, it is not detailed enough.
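The countable parts of these criteria can be expressed as a quick self-check. This is a minimal sketch, assuming a playbook held as a plain dictionary; the key names (`task`, `frequency`, `mode`, `tool`, `steps`, `verification_checklist`, `frontier_issues`, `time_savings`) and the sample entry are hypothetical. It can confirm that the mode and tool fields are filled in, but not that they are the correct choice for the task; that judgment stays with the student and the instructor.

```python
def playbook_gaps(playbook):
    """Return the completion criteria a playbook entry does not yet meet."""
    gaps = []
    for field in ("task", "frequency", "time_savings"):
        if not playbook.get(field, "").strip():
            gaps.append(f"{field} is empty")

    if playbook.get("mode") not in ("Centaur", "Cyborg"):
        gaps.append("mode must be Centaur or Cyborg")

    tool = playbook.get("tool", "")
    if not any(name in tool for name in ("genai.mil", "Ask Sage", "both")):
        gaps.append("tool must name genai.mil, Ask Sage, or both")

    steps = playbook.get("steps", [])
    if not 4 <= len(steps) <= 8:
        gaps.append(f"{len(steps)} steps; expected 4 to 8")
    for number, step in enumerate(steps, start=1):
        if not step.startswith(("Human:", "AI:")):
            gaps.append(f"step {number} has no Human/AI label")

    checklist = playbook.get("verification_checklist", [])
    if not 3 <= len(checklist) <= 5:
        gaps.append(f"{len(checklist)} checklist items; expected 3 to 5")

    if not playbook.get("frontier_issues"):
        gaps.append("no known frontier issue listed")
    return gaps


# Hypothetical draft entry for illustration only: too few steps, no checklist yet.
draft_playbook = {
    "task": "Weekly training schedule publication",
    "frequency": "Weekly",
    "mode": "Cyborg",
    "tool": "genai.mil chat",
    "steps": [
        "Human: Pull next week's events",
        "AI: Draft the schedule",
        "Human: Make final decisions",
    ],
    "verification_checklist": [],
    "frontier_issues": ["AI sometimes invents room numbers"],
    "time_savings": "about 3 hours down to about 45 minutes",
}

gaps = playbook_gaps(draft_playbook)
for line in gaps:
    print(line)
```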
Capability Gap Map Integration
Add at least two rows to your Capability Gap Map during Module 6 that map your workflow playbook to Production-Layer wishes. Examples from the readiness rollup workflow: "MOL write-back so the snapshot posts back to the system of record," "Automated brief delivery once the rollup is approved," "Direct connector to the training currency database so the manual export is eliminated."
Open the Capability Gap Map template and fill in the rows now, while the workflow design is fresh. The compiled Capability Gap Map across the room is the policy case for production access. Specific, costed, beneficiary-counted gaps are what move tenant decisions.
Assessment Rubric
Use this rubric to evaluate student performance across all modules. Students should achieve "Meets" or higher in at least 5 of 6 categories to be considered Course 4.5 graduates.
| Criteria | Exceeds Expectations | Meets Expectations | Developing |
|---|---|---|---|
| Frontier Map Completeness (Module 1) | Frontier map covers 5+ tasks with specific examples of what each tool handles, what it fails at, and what is changing. Map includes evidence from student's own testing. Meta-frontier (tool-selection) issues are explicitly documented. | Frontier map covers 3 to 4 tasks with clear boundaries. Examples are specific to the student's domain. Distinguishes between genai.mil and Ask Sage strengths accurately. | Frontier map is generic or vague. Tasks are not specific to student's domain. Does not distinguish between the two tools or relies on assumptions rather than testing. |
| Complex Build Quality (Module 2) | Successfully completed the Readiness Dashboard as a single-file static HTML build with all three data sources joined and the SVG bar chart hand-rolled. Made conscious mode-switching AND tool-switching decisions and articulated why each phase required that mode and that tool. Verified outputs at phase boundaries. Numbers are accurate when spot-checked against source data. | Completed most of the dashboard build as single-file static HTML. Made at least 2 mode switches and at least 1 tool switch with clear rationale. Attempted verification even if errors were found. Understands the difference between centaur and cyborg modes in practice. | Did not complete the build OR stayed in one mode and one tool throughout OR could not articulate why mode-switching or tool-switching matters. Did not verify outputs. Build contains obvious errors not caught in QA. |
| Debugging Contribution (Module 3) | Actively diagnosed problems during group debugging. Asked clarifying questions that narrowed down root cause. Correctly identified whether failures were frontier, context, platform, or meta-frontier (tool selection) issues. Contributed patterns to the collective frontier map. | Participated in group debugging. Asked questions and proposed solutions. Could distinguish between AI limitations, prompting issues, and tool-selection issues when guided by instructor. Documented at least one failure pattern. | Did not actively participate in debugging OR could not identify root causes OR attributed all problems to "AI limitations" without deeper analysis. Did not document patterns. |
| QA Protocol Rigor (Module 4) | Identified all 5 errors in the QA exercise in under 10 minutes. Explained why each error is dangerous. Applied the QA checklist plus the two-tool cross-check to own work and found at least one issue. QA process is systematic and repeatable. | Identified 3 to 4 errors in the QA exercise. Understands the importance of verification and the two-tool cross-check. Can articulate what must be checked in AI-generated output for their domain. | Identified fewer than 3 errors OR took longer than 15 minutes OR did not apply QA thinking to own work. Treats QA as optional rather than critical. |
| Teaching Effectiveness (Module 5) | Delivered a clear, concise teach-back with a specific real-world example. Stayed within 3 minutes. Explained not just what the concept is, but why it matters and how to apply it. Received "Strong" ratings on all three rubric criteria from peers. | Delivered a teach-back that communicated the concept clearly. Used a real example. Stayed close to time limit. Received "Strong" on at least 2 of 3 rubric criteria from peers. | Teach-back was unclear, too long, or relied on generic examples. Could not explain how to apply the concept. Received "Strong" on fewer than 2 of 3 rubric criteria from peers. |
| Workflow Playbook Completeness (Module 6) | Workflow playbook covers a real, recurring task with specific step-by-step details. Each step is labeled Human/AI with rationale. Which Tool column is filled with a justified selection. Verification checklist is thorough and testable. Known frontier issues are documented from actual experience. Time savings estimate is backed by real data. Junior development protocol is specific and actionable. At least two Capability Gap Map rows added. | Workflow playbook covers a real task. Steps are clear. Which Tool column is filled. Verification checklist is present. Frontier issues are identified. Time savings estimate is reasonable. Junior development note is included. At least one Capability Gap Map row added. | Workflow playbook is incomplete or generic. Steps are vague. Which Tool column missing or unjustified. Verification checklist is missing or not specific. No documented frontier issues or time savings. No Capability Gap Map rows added. |
Using This Rubric
This rubric is not a checklist. Use it as a guide for observing student performance throughout the workshop. The most important indicator of success is whether students demonstrate conscious decision-making about when and how to use AI, and which AI to reach for. A student who completes all deliverables but cannot articulate their reasoning has not achieved the learning objectives. Conversely, a student who struggles with technical execution but shows strong diagnostic thinking, mode-switching awareness, and tool-selection awareness is on the right track.
Certification Recommendation
Students who achieve "Meets" or higher in at least 5 of 6 categories are recommended for:
- Serving as Platform Training (Locked Tenant Reality) instructors
- Leading tool development projects in their units
- Mentoring junior personnel in AI-assisted workflows
- Contributing to frontier map updates, workflow documentation, and Capability Gap Map rows
Students who do not meet this threshold should be encouraged to continue practicing, revisit specific modules, and attempt Course 4.5 again after building 1 to 2 additional tools.
Wrap and Next Steps
Course 4.5 produces two deliverables that compound across sessions. The Advanced Frontier Map captures where AI is reliable and where it is not, broken down by tool. The Capability Gap Map captures where the tenant blocks the work. Both go to leadership.
Artifact 1: Advanced Frontier Map
Compile all failure cases from the session. This is the collective intelligence of the room. Each student adds at least three rows specific to what bit them today. Each row names a specific task, a specific failure, a specific verification action, and a specific tool. Not "check the output," but "run the same prompt through both tools and look for disagreement on the company-by-company readiness percentages."
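The "look for disagreement" step can be made concrete once each tool's company-by-company percentages have been copied out by hand. This is a minimal sketch, assuming the numbers have been keyed into two dictionaries; the company names, percentages, and the 0.5-point tolerance are hypothetical values chosen for illustration. Agreement between the two tools does not prove either is right; disagreement only tells you where to go back to the source data first.

```python
def cross_check(genai_mil, ask_sage, tolerance=0.5):
    """List companies where the two tools disagree or one tool omitted the company."""
    disagreements = []
    for company in sorted(set(genai_mil) | set(ask_sage)):
        a = genai_mil.get(company)
        b = ask_sage.get(company)
        if a is None or b is None:
            disagreements.append((company, a, b, "missing from one tool"))
        elif abs(a - b) > tolerance:
            disagreements.append((company, a, b, "percentages differ"))
    return disagreements


# Hypothetical readiness percentages transcribed from each tool's output.
from_genai_mil = {"Alpha": 92.0, "Bravo": 81.5, "Charlie": 77.0}
from_ask_sage = {"Alpha": 92.0, "Bravo": 84.0, "Charlie": 77.2, "Weapons": 88.0}

disagreements = cross_check(from_genai_mil, from_ask_sage)
for company, a, b, reason in disagreements:
    print(f"{company}: genai.mil={a} Ask Sage={b} ({reason})")
```

Each flagged company is a candidate row for the Advanced Frontier Map, once the source data shows which tool was wrong and why.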
Artifact 2: Capability Gap Map (Compounded)
Every tool the room built today has a wish list. Things that would make it 10x better if production access were unlocked. Capture them. The compiled Capability Gap Map across the room is the case for changing what the tenant gives you. Forward your unit's compiled Capability Gap Map up through your AI POC.
Assignment Before the Next Workshop
- Polish the Readiness Dashboard build to a deployable state. Keep the URL live.
- Run through the EDD SOP QA process on at least one of your reality-track builds, using the two-tool cross-check.
- Document three failure cases with specifics: what failed, how you caught it, how you fixed it, which tool you used.
- Add at least two rows to your unit's Capability Gap Map based on the limitations you ran into.
- Identify one area where AI capability surprised you. Something it did better than you expected, on either tool.