We've all been there. We open Claude or Gemini, type something like "write me a learning objective for a compliance course," and get back something so generic it could apply to any company, any audience, any decade. Then we spend 40 minutes rewriting what was supposed to save us time. What went wrong?
The advice we keep hearing doesn't help much either. "Be specific!" "Give context!" "Iterate!" That's like telling someone to cook better by saying "use ingredients." It's technically true and practically useless. What does "specific" even mean when we're trying to build a scenario-based assessment for mid-level managers learning to handle escalation calls?
I keep seeing the same question in L&D forums and ATD communities: "I keep hearing I need to learn prompt engineering, but where do I actually start as an instructional designer, not a developer?" That question is more honest than most of the answers it gets. Because the answers usually come from tech people who've never written a Mager-style objective or fought a SCORM packaging deadline.
Here's what I've been wrestling with after writing complete guides for Claude AI (15 parts), Google Gemini (27 parts), and Microsoft 365 Copilot (37 parts): the problem isn't that we're bad at prompting. The problem is that generic prompting frameworks weren't built for instructional design work. And the prompts we wrote six months ago? They probably need rewriting too, because the models underneath them have changed in ways that break old approaches. We can fix that, but it requires treating prompting as a living discipline rather than a one-time workshop.
TL;DR
- Generic prompting advice fails L&D professionals because instructional design requires domain-specific constraints (Bloom's levels, audience analysis, Kirkpatrick alignment) that general frameworks ignore. Josh Bersin's February 2026 research shows fewer than 5% of companies have deployed AI-native technology despite a $400 billion market.
- Anthropic's 4D AI Fluency Framework (Delegation, Description, Discernment, Diligence) provides the foundation. The RCTCO framework (Role + Context + Task + Constraints + Output Format) fits within the "Description" competency, providing L&D teams with a structured approach to this critical skill.
- Every model refresh changes how prompts should be structured. Claude Opus 4.6, GPT-5, and Gemini 3 each require different formatting than their predecessors. Prompt libraries built in late 2025 are already outdated.
- Four copy-paste-ready prompt templates for learning objectives, assessments, scenarios, and evaluation rubrics, with tool-specific adjustments for Claude Opus 4.6, Gemini 3 Pro, and Microsoft 365 Copilot.
- The RCTCO framework survives model changes because it encodes structure and intent rather than model-specific tricks.
🎯 Why Generic Prompting Advice Fails Us
Let's be honest about something. When ATD Research surveyed 232 instructional designers in July 2025 about their AI usage, the biggest gap wasn't access to tools. It was knowing how to use them for the specialized work we do. Dr. Philippa Hardman put it bluntly: "Without a user who can bring significant expertise in optimal instructional design practices AND understanding of how LLMs work, the value from generic AI models is severely limited."
Does that resonate with anyone else? It certainly does with me.
A related question I see constantly: "What's the minimum I need to know about prompting to actually be useful in my day-to-day ID work?" Here's my honest answer: less than the prompt engineering influencers suggest, but more than "just type a question." The minimum is understanding how to give a model the same context we'd give a new contractor. Role, audience, deliverable, constraints. That's the floor. Everything else builds from there.
Think about what makes our work different from general content creation. We're not just writing. We're aligning to Bloom's taxonomy levels. We're mapping to Kirkpatrick's evaluation frameworks. We're navigating SME bottlenecks and compliance requirements, SCORM packaging constraints, and accessibility standards. When was the last time a generic prompting guide mentioned any of that?
Recent research from SAGE Journals (2025) found that AI-generated questions align well with lower-order cognitive tasks, such as Remember and Understand, but struggle significantly with higher-order thinking, such as Analyze, Evaluate, and Create. That's a problem, right? Because most of us aren't building training programs that stop at recall. We need application. We need synthesis. And getting there requires a different kind of prompt.
🏛️ The Foundation: Anthropic's 4D AI Fluency Framework
Before we dive into prompting mechanics, we need to zoom out. What does it even mean to be "fluent" with AI? Anthropic released a free course through Anthropic Academy (12 lessons, CC BY-NC-SA license) developed with Prof. Rick Dakan and Prof. Joseph Feller that frames AI fluency around four competencies they call the 4Ds:
Delegation is about setting goals and deciding when and how to engage with AI. Should we use Claude to draft this branching scenario, or is the subject matter sensitive enough that a human-first approach makes more sense? Not every task benefits from AI. How often do we reach for it out of habit rather than strategic intent?
Description is effectively describing our goals to prompt useful AI behaviors and outputs. This is where most of the "prompting advice" conversation lives, and it's where the RCTCO framework fits. But notice that it's only one of four competencies. Are we over-indexing on this one while neglecting the others?
Discernment is the accurate assessment of the usefulness of AI outputs. That SAGE Journals finding about AI struggling with higher-order Bloom's levels? Discernment is the skill that catches that gap before it ships to learners. Can we reliably tell the difference between a scenario that tests Application and one that only tests Recall wrapped in a narrative?
Diligence is taking responsibility for what we do with AI. The output has our name on it. The learners don't know (or care) whether a human or a model wrote the assessment. If the branching scenario reinforces a misconception from the training data, that's on us.
I stumbled on the Anthropic Academy course during a late night of prompt debugging and realized I'd been spending 90% of my energy on Description and maybe 10% on Discernment. The 4D framework was the first thing that made me step back and ask whether I was even delegating the right tasks to AI in the first place. I suspect many of us are in the same boat. It reminds us that prompting is one piece of a larger fluency puzzle.
And about whether this is even worth learning: someone on a Salesforce community recently declared prompt engineering "obsolete." I understand the impulse. The term does carry baggage from early 2023, when it meant stringing together clever jailbreak phrases. But what's actually happened is that prompting evolved from writing clever sentences to specifying intent with precision. That's not obsolete. That's matured. The RCTCO framework below is evidence of that maturation, not a relic of it.
The rest of this article focuses on Description, specifically through the RCTCO framework. But everything here assumes we've already done the Delegation work (deciding this task is right for AI) and will follow through with Discernment and Diligence on the outputs.
📋 The RCTCO Framework (Adapted for L&D)
After months of testing across all three major platforms, here's the framework I keep coming back to. I'm calling it RCTCO, and honestly, I'm still refining it. But it works better than anything else I've tried.
R = Role (Who should the AI be?)
Not just "you are a helpful assistant." How often do we start a prompt without telling the AI what perspective to take? Think: "You are a senior instructional designer with 10 years of experience in corporate L&D, specializing in compliance training for financial services."
C = Context (What's the learning situation?)
Learner audience, their experience level, the business problem driving the training, the delivery format, and the LMS constraints. This is where most L&D prompts fail: we skip the context because it feels obvious to us. But it's not obvious to the model. Would we hand a new contractor a project without a brief? Then why do we hand AI one?
T = Task (What specific deliverable do we need?)
Not "write some content." Instead: "Draft 5 terminal learning objectives at the Application level of Bloom's revised taxonomy." How specific can we get? More specific than we think.
C = Constraints (What are the guardrails?)
Bloom's level, word count, reading level, regulatory requirements, brand voice, and Kirkpatrick level we're measuring against. This is the secret sauce for L&D work. What separates a useful output from a generic one? Constraints. Every time.
O = Output Format (What should the deliverable look like?)
Table, bullet list, scenario script, facilitator guide section, assessment item with answer key and rationale. How many revision rounds could we save by being explicit about the format up front? In my experience, at least 2-3. (A short sketch of assembling all five layers follows below.)
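If it helps to see the framework as something executable, here's a minimal sketch in plain Python of assembling the five RCTCO layers into one prompt. The function name and every field value are my own illustrations, not part of any official spec:

```python
def build_rctco_prompt(role, context, task, constraints, output_format):
    """Assemble the five RCTCO layers into a single prompt string.

    Keeping each layer as a named argument makes a missing layer
    obvious -- which is exactly the failure mode RCTCO targets."""
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    return (
        f"ROLE: {role}\n\n"
        f"CONTEXT: {context}\n\n"
        f"TASK: {task}\n\n"
        f"CONSTRAINTS:\n{constraint_lines}\n\n"
        f"OUTPUT FORMAT: {output_format}"
    )


prompt = build_rctco_prompt(
    role="You are a senior instructional designer specializing in "
         "compliance training for financial services.",
    context="A 45-minute eLearning module for newly promoted team leads "
            "who avoid difficult feedback conversations.",
    task="Draft 5 terminal learning objectives at the Application level "
         "of Bloom's revised taxonomy.",
    constraints=[
        "One observable action verb per objective",
        "No 'understand', 'know', or 'learn' as primary verbs",
        "Grade 10 Flesch-Kincaid reading level",
    ],
    output_format="Numbered list with Bloom's level in parentheses.",
)
print(prompt)
```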
Why does this matter? In February 2026, the Josh Bersin Company research found that companies using AI-first learning approaches are 28 times more likely to unlock employee potential. But fewer than 5% have deployed AI-native technology. The gap between potential and practice is enormous, and better prompting is the bridge most of us can cross right now.
🔄 Why Our Prompts Keep Breaking (The Model Refresh Problem)
Here's something that doesn't get talked about enough in L&D circles. Prompting is not a one-and-done skill. Every model refresh changes how our prompts need to be structured. That carefully crafted prompt library the team built in October 2025? It's probably underperforming right now, and we might not even realize it.
I learned this the hard way. Prompts that worked beautifully on one version of Claude produced noticeably different outputs on the next. Not worse, necessarily, but different in ways that broke assumptions baked into our workflows. And this isn't just my experience. I keep hearing variations of the same story in L&D communities: "My prompts worked great in GPT-3.5, then broke in GPT-4, and now produce something completely different in GPT-5." Or the related frustration: "I built a whole prompt library for my team, and half of it stopped working after the last model update."
Why does the same prompt give completely different results on different days? Sometimes it's the model provider rolling out a minor update. Sometimes it's the temperature and sampling randomness. But increasingly, it's because the model itself has been updated in ways that change how it interprets instructions. Let me share what I've tracked across the three major model families.
Claude (Anthropic) Evolution
Opus 4 needed verbose instructions and repeated emphasis. We had to say things two or three times to ensure the model weighted them properly. Constraints buried in the middle of a long prompt? Often ignored.
Sonnet 4.5 started inferring intent over literal compliance and needed fewer examples to understand what we wanted. It also introduced parallel tool use, which changed how agentic workflows operated. Prompts that over-specified were actually less effective because the model spent tokens reconciling unnecessary detail.
Opus 4.6 brought a breaking change: assistant message prefill was removed. Any workflow that relied on starting the model's response with a specific format string just stopped working. Instruction persistence improved dramatically (from roughly 89% to 97% compliance in testing). Adaptive thinking was added. System prompts now replace prefill as the control mechanism. When Opus 4.6 dropped and broke prefill, I spent an entire Saturday morning rewriting the assessment prompts I'd built for my team. That was the weekend I realized prompt maintenance is a real operational cost. Did anyone else scramble to rewrite prompts when this landed?
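For anyone making that same migration, here's roughly what the rewrite looks like with Anthropic's Python SDK. This is a hedged sketch: the model string is a placeholder to verify against Anthropic's current docs, and the prompt text is illustrative.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

rctco_prompt = "ROLE: ... CONTEXT: ... TASK: ... (full RCTCO prompt here)"

# OLD pattern (pre-4.6): prefill the assistant turn to force a format.
# messages=[
#     {"role": "user", "content": rctco_prompt},
#     {"role": "assistant", "content": "1. "},  # prefill -- no longer supported
# ]

# NEW pattern: move format control into the system prompt instead.
message = client.messages.create(
    model="claude-opus-4-6",  # placeholder model ID -- check current docs
    max_tokens=2000,
    system=(
        "You are a senior instructional designer. "
        "Respond with a numbered list only; no preamble before item 1."
    ),
    messages=[{"role": "user", "content": rctco_prompt}],
)
print(message.content[0].text)
```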
ChatGPT/OpenAI Evolution
GPT-4.1 rewarded artful phrasing and elaborate role-play prompts. The more creative and detailed the persona setup, the better the outputs tended to be.
GPT-5 introduced what I've been calling "The Specification Shift." GPT-4 rewarded style; GPT-5 rewards specification. It added a verbosity parameter and control over reasoning effort. The first time I ran a standard storyboard prompt on GPT-5 after months on GPT-4, the output was half the length and twice as focused. I thought something was broken. It wasn't. The model just stopped filling in my gaps. Here's the critical change: vague instructions now actively hurt more than before, because the model burns reasoning tokens reconciling contradictions and filling gaps in underspecified prompts. The same ambiguity that GPT-4 handled gracefully now carries a measurable cost.
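Both of those controls are exposed as explicit API parameters rather than prompt phrasing. A minimal sketch with OpenAI's Python SDK and the Responses API; parameter names reflect OpenAI's published GPT-5 documentation as I understand it, so verify against the current reference:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5",
    # Dial reasoning effort down for routine drafting, up for gray-area scenarios.
    reasoning={"effort": "low"},
    # Explicit verbosity control replaces "please be concise" prompt hacks.
    text={"verbosity": "low"},
    input=(
        "Draft 5 terminal learning objectives at the Application level "
        "of Bloom's revised taxonomy for newly promoted team leads."
    ),
)
print(response.output_text)
```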
GPT-5.1 and 5.2 pushed further toward deliberate scaffolding, stronger scope discipline, and lower verbosity by default. Prompts that assumed the model would be expansive started producing clipped outputs.
GPT-5.3 reduced hallucinations by 22% and fixed persistent tone issues. A real improvement, but it also meant prompts calibrated to work around earlier hallucination tendencies were now over-constrained.
Gemini (Google) Evolution
Gemini 2 and 2.5 responded well to standard prompt engineering. Temperature tuning worked predictably. Outputs tended toward verbosity.
Gemini 3 introduced a dramatic reduction in verbosity (2.5 used roughly 4x more tokens for equivalent tasks). This broke temperature tuning strategies that many of us relied on (the recommendation now is to keep the temperature at the default 1.0). Thinking levels were added as a new control mechanism. On my team, we had someone who'd built her entire workflow around Gemini 2.5's verbose outputs. When Gemini 3 cut the verbosity by 4x, her process fell apart. That's when I realized we'd built dependencies on model behavior rather than on our framework.
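In Google's google-genai Python SDK, that shift shows up as configuration: leave temperature at its default and steer depth with the thinking controls instead. Another hedged sketch; the model ID is a placeholder and the thinking_level field name is my reading of the Gemini 3 docs, so confirm both before relying on them:

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-3-pro-preview",  # placeholder model ID -- verify against current docs
    contents="Draft 4 scenario-based assessment questions at Bloom's Analysis level.",
    config=types.GenerateContentConfig(
        temperature=1.0,  # leave at default; tuning no longer behaves as it did on 2.5
        thinking_config=types.ThinkingConfig(thinking_level="low"),  # assumed field name
    ),
)
print(response.text)
```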
Gemini 3.1 improved reasoning but showed some regression in instruction-following compared to 2.5 Pro. Prompts that 2.5 Pro followed tightly occasionally drift on 3.1. Are we testing for this? Most L&D teams aren't.
What This Means for Our Work
The era of "prompt hacks" is ending. All three model families are converging on the same principles: specification over style, explicit verbosity controls, and agent-oriented prompting patterns. The number one skill is now clarity, not cleverness.
So, is there a way to write prompts that work across different models, or do we need separate prompts for each? In my experience, the RCTCO structure transfers well across all three. A prompt built on clear Role, Context, Task, Constraints, and Output Format will produce usable results in Claude, GPT, and Gemini. But the fine-tuning layer (where we squeeze out the best results) is model-specific. That's why the templates below include both a universal structure and model-specific adjustments.
But the tool-specific adjustments on each template below? Those have a shelf life. I update mine roughly every 6-8 weeks, or whenever a major model refresh drops. How often are we auditing our prompt libraries? If the answer is "we built them once," that's a problem worth addressing.
This is also why prompting belongs in our L&D curricula as a living skill rather than a one-time workshop module. The principles transfer. The implementation details change constantly.
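One lightweight way to make that auditing concrete is a drift check: re-run each library prompt against the current model and test a few structural expectations. Here's a minimal sketch; call_model is a hypothetical wrapper around whichever provider SDK the team uses, and the checks are deliberately crude:

```python
def audit_prompt(call_model, name, prompt, checks):
    """Re-run one saved prompt and report which structural
    expectations still hold on the current model version."""
    output = call_model(prompt)
    failures = [label for label, passed in checks.items() if not passed(output)]
    print(f"[{name}] " + ("OK" if not failures else "DRIFT: " + "; ".join(failures)))
    return failures


# Example checks for a learning-objectives prompt -- refine for real use.
objective_checks = {
    "returns a 5th numbered item": lambda out: "5." in out,
    "tags Bloom's level": lambda out: "(Application)" in out,
    "avoids banned verbs": lambda out: "understand" not in out.lower(),
}

# Usage (once call_model wraps your SDK of choice):
# audit_prompt(call_model, "learning-objectives", saved_prompt, objective_checks)
```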
📝 Template 1: Writing Bloom's Aligned Learning Objectives
Here's a prompt I use regularly. What happens if we give AI the same level of instructional specificity we'd put in a design document? Feel free to copy it, modify it, break it. That's how we learn what works.
ROLE: You are a senior instructional designer certified in curriculum
development, with expertise in Bloom's revised taxonomy and Mager-style
performance objectives.
CONTEXT: We're developing a [FORMAT: e.g., 45-minute eLearning module]
for [AUDIENCE: e.g., newly promoted team leads in a SaaS company] who need
to [BUSINESS NEED: e.g., conduct effective 1:1 performance conversations].
Current skill level: [LEVEL: e.g., they can schedule 1:1s but avoid
difficult feedback]. This training maps to [KIRKPATRICK LEVEL: e.g.,
Level 3 - Application on the job].
TASK: Write 5 terminal learning objectives. Each must:
- Use a single, observable, measurable action verb from Bloom's revised
taxonomy at the [TARGET LEVEL: e.g., Application] level or above
- Follow Mager's ABCD format (Audience, Behavior, Condition, Degree)
- Be achievable within the module timeframe
CONSTRAINTS:
- No compound objectives (one verb per objective)
- Avoid "understand," "know," or "learn" as primary verbs
- Each objective must be assessable through [METHOD: e.g., role-play
observation or scenario-based quiz]
- Reading level: [LEVEL: e.g., Grade 10 Flesch-Kincaid]
OUTPUT FORMAT: Numbered list. For each objective, include:
1. The objective statement
2. Bloom's level in parentheses
3. Suggested assessment method in brackets

Tool-specific adjustments (current as of March 2026):
- Claude Opus 4.6: Works beautifully as-is. Claude's adaptive thinking aligns well with Bloom's. Add "Think through each objective step by step before writing it" for even better results. Note: if migrating from older Claude prompts, remove any assistant prefill strings.
- Gemini 3 Pro: Prepend "Search your knowledge for Bloom's revised taxonomy action verbs at the [level] before generating." Gemini's research integration pulls from current taxonomies. Keep the temperature at the default 1.0.
- Microsoft 365 Copilot: If working in Word, use a simpler version and paste the Bloom's verb list directly into the prompt. Copilot works best with shorter, more directive prompts and concrete examples.
🎭 Template 2: Generating Scenario-Based Assessment Questions
This one took me a while to get right. I kept getting scenarios that felt like textbook examples rather than something our learners would actually encounter. Has anyone else hit that wall?
And here's a question I think we all need to sit with: how do we trust AI-generated assessment questions when we know these models hallucinate? SAGE Journals' 2025 research gives us a partial answer. AI-generated questions align well with lower-order Bloom's levels but struggle at higher-order thinking. So the trust question isn't binary. It's level-dependent. For Remember and Understand? The quality is often solid. For Analyze and Evaluate? Human review isn't optional, it's essential. The template below builds in that awareness through explicit targeting of Bloom's levels and a mandatory rationale for each answer choice.
ROLE: You are an assessment design specialist who creates realistic
workplace scenarios for corporate training evaluation.
CONTEXT: We're assessing learners who completed training on
[TOPIC: e.g., data privacy handling for customer service reps] at a
[COMPANY TYPE: e.g., mid-size healthcare SaaS company]. These are
[AUDIENCE: e.g., frontline support agents, 1-3 years experience] who
handle [VOLUME: e.g., 40+ customer interactions daily].
TASK: Create 4 scenario-based multiple-choice questions that test at
Bloom's [LEVEL: e.g., Application and Analysis] levels.
CONSTRAINTS:
- Each scenario must reflect a realistic workplace situation, not a
textbook example
- Include plausible distractors that represent common mistakes learners
actually make (not obviously wrong answers)
- One question should involve a gray area where policy interpretation
matters
- Align to [REGULATION/STANDARD: e.g., HIPAA privacy requirements]
- Each question stem: 2-4 sentences max
OUTPUT FORMAT: For each question, provide:
1. Scenario stem
2. Four answer choices (A-D)
3. Correct answer with rationale (2-3 sentences)
4. Why each distractor is wrong (1 sentence each)
5. Bloom's level tag

Tool-specific adjustments (current as of March 2026):
- Claude Opus 4.6: Add "Make the distractors represent genuine misconceptions, not strawmen. Each wrong answer should be something a reasonable but undertrained person might actually choose." Opus 4.6's improved instruction persistence (97% compliance) means this constraint will actually stick.
- Gemini 3 Pro: Add "Reference current [industry] regulations from 2025-2026 when creating scenarios." Gemini pulls in a fresher regulatory context. Expect shorter outputs than Gemini 2.5 would have produced.
- Microsoft 365 Copilot: Best used after pasting the training content into the Word document first, then asking Copilot to "generate assessment questions based on the content above." Copilot excels when the source material is in the same document.
🏗️ Template 3: Creating Scenario-Based Learning Content from SME Notes
This is where I've seen the biggest time savings. What if we stopped handing SMEs a blank page and started handing them a structured draft to react to? When I led my team, we reduced our SME review process from 6 weeks to 2 weeks. But here's the catch: getting good drafts from AI requires good prompts. How do we bridge the gap between messy SME notes and clean scenario content?
A question I hear a lot: "AI generates content fast, but my SMEs still have to review everything. Am I actually saving time, or just shifting the bottleneck?" In my experience, the answer depends entirely on the quality of the first draft. When AI output is generic and robotic (and it absolutely can be, especially without the Constraints layer), SMEs spend just as long rewriting as they would from scratch. But when the prompt includes the right role, context, and constraints, the AI draft becomes a reaction document instead of a rewrite target. Our SMEs went from creating content to validating it. That was a fundamentally different (and faster) cognitive task.
ROLE: You are an instructional designer who specializes in transforming
technical subject matter expert input into engaging, learner-centered
scenario content.
CONTEXT: Below are raw notes from an SME interview about
[TOPIC: e.g., troubleshooting network connectivity issues]. The target
audience is [AUDIENCE: e.g., Level 1 help desk technicians in their first
90 days]. They'll consume this as [FORMAT: e.g., a branching scenario in
our LMS (Skilljar)]. The business goal: [GOAL: e.g., reduce escalation
rate from 45% to 30% within 6 months].
SME NOTES:
[Paste raw interview notes, bullet points, or transcript excerpts here]
TASK: Transform these SME notes into a branching scenario with 3 decision
points. Each branch should lead to either a successful resolution or a
realistic consequence that teaches through failure.
CONSTRAINTS:
- Use conversational, second-person language for the scenario narrative
- Technical accuracy must match the SME notes exactly (flag anything
that seems contradictory or incomplete)
- Include 1 "common mistake" path that addresses the most frequent
real-world error
- Reading level: [LEVEL: e.g., Grade 8-10]
- Scenario length: [LENGTH: e.g., 800-1,200 words total across all
branches]
OUTPUT FORMAT:
- Opening situation (3-4 sentences)
- Decision point 1 with 3 choices → consequences
- Decision point 2 with 2-3 choices → consequences
- Decision point 3 with 2-3 choices → consequences
- Debrief summary connecting decisions to learning objectives
- [FLAG] section noting any gaps or contradictions in SME notes

Tool-specific adjustments (current as of March 2026):
- Claude Opus 4.6: Claude handles the longest SME transcripts well with its large context window. Add "Before writing, identify the 3 most critical decision points in these notes and explain why you chose them." System prompt placement is now more important than ever for maintaining constraints across long outputs.
- Gemini 3 Pro: With its 1M-token context, Gemini also handles long transcripts. Add "Cross-reference these SME notes against current best practices for [topic] and flag any outdated information." Note that Gemini 3's outputs will be significantly more concise than Gemini 2.5's.
- Microsoft 365 Copilot: Best if the SME notes are already in a Word document. Use Copilot's "Transform" feature to restructure, then apply the scenario format as a second prompt. Two shorter prompts outperform a single long prompt in Copilot.
📊 Template 4: Building Evaluation Rubrics for Training Programs
I'll admit, this is the template I'm least confident about. How do we even quantify "good enough" for a rubric? Building evaluation rubrics is hard enough without AI, and I'm still iterating on what works. Has anyone found a reliable way to validate AI-generated rubrics against actual learner performance data? Here's where I've landed so far, but I welcome pushback.
ROLE: You are a learning measurement specialist with expertise in
Kirkpatrick's four levels of evaluation and competency-based assessment
design.
CONTEXT: We need to evaluate a [PROGRAM: e.g., 3-day new manager
development program] for [AUDIENCE: e.g., first-time people managers at a
200-person tech company]. The program covers [TOPICS: e.g., giving
feedback, setting expectations, coaching for performance]. Success
metrics defined by leadership: [METRICS: e.g., 90-day new manager
confidence scores, team engagement survey delta, time-to-productivity
for their direct reports].
TASK: Create a comprehensive evaluation rubric covering Kirkpatrick
Levels 1 through 3 (Reaction, Learning, Behavior).
CONSTRAINTS:
- Level 1: Include both engagement and relevance dimensions
- Level 2: Assessment items must align to the program's stated learning
objectives (list them if available, or note they need to be provided)
- Level 3: Behavioral indicators must be observable by the learner's
manager within 30-60 days post-training
- Use a 4-point scale (Developing, Competent, Proficient, Expert) with
specific behavioral descriptors at each level
- Avoid subjective language like "good" or "effective" without defining
what that looks like
OUTPUT FORMAT: Table format with columns:
| Kirkpatrick Level | Competency Area | Developing (1) | Competent (2) | Proficient (3) | Expert (4) | Assessment Method | Timeline |

Tool-specific adjustments (current as of March 2026):
- Claude Opus 4.6: Add "For Level 3 behavioral indicators, write them as specific observable actions, not attitudes. A manager should be able to check yes/no on each indicator."
- Gemini 3 Pro: Add "Search for current research on new manager development program effectiveness metrics from 2024-2026 and incorporate evidence-based indicators."
- Microsoft 365 Copilot: Generate the rubric in Excel using Copilot, which handles table formatting natively. Prompt: "Create an evaluation rubric table" and specify columns. Copilot's Excel integration makes rubrics immediately usable.
🛠️ Managing Prompts as a Team
One question that keeps surfacing in L&D communities: "How should my L&D team organize and share prompts so we're not all reinventing the wheel?" And the follow-up: "How often should we update our prompt library? Who owns it?"
These are operational questions, and I think we underestimate how much they matter. Here's what I've found works:
Shared library, single owner. One person on the team owns the prompt library and is responsible for version control. That doesn't mean they write every prompt. It means they maintain quality and consistency, test prompts against current model versions, and flag when something needs updating. Without ownership, prompt libraries decay fast.
Audit cadence: every 6-8 weeks, or after any major model update. Whichever comes first. When Claude Opus 4.6 dropped and broke assistant prefill, teams that caught it in a week lost minimal productivity. Teams that didn't notice for two months had an entire library of underperforming prompts.
Template structure over individual prompts. Instead of sharing 50 finished prompts, share 5-8 RCTCO templates with fill-in-the-blank sections (like the four in this article). Templates survive model changes better than fully baked prompts because the user adapts the specifics each time.
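Here's a sketch of what one versioned library entry might look like, using only the Python standard library. The metadata fields and naming are my own convention, not a standard:

```python
from string import Template

LIBRARY = {
    "learning-objectives": {
        "version": 3,
        "owner": "prompt-library owner",   # the single owner described above
        "last_audited": "2026-03-01",      # feeds the 6-8 week audit cadence
        "template": Template(
            "ROLE: You are a senior instructional designer.\n\n"
            "CONTEXT: We're developing a $format for $audience who need "
            "to $business_need.\n\n"
            "TASK: Write 5 terminal learning objectives at the "
            "$bloom_level level of Bloom's revised taxonomy.\n\n"
            "CONSTRAINTS:\n- One verb per objective\n"
            "- Reading level: $reading_level\n\n"
            "OUTPUT FORMAT: Numbered list with Bloom's level in parentheses."
        ),
    },
}

prompt = LIBRARY["learning-objectives"]["template"].substitute(
    format="45-minute eLearning module",
    audience="newly promoted team leads",
    business_need="conduct effective 1:1 performance conversations",
    bloom_level="Application",
    reading_level="Grade 10 Flesch-Kincaid",
)
print(prompt)
```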
🔍 Choosing the Right Tool for the Task
"Should I use ChatGPT, Claude, or Gemini for instructional design? They all seem to do the same thing." I see this question everywhere, and I get it. On the surface, the outputs look similar. But under the surface, the differences matter for our work.
Dr. Philippa Hardman tested multiple models against instructional design tasks and found that each has distinct strengths. Her conclusion is worth quoting: "Until we have specialized AI copilots for instructional design, we should be cautious about relying on general-purpose models." That caution is important. None of these tools are purpose-built for L&D. We're adapting general-purpose models to specialized work. That's why the RCTCO framework matters: it bridges the gap between a general model and our specific needs.
Here's how I think about tool selection for L&D work in March 2026:
Claude Opus 4.6 writes the best long-form instructional content I've seen. Its instruction persistence (97% in my testing) means complex, multi-constraint prompts actually hold. Best for: learning objectives, scenario writing, rubric development, anything where nuance and constraint-following matter.
Gemini 3 Pro is unmatched for research synthesis, thanks to its Google ecosystem integration. When we need to pull in current regulatory frameworks or cross-reference best practices, Gemini's grounding in search gives it an edge. Best for: research-backed content, regulatory training, and content that needs current data.
Microsoft 365 Copilot is the obvious choice when the deliverable lives in PowerPoint, Word, or Excel. "Is Copilot actually useful for L&D work, or is it just for people already deep in Microsoft 365?" Honestly, if our L&D team's entire workflow lives in the Microsoft ecosystem (and many do), Copilot's integration advantage is significant. If we're working outside that ecosystem, the standalone models offer more flexibility. Best for: slide decks, document-native content, rubrics in Excel, and anything that stays in the M365 workflow.
Why are we trying to force one tool to do everything? Match the tool to the deliverable.
🔍 Common L&D Prompting Mistakes (And How We Fix Them)
After testing thousands of prompts across these platforms, here are the patterns I keep seeing. And honestly, I've made every single one of these mistakes myself.
Mistake 1: Skipping the audience context.
We assume the AI knows our learners. It doesn't. "Write a module on change management" produces something wildly different from "Write a module on change management for warehouse supervisors at a logistics company who are resistant to a new WMS implementation." Is the extra 30 seconds of context worth it? Every single time.
Mistake 2: Not specifying the Bloom's level.
This is the one that costs us the most revision time. Without a specified cognitive level, AI defaults to "Understand" at best. If we need Application-level objectives, we have to say so explicitly. Research confirms AI struggles with higher-order thinking unless we provide very clear scaffolding.
Mistake 3: Accepting the first output.
Dr. Philippa Hardman recommends a three-step QA process: accuracy check, pedagogical alignment, and learner experience review. I'd add a fourth: SME validation. The 2025 Bloom's taxonomy research found that human-generated questions still provide greater cognitive alignment than AI-generated ones at higher levels. AI gets us 70-80% of the way there. We close the gap. This is the Discernment competency from the 4D framework in action.
Mistake 4: Using one tool for everything.
Each model has distinct strengths for L&D work (see the tool selection section above). Why force Claude to build a slide deck when Copilot does it natively? Why use Copilot for research synthesis when Gemini is grounded in search?
Mistake 5: Prompting for perfection instead of iteration.
The goal isn't a perfect first draft. It's a structured starting point that's faster to refine than a blank page. When we shifted our team's mindset from "AI should write this for me" to "AI should give me something to react to," revision time dropped significantly. And that persistent complaint ("AI-generated content sounds robotic and generic; I still spend hours editing everything it produces") usually traces back to missing Constraints and Context in the prompt. The templates above exist precisely to solve this.
Mistake 6: Treating prompts as permanent artifacts.
This might be the most expensive mistake in the list. On my team, we built a prompt library in mid-2025 that worked well on Claude Opus 4 and GPT-4.1. By November 2025, when the new models dropped, at least a third of those prompts were underperforming. The models had changed, but our prompts hadn't. That experience is why I now recommend auditing every 6-8 weeks or after any major model release. Does anyone else have a prompt maintenance cadence? I'm curious what works.
😶 The Anxiety We're Not Talking About
Here's something I need to name directly, because it's weighing on a lot of us. I keep seeing posts like: "My organization wants me to 'use AI' but hasn't provided any training." Or: "I feel like I'm falling behind. Other IDs on LinkedIn seem to be using AI for everything." The World Economic Forum identifies what it calls an "AI perception gap," in which workers feel behind even when the adoption curve is still early. LinkedIn's own data show that 54% of long-form posts on the platform are now AI-assisted, creating an inflated perception of how far ahead everyone else is. We're comparing our real, messy learning process to other people's polished, AI-assisted outputs. That's not a fair comparison, and it's not a useful one.
And the bigger question underneath all of this: "What happens to instructional design as a profession if AI keeps getting better? Are we training ourselves out of a job?" I don't think so, but I understand the anxiety. The SAGE Journals research tells us AI struggles with exactly the things that make instructional designers valuable: higher-order cognitive alignment, pedagogical judgment, learner empathy, and organizational context. AI is getting better at generating content. It's not getting better at knowing which content a specific learner population needs, in what sequence, measured against which business outcomes. That's our work. And frameworks like RCTCO are how we direct AI to support that work rather than replace it.
🧠 What This All Means
The February 2026 Josh Bersin research paints a stark picture: a $400 billion corporate learning market where 74% of companies can't keep pace with skills demand, yet fewer than 5% have deployed AI-native technology. We're sitting in that gap right now.
The answer isn't more tools. Do we really need another platform? McKinsey's 2025 data shows enterprise AI adoption at 72% globally. We have the tools. What we lack is the domain-specific knowledge to use them effectively in instructional design. So why is so much of the conversation still about which tool to buy rather than how to use the ones we have? Josh Cavalier's ATD AI certification program, Dr. Philippa Hardman's ADGIE model, Connie Malamed's practical workflow research at The eLearning Coach, Anthropic's free 4D AI Fluency course on Anthropic Academy: these voices are building the bridge between general AI capability and L&D-specific application.
The cross-model trend is clear. All three major model families are converging on the same expectations: specification over style, explicit constraints over clever phrasing, structured input over creative prompting. The models are getting better at understanding what we mean. But "better at understanding" also means "less tolerant of ambiguity." A vague prompt that GPT-4.1 handled gracefully now costs GPT-5 extra reasoning tokens to reconcile what we probably meant. That's not a bug. It's the models telling us what they need.
There's a design thinking principle here too: empathize first, define the real problem, then prototype. The RCTCO framework is really a design thinking exercise disguised as a prompt structure. We empathize with the learner (Context), define what success looks like (Task + Constraints), and prototype rapidly (Output Format). The iteration isn't a flaw in the process. It is the process.
But here's the uncomfortable truth I keep coming back to. Better prompts don't fix bad instructional design. If we don't know what Bloom's level we're targeting, can any framework save us? If we can't articulate the business problem the training solves, what happens? AI just generates prettier versions of content that doesn't work. The RCTCO framework is a tool. Our expertise is what makes it useful. And the 4D framework reminds us that Description (prompting) is only one of four competencies we need. Delegation, Discernment, and Diligence matter just as much.
The RCTCO framework and the prompt templates in this article are pieces of a larger L&D AI Operating System I've been building, one that connects prompting to delegation, workflow design, and measurement. I'll be going deeper into that system soon.
Could I be wrong about some of this? Absolutely. The field is moving fast enough that what works in March 2026 might look different by June. The model-specific notes in this article will drift. But the structural principles (clear roles, explicit context, specific constraints) will hold, because they're about communicating intent clearly, not exploiting model quirks.
🎯 The One Thing to Do This Week
Take one prompt we use regularly for instructional design work. Run it through the RCTCO framework. Add the Role, Context, Task, Constraints, and Output Format layers. Then test it on the current version of whatever model we're using, not the version we built it for. Compare the output to what we were getting before. I'd bet the difference is significant enough to justify the extra 2-3 minutes of prompt setup. And if it's not, I want to hear about it. That's how we all get better at this.
If the Anthropic 4D framework is new to us, the free course on Anthropic Academy (12 lessons, developed with Prof. Rick Dakan and Prof. Joseph Feller, CC BY-NC-SA license) is worth the time investment. It reframes AI fluency beyond just prompting, and that broader view matters.
What prompting patterns are working for our L&D teams right now? Which model updates broke our workflows? What are we all missing? Share what's working.
-- Eian
Learning, Upgraded is a newsletter for L&D professionals navigating the intersection of technology and learning. Read more at learningupgraded.com.
Sources
- Anthropic. (2025). AI fluency for everyone. Anthropic Academy. academy.anthropic.com
- Anthropic. (2025). Claude Opus 4.6 release notes. docs.anthropic.com
- Bersin, J. (2026, February). The rise of the superworker. The Josh Bersin Company. joshbersin.com
- Cavalier, J. (2025). Applying AI in learning and development: From platforms to performance. ATD Press.
- Hardman, P. (2025). AI for instructional design: Current capabilities and limitations. philippahardman.com
- Malamed, C. (2024). AI-assisted instructional design workflows. The eLearning Coach. theelearningcoach.com
- McKinsey & Company. (2025). The state of AI in 2025. mckinsey.com
- OpenAI. (2025). GPT-5 system card. openai.com
- SAGE Journals. (2025). Alignment of AI-generated questions with Bloom's taxonomy levels. journals.sagepub.com
- World Economic Forum. (2025). Future of Jobs Report 2025. weforum.org