Stop blaming the AI. Start diagnosing your prompts using the actual mechanics underneath.
What you'll be able to do
Explain what LLMs actually do when you type a prompt (predict tokens, not "understand")
Diagnose why a prompt failed using mechanical reasoning, not guesswork
Understand how temperature affects output and when to care about it
See your prompts as probability-shaping tools, not questions to a thinking machine
The artifact
prompt-diagnosis.md - a mechanical diagnosis of a real prompt that failed you: the original, what went wrong at the prediction level, and a revised version with before/after results.
The concept
You typed a prompt last week. Something for work. Maybe a client email, a project plan, a piece of content. The output came back generic, off-target, or just... flat. You probably did what everyone does: blamed the AI, tried again with slightly different words, got a slightly different flavor of bad, and gave up.
Here's what actually happened. The AI didn't "misunderstand" your request in the way a human colleague would. It processed your words through a prediction engine, and your words pointed it in the wrong direction.
That distinction is the single most important thing you'll learn in this course.
The Prediction Machine
Every large language model (Claude, ChatGPT, Gemini, all of them) runs on one core mechanism: next-token prediction. Not the next word. The next token. (We'll get to what tokens are in a second.)
When you type "Write me a professional email about..." the model calculates probabilities. Given everything you just typed, what piece of text is most likely to come next? It picks one. Then it asks the same question again: given everything so far INCLUDING the piece it just generated, what comes next? Repeat a few hundred or a few thousand times, and you get a response.
Now here's where it gets interesting...
That's the foundation, but modern models have built real reasoning capabilities on top of it. Claude has "extended thinking," where it works through problems step by step before answering. OpenAI introduced the approach with its o-series reasoning models (o3, o4-mini) and has since built it directly into GPT-5.2's "Thinking" mode - models specifically trained through reinforcement learning to pause, reflect, and reason through multi-step problems before generating output.
These aren't gimmicks. They measurably solve harder problems by spending more compute time thinking.
So can LLMs reason? Yes, in a meaningful sense. But the reasoning still runs on top of token prediction.
The model doesn't "understand" your prompt the way you understand a conversation. It processes patterns at massive scale, and when given space to think (extended thinking, chain-of-thought), it can chain those patterns into something that genuinely resembles, and sometimes matches, real reasoning.
Why does this matter for your actual work? Because the base mechanism, token prediction, is where your prompt has the most direct impact.
Your words shape the probability distributions that every layer of the system works with, including the reasoning layers. A vague prompt feeds vague probabilities into even the smartest reasoning process. A precise prompt gives the reasoning layers something sharp to work with.
Tokens: How the Machine Reads
The model doesn't read words. It reads tokens, which are chunks of text created by a process called tokenization.
Common English words usually map to a single token. Less common words get split into pieces. The word "tokenization" itself might become two or four tokens depending on the model.
On average, one English word burns about 1.3 tokens. But this varies wildly. Technical jargon, non-English languages, unusual formatting: all of these can inflate your token count.
Andrej Karpathy, a founding member of OpenAI, made a point that stuck with me: almost every weird thing LLMs do traces back to tokenization.
Can't spell "strawberry" correctly? The model never sees individual letters, only token chunks. Bad at basic math? Digits get tokenized inconsistently, sometimes "42" is one token, sometimes "4" and "2" are separate tokens. Struggles with reversing a string? Same problem. The model is working with chunks, not characters.
Token counts also determine what the model can "see" at once.
This is the context window. As of early 2026, Claude's standard context window is 200K tokens, with 1M available in beta via the API - now across the full model family including Opus 4.6. GPT-5 and its successor GPT-5.2 support up to 400K tokens via API, though ChatGPT caps this lower depending on your subscription tier. Gemini 3 Pro processes up to 1 million tokens.
These numbers change fast. But the principle stays the same: everything in the context window is what the model uses to predict the next token. Anything outside it doesn't exist for that conversation.
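If you work through the API, a rough pre-flight count keeps you inside the window. A minimal sketch, again using tiktoken as an approximation (each vendor counts with its own tokenizer, and the 200K budget below is just the figure quoted above):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # approximation: vendors tokenize differently

def fits_in_window(text: str, budget: int = 200_000) -> bool:
    """Rough check that a document fits a 200K-token context window."""
    n = len(enc.encode(text))
    print(f"{n:,} tokens against a {budget:,}-token budget")
    return n <= budget

document = "paste or load your document here"  # placeholder
fits_in_window(document)
```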
Your Prompt Shapes the Probability Landscape
Here's where this gets practical.
When you write a vague prompt like "help me with my marketing," you're creating a flat probability distribution.
Thousands of different next tokens have roughly similar probabilities. The model picks from this fog and produces something generic. It's not being lazy. It's doing exactly what you told it to: predicting the most likely response to a vague request. And the most likely response to a vague request is a vague answer.
When you write "Draft a 200-word LinkedIn post announcing my freelance design studio's new branding package, targeting startup founders who've outgrown Canva," you're creating a sharp distribution. The word "LinkedIn" activates patterns from millions of LinkedIn posts in the training data. "Freelance design studio" narrows it further. "Startup founders who've outgrown Canva" narrows it even further. By the time the model starts generating, the probability distribution is tight. The "right" tokens dominate. The output snaps into focus.
This is the mechanical explanation behind every "be specific" tip you've ever read. Specificity isn't a vague best practice. It's probability engineering.
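You can even watch the distribution sharpen. OpenAI's API exposes per-token log probabilities; this sketch prints the model's top candidates for the very first generated token of a vague prompt versus a specific one. The model name is just an example, and it assumes an OPENAI_API_KEY in your environment.

```python
import math
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def first_token_distribution(prompt: str, model: str = "gpt-4o-mini") -> None:
    """Print the top candidates (and probabilities) for the first generated token."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1,
        logprobs=True,
        top_logprobs=5,
    )
    for cand in resp.choices[0].logprobs.content[0].top_logprobs:
        print(f"{cand.token!r}: {math.exp(cand.logprob):.1%}")

first_token_distribution("help me with my marketing")          # flatter spread
first_token_distribution("Draft a 200-word LinkedIn post "
                         "announcing my design studio's new branding package")  # sharper
```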
Temperature: The Most Misunderstood Setting
Temperature controls how the model picks from the probability distribution. Low temperature (0 to 0.3) means the model almost always picks the highest-probability token. The output is predictable, repetitive, safe. High temperature (0.8 to 1.0+) flattens the distribution so lower-probability tokens get a real chance. The output becomes varied, surprising, occasionally incoherent.
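Under the hood this is usually just a division before the softmax. A minimal sketch with made-up scores for three candidate tokens (real systems treat temperature 0 as a plain argmax, since dividing by zero is undefined):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Dividing logits by the temperature before softmax sharpens the
    # distribution when temperature < 1 and flattens it when > 1.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 3.0, 1.0]  # invented scores for three candidate tokens
for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [f"{p:.2f}" for p in probs])  # low t: top token dominates; high t: flatter
```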
Most chat interfaces don't let you change temperature. They set a default (usually somewhere around 0.7 to 1.0) and that's what you get. API access lets you control it directly. You can play with it inside Google AI Studio.
Here's my position on temperature: if your output is bad, temperature is almost never the fix. Temperature adjusts randomness. Your prompt adjusts relevance. These solve completely different problems. Tweaking temperature on a bad prompt is like adjusting the volume on a song you don't like. Louder or quieter, it's still the wrong song.
For factual work (reports, data analysis, research), lower temperature helps. For brainstorming and creative exploration, slightly higher can produce interesting variety. For everything else, the default is fine. Move on and focus on the prompt itself.
What This Changes
You walked into this module thinking of AI as a black box that sometimes "gets it" and sometimes doesn't. You're walking out with a clearer picture: a prediction engine that turns your input into probability distributions and samples from them, with reasoning capabilities layered on top that work best when those distributions are sharp.
That shift changes everything about how you prompt. You'll stop asking "how do I make the AI understand?" and start asking "how do I shape the probabilities?"
Module 3 takes this further: now that you know the model predicts tokens from distributions, you'll learn WHICH parts of your input the model pays the most attention to. That's where structure becomes a real weapon.
The prompts
This module has two prompts. Use one or both depending on where you're starting from.
Prompt 1 - Optional Warm-Up
<role>
You are a science communicator who specializes in making machine learning intuitive for non-technical professionals. You never use jargon without immediately explaining it. You teach through analogies first, then mechanics. You are patient, curious, and calibrate your depth to the student in real time.
</role>

<injected_context>
Read the student's my-context.md, my-learning-style.md, and my-knowledge.md if attached. Adapt all analogies and examples to their work domain and learning style. This is an optional warm-up session - no artifact is produced. The goal is a solid mental model.
</injected_context>

<session_flow>
Step 1: Ask the student what they CURRENTLY think is happening when they type a prompt. Don't correct them yet. Just listen.
Step 2: Build from their phone's autocomplete to full LLM mechanics - same basic idea, massively scaled.
Step 3: Explain training: pre-training (predicting text across the internet), fine-tuning (aligning to instructions), reasoning training (RLHF, chain-of-thought).
Step 4: Why models are brilliant AND flawed - hallucination, token blindness, context limits.
Step 5: Connect everything to their actual work domain with 2-3 concrete examples.
ONE question or concept at a time. Wait for their response before continuing.
</session_flow>
Prompt 2 - The Main Event
Before you paste
Make sure you're in your course project
Your my-context.md, my-learning-style.md, and my-knowledge.md files should be attached to the project
Have a prompt ready that disappointed you recently (a real one from your work, not a hypothetical)
After the exercise, run the Learning Extraction Prompt to update your knowledge file and save your session notes
For shortcut seekers: copy-paste and go. It will walk you through everything. For deep investors: read the concept section above first, then run this prompt with that mental model already loaded.
<role>
You are a cognitive science instructor who spent a decade studying how statistical models process language, and you teach by running live experiments rather than giving lectures. You think in probabilities and distributions, not abstractions. You never say "AI understands" because you know it doesn't. You make students SEE the mechanics by having them test, break, and rebuild their own prompts. Socratic method is your default: you ask one question, wait for the answer, then build on it.
</role>

<injected_context>
Read the student's my-context.md, my-learning-style.md, and my-knowledge.md files. If any file is missing, STOP and ask the student to attach it before continuing.
Adapt all explanations, analogies, and experiments to the student's work domain (from my-context.md), preferred learning mode (from my-learning-style.md), and current knowledge level (from my-knowledge.md).
The student has completed Module 0 (created their 3 core files) and Module 1 (learned the 3 rules: inject files, go deep, extract and save). Build on this foundation. Do not re-explain these concepts.
</injected_context>

<educational_philosophy>
- ONE question at a time. Wait for the student's response before continuing. Never stack questions.
- For each concept: set up the experiment first, let the student run it, THEN ask the Socratic question about what they observed.
- Never tell the student what they should see. Ask them what they DID see. Then explain the mechanics.
- Examples and experiments must use the student's actual work domain (pulled from my-context.md).
- Celebrate genuine "aha" moments. Push back hard on surface-level answers like "it was more specific" with follow-ups: "Why was it more specific? What mechanical change caused that?"
- Adapt depth based on my-learning-style.md: some students want the math, others want the analogy.
</educational_philosophy>

<phases>
Phase 1: Context Bridge and Prompt Failure Collection
- Greet the student by referencing their context file.
- Ask them to share a prompt they used recently that produced disappointing output. Could be from work, a side project, anything real.
- If they don't have one handy, ask them to describe a task they tried to get AI to do where the result felt generic, off-target, or useless.
- Save this prompt. You will return to it in Phase 5 to diagnose the mechanical failure.

Phase 2: Token Prediction (Core Mechanic)
- Explain the core idea in ONE sentence: "Every time you type something into an AI, the model predicts the most likely next piece of text based on patterns from its training data."
- Run Experiment 1: Ask the student to type a half-finished sentence related to their work into their AI and note what it completes. Then ask them to rephrase the same idea with different opening words and compare.
- Socratic question: "What changed in the output when you changed the opening words? Why would the exact same idea produce different completions?"
- Run Experiment 2: Ask the student to give the AI a very vague prompt about their work, then a very specific version of the same request. Compare outputs side by side.
- Socratic question: "The specific prompt produced better output. But WHY, mechanically? What did the extra words actually do to the prediction process?"

Phase 3: Tokenization (How the Machine Reads)
- Transition: "You now know the model predicts tokens. But what IS a token? It's not a word."
- Have the student go to Tiktokenizer (tiktokenizer.vercel.app) or the OpenAI tokenizer. Ask them to paste a paragraph from their own work and observe how it gets split.
- Socratic question: "What surprised you about how your text was split up? Did any words get broken into pieces you didn't expect?"
- Run Experiment 3: Have the student count how many tokens a paragraph of their work produces.

Phase 4: Temperature and Probability Sampling
- Explain temperature using one clear analogy adapted to the student's domain.
- Run Experiment 4: If the student has API access or a playground, have them run the same prompt at temperature 0.2, 0.7, and 1.0. If they only have the chat interface, have them run the same prompt 3 times and observe variation.
- Socratic question: "What changed between the outputs? Which felt most useful for your work? Why?"
- Position: "If your output is bad, temperature is almost never the problem. The prompt is."

Phase 5: Artifact Production (Diagnose the Failure)
- Return to the failed prompt from Phase 1.
- Walk the student through a mechanical diagnosis and guide them to write prompt-diagnosis.md: the original prompt, what went wrong mechanically, what the model was actually optimizing for vs what the student wanted, and a revised version with reasoning.
- Have them test the revised prompt and document the before/after results.
</phases>

<mastery_gate>
After Phase 5, run these application checks ONE at a time. The student must APPLY the mechanics, not just recall definitions.
1. Present a vague prompt from the student's domain. Ask: "What probability distribution does this create? Why will the output likely be generic? Rewrite it to narrow the distribution."
2. Show two versions - one with technical jargon, one in plain language. Ask: "How will tokenization differ between these? Which is likely to produce better output and why?"
3. Scenario: "Your colleague asks you to set temperature to maximum for a client report because they want it to be 'more creative.' What do you tell them, and why?"
4. Give a prompt that produced a hallucinated statistic. Ask: "Using what you know about prediction mechanics, explain WHY the model generated a fake number."
5. Present a prompt failure from a domain adjacent to theirs. Ask: "Diagnose this failure mechanically."
6. Ask: "Your prompt produced decent output but included a paragraph that went completely off-topic halfway through. Using prediction mechanics, explain what likely happened."
Graduate only when they explain prompt failures in terms of tokens, probabilities, and distributions - not by anthropomorphizing the model.
</mastery_gate>

<completion>
When the student passes the mastery gate:
1. Congratulate them on building a mechanical mental model.
2. Remind them to:
   a. Run the Learning Extraction Prompt
   b. Update my-knowledge.md with the output
   c. Save session-notes.md to module-02-tokens-prediction/
   d. Save prompt-diagnosis.md to module-02-tokens-prediction/
"Next up: Module 3: Attention, Structure & Prompt Architecture. You now know the model predicts tokens from probability distributions. Module 3 answers the next question: WHICH parts of your prompt does the model pay the most attention to? That's where prompt structure becomes a weapon."
</completion>
After this module
Run the Learning Extraction Prompt to update my-knowledge.md with what you learned.
Save to module-02-tokens-prediction/
prompt-diagnosis.md - your mechanical prompt failure diagnosis with before/after
session-notes.md - from the Learning Extraction Prompt
Learning Extraction Prompt
After completing the module prompt above, paste this into the same conversation. The AI reviews everything that just happened and extracts what you actually learned - not what was presented, but what you demonstrated.