MODULE_07 Domain

AI for Images & Visual Content

~30-45 min active work · 1 prompt · 1 artifact + image series · Prereq: Module 05

Learn the director's mindset for image generation: references over prompts, taste over vocabulary. Build a reusable image prompt template and produce a consistent visual series for your actual work.

01 Generate images that look like what you pictured, not random slop

02 Build and use a reference swipe file that does more for your output than any prompt trick

03 Maintain visual consistency across a series of images for a real work project

04 Know when to prompt, when to reference, and when to just open Photoshop

module-07-images/ ARTIFACT
image-prompt-template.md - your curated swipe file, Midjourney sref workflow, Nano Banana JSON template, consistency checklist, and example prompts
3-5 consistent images - your visual series for a real work project
+ session-notes.md from the Learning Extraction Prompt

The Biggest Lie in Image Generation Is That It's About Writing Good Prompts

It's not. The skill is taste. The skill is references. The prompt is just how you hand them to the model.

Think about how a film director works. They don't write the screenplay, then light the set, then operate the camera, then edit the footage. They walk into a room with a vision: mood boards torn from magazines, color palettes pulled from paintings, shot references from films they admire. They point at things and say "like that, but darker." The crew executes.

That's your new job.

The model is your crew. Your prompt is the brief you hand them. And a brief without references is a coin flip. A brief WITH references is a direction. The people producing the best images right now aren't writing the longest prompts or discovering magic keywords. They're collecting the best references. They're building what's called a swipe file: a personal stash of images, screenshots, designs, photos, anything with a visual quality they want to replicate.

A good swipe file has 10x more impact on your output than any prompt technique you'll ever learn.

This is a game of curation. Not vocabulary.

Different machine, different mechanics.

You learned in Module 2 that text models predict the next token in a sequence. Image models do something completely different. Most of them use diffusion, which means they start with pure noise and subtract it step by step until a picture forms out of the static. Some newer ones generate images more like text models do, token by token.

You don't need the math. You need the implication: image models don't "understand" your words the way Claude or ChatGPT understands them. They translate words into visual patterns they've absorbed from training. That's why showing a model an actual image communicates more than any paragraph of description you'll ever write.

In Module 6, you built a writer's profile to narrow the probability distribution for text: to constrain the model's output so it sounded like you instead of like everybody else. References and style locks do the exact same thing for images. They shrink the space of possible outputs so you get consistency instead of randomness.

Two models worth going deep on.

The image model landscape shifts constantly. For what's currently best at any specific task, check the landscape map and the most recent weekly guides in the community. Those stay current. This module won't try to.

But two models are worth learning deeply right now because they represent two fundamentally different ways of working:

Midjourney is the best for artistic and design output. When you need something that looks expensive, editorial, cinematic: Midjourney. Its killer feature is --sref (style reference). You feed it an image, it absorbs the visual DNA of that image, and applies it to whatever you're generating. Color palette, lighting, texture, mood. All of it. This is how you get consistency across a series. This is how you develop a "look." Midjourney doesn't need you to describe a style in words. It needs you to show it one.
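What that looks like in practice, as a minimal sketch (the URL is a placeholder for an image from your own swipe file; --sw is the style weight flag that controls how hard the reference applies, and exact flag behavior shifts between Midjourney versions, so check the current docs):

/imagine prompt: product shot of a ceramic mug on a linen tablecloth, soft morning window light --sref https://example.com/your-reference.jpg --sw 200

Higher --sw values pull the output harder toward the reference's look. Lower values keep more of the prompt's own character. You'll experiment with these weights in Phase 3.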

Nano Banana (Google Gemini's image model) is the best overall image model right now. Strongest all-rounder for photorealism, text rendering, editing, and world knowledge. You can prompt "1969 Woodstock" and it gets the era right without you specifying bell-bottoms and mud. Its power move: JSON-structured prompting that gives you granular control over every element. Subject, setting, lighting, camera, palette, mood, text, negatives. All as discrete fields. While Midjourney speaks in vibes, Nano Banana speaks in specifications.
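Here's a minimal sketch of that structure. The field names are illustrative, not an official schema; the model reads the JSON as a structured prompt, and the win is that every element lives in its own slot you can tweak independently:

{
  "subject": "ceramic mug, matte sage green glaze, steam rising",
  "setting": "rustic oak table, linen tablecloth, blurred kitchen background",
  "lighting": "soft morning window light from camera left",
  "camera": "85mm lens, f/2.8, eye-level product shot",
  "palette": "sage green, cream, warm oak brown",
  "mood": "calm, premium, editorial",
  "text": "none",
  "negatives": "no hands, no logos, no harsh shadows"
}

Change one field, regenerate, and you can see exactly what that element contributes. That's the control Midjourney's vibe-driven prompting doesn't give you.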

Both approaches work. You'll learn both. Then you'll know which one fits the job.

Six rules that separate operators from tourists.

1. Go to the source. The best output from any model comes from its native platform. Third-party providers often compress quality, downsample resolution, or run older model versions behind the scenes. Use Midjourney at midjourney.com. Use Nano Banana through Google AI Studio.

2. Generate in batches. Never one image at a time (if you can, use the API to batch your generations). You're a director reviewing takes, picking the best from a contact sheet. Not a photographer with one frame of film left, praying.

3. Don't start at max resolution. This one saves real money. Generate at standard res. Iterate fast. Get something you actually like. THEN upscale to whatever resolution you need for the final. Every generation you throw away at 4K is credits you burned for nothing.

4. Temperature matters here too. Lower temperature gives you more predictable output, closer to what your references look like. Higher temperature gives you wilder variation, more surprise. Explore at high temp. Ship finals at low.

5. Text on images is its own fight. Big, bold typography renders sharp almost every time. If you want a specific font, name it in the prompt. Otherwise you get whatever generic typeface the model defaults to. But if the text is small (product packaging, fine print, body copy), the model will probably butcher it. Try asking it to "render all content into ultra sharp and readable text." If it fails after two attempts, stop. Open Canva. Add the text yourself... Know when to stop fighting the machine.

6. Character consistency is a workflow, not a magic prompt. Don't try to generate the full complex scene with the character in it all at once. Generate or source the character separately first. Just them. Clear background. Good detail. Then use THAT image as a reference for every scene they appear in. If you need to change their outfit, generate the outfit on its own, then reference both the character and the new outfit into the scene. Piece by piece. Not all at once. This is how people produce mascots, brand characters, and visual series that actually hold together across eight, twelve, twenty images.
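A sketch of that sequence, with illustrative prompts:

Step 1, the character alone: "friendly robot mascot, rounded body, teal accents, neutral standing pose, plain white background, full body, high detail"

Step 2, the scene, with the Step 1 image attached as a reference: "the robot from the reference image sitting at a desk in a sunlit office, waving"

Step 3, an outfit change: generate "red knitted scarf, plain white background" on its own, then attach both images: "the robot from reference 1 wearing the scarf from reference 2, walking through a park"

Every new scene inherits from the canonical character image, not from the previous scene, so drift never compounds.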

Image Director

The prompt turns the AI into a creative director and visual strategist. Not someone who lectures you about diffusion models. Someone who sits next to you, looks at images with you, and helps you figure out what you like, then teaches you how to get that from the tools.

Before you paste

Make sure you're in your course project

Your my-context.md, my-learning-style.md, and my-knowledge.md files should be attached to the project

After the exercise, run the Learning Extraction Prompt to update your knowledge file and save your session notes

What happens in each phase

image-director.xml - 5 phases + mastery gate

PHASE 1 Context Bridge: understand your visual content needs, current struggles, and aesthetic instincts

PHASE 2 Building the Swipe File: collect 5-10 reference images and discover patterns in your own taste

PHASE 3 Midjourney sref Practice: style reference workflow with style weight experimentation

PHASE 4 Nano Banana JSON Prompting: structured precision control and model comparison

PHASE 5 Consistent Series + Artifact Assembly: produce 3-5 consistent images and build your template

GATE Mastery Gate: 7 real-world scenarios testing the director's mindset

What you should see: The AI reads your files, asks about your visual content needs, and helps you identify what aesthetics you gravitate toward. You'll build a swipe file of 5-10 reference images and get specific about what you like in each one, discovering patterns in your own taste. Then you use Midjourney's --sref to generate style-matched variations at different weights, switch to Nano Banana's JSON prompting, and compare both models on the same concept. Finally, you pick a real work project and produce 3-5 consistent images using the director's workflow, then assemble your reusable image-prompt-template.md. The mastery gate tests you with seven real scenarios: all application, zero recall.

Time: 30-45 minutes. You'll be generating images throughout. This isn't a reading module.

image-director.xml

Save Your Work

Run the Learning Extraction Prompt to update my-knowledge.md with what you learned.

Save to module-07-images/

image-prompt-template.md: your swipe file, Midjourney sref workflow, Nano Banana JSON template, consistency checklist, and example prompts

Generated images: your 3-5 consistent visual series

session-notes.md from the Learning Extraction Prompt

Next: Module 8: AI for Video & Audio.
You'll take the director's mindset you built here and extend it into motion, sound, and production workflows. Same thinking, different medium.

Run This After Every Module

After completing the module prompt above, paste this into the same conversation. The AI reviews everything that just happened and extracts what you actually learned: not what was presented, but what you demonstrated.

learning-extraction-prompt.xml