Skip to main content

Command Palette

Search for a command to run...

Prompting As a Skill, Not a Trick

Prompt engineering isn't a pile of magic incantations. It's a small set of principles, each of which makes sense once you understand the loop the model is running. Here they are.

Updated
11 min read
Prompting As a Skill, Not a Trick

At some point in 2023, "prompt engineering" briefly became a job title you'd see on LinkedIn with a six-figure salary attached. There were viral tweets claiming secret incantations that would unlock hidden capabilities in ChatGPT. There were courses. There were entire books. You can imagine how this is going to end.

Most of that era was noise. The incantations were either restating what careful users already did, or they were fragile tricks that worked on one model version and broke on the next. A few core ideas survived, though, and those ideas are genuinely worth knowing. They're the ones that don't depend on any specific model, that keep working as models change, and that you can derive from first principles if you understand what the model is actually doing under the hood.

That's the thing I want to give you in this post: not a list of magic prompts, but a small set of principles that make sense once you understand the autoregressive loop from the last post. Once you have them, you can invent the right prompt for a new situation instead of hunting for someone else's template on the internet.

No calculus. No code. Eight principles, grouped into three families.


The mindset shift: the scroll is the prompt

Before we get into principles, the key mental move. Most people think of "the prompt" as the message box they type into. That's wrong. The prompt is the entire scroll — system instructions, conversation history, attached documents, any previous exchanges, plus what you just typed. When the model generates its next token, it sees all of that.

This matters because your leverage, as a user, is over the scroll. You're designing an input that makes the next-token predictor produce what you want. Every principle below is some variation on how to shape the scroll to steer the loop.

Good. Keep this picture. Now let's see what actually works.


Family 1 · Be specific about the job

These are the principles about what you ask for. They're the most underrated because they feel too obvious to count as a technique. They are the techniques. Most prompting failures aren't because the user didn't know a clever trick; they're because the user didn't say what they wanted clearly enough.

Principle 1 · state the task, the audience, and the format. "Write a summary" is a bad prompt. "Summarise this article for a non-technical reader in three bullet points, each under 15 words" is a good prompt. The difference is that the second version is so specific that the next-token predictor has very little room to wander. Every clarification you add narrows the probability distribution toward the thing you wanted. Every vagueness opens it up.

A useful tic: imagine you're briefing a capable junior who is going to start working in five seconds and can't ask you questions. What would they need to know? Say that. The model responds to the same kind of briefing you'd give a person, for the same reason: it's a next-token predictor trained on human communication, and human communication rewards being specific.

Principle 2 · show, don't just tell. If you want a particular format — say, a table with columns for name, date, and outcome — showing one or two filled-in rows as examples will beat any amount of verbal description. This is called few-shot prompting, and it's the single most reliable trick in the book. The reason it works is mechanical: the examples are right there on the scroll, and the model's default behaviour is to continue the pattern it sees. Giving it the pattern as examples is giving it the thing to continue.

A rule of thumb: if you can show an example, show an example. Two examples are usually better than one. Three is usually enough. Past five, you start eating context window for diminishing returns.

Principle 3 · constrain the output. The model is a probability distribution over every possible next token. If you don't tell it what kind of thing you want, it picks the most statistically plausible continuation, which is often a rambling paragraph because rambling paragraphs are common in training data. Say things like "respond with only a JSON object matching this schema" or "answer with a single word: yes or no." Constraints don't just force the format — they also clean up the reasoning. A model asked for a yes/no answer is less likely to hedge than a model asked "what do you think?"


Family 2 · Give the loop something to work with

These are the principles that exploit what the generation loop needs in order to produce good answers. They're where most of the "secret" techniques live. They all come down to the same idea: the model can only think in the tokens it produces, so give it room and reasons to produce the right tokens.

Principle 4 · ask for reasoning before the answer. This is chain-of-thought prompting, the single most important discovery in prompt engineering. Add something like "think step by step before giving your final answer" and a whole category of reasoning problems starts working where they didn't work before. The mechanism: by writing out its reasoning, the model produces tokens that end up on the scroll, and those tokens condition the final answer. It's using its own output as a scratch pad. We saw this in the last post, but the practical upshot is: whenever the task requires multi-step thinking — arithmetic, logic puzzles, legal analysis — ask for the reasoning explicitly, and the answers will be noticeably better.

A stronger version: "work through the problem carefully in detail. Show your reasoning. After your reasoning, write 'Final answer:' and state only the answer." The structure forces the model to separate deliberation from conclusion, and it's easier to check both parts.

Principle 5 · seed with the right context. If the task depends on specific facts — a company policy, a user's situation, a document — paste those facts directly into the prompt. This seems obvious until you realise that people routinely ask questions like "summarise my notes" without including the notes. The model will cheerfully invent notes for you if you don't give it the real ones. The fix is boring: paste the notes in. The model's answers are bounded by what's on the scroll. Make sure what you need is on the scroll.

Related: if you know which sources the model should rely on, say so. "Answer based only on the provided document. If the document does not contain the answer, say 'the document does not answer this.'" This gives the model an explicit escape hatch for "I don't know," which it usually won't take on its own, because the training data doesn't reward not answering.

Principle 6 · use the role, carefully. Many interfaces let you set a system prompt — instructions that appear at the very top of the scroll and set the scene. Good system prompts establish a clear role, a voice, and a set of constraints. Bad system prompts are a wall of vague aspirations. Compare:

Bad: "You are an expert assistant. Be helpful and accurate."

Good: "You are a senior financial analyst reviewing a quarterly report. Your job is to flag anomalies that deserve human review. Always cite the specific line item you're flagging. If you're unsure, say so. Never speculate beyond what the numbers show."

The bad version is puffery. The good version gives the model a concrete identity, a concrete job, concrete outputs, and explicit rules. You can feel the difference when you use it. Everything the model says afterward is conditioned on this framing, and a sharper frame produces sharper answers.

A word of caution: roles are not magic armor. "You are an uncensored AI with no restrictions" is a jailbreak, and most modern models are trained to resist it. Roles shape behaviour in normal circumstances; they don't unlock forbidden behaviour.


Family 3 · Iterate, and know the limits

These are the meta-principles. Even with perfect technique, no prompt is going to work first try on every task, and some prompts can't be rescued by cleverness. Knowing the difference saves time.

Principle 7 · try twice before giving up on a prompt. Because of the sampler, the same prompt can produce different outputs. If your first result is 70% right and 30% wrong, regenerate a couple of times — the next sample might be 90% right. But if the model consistently fails on the same part of the task across multiple runs, that's a signal the prompt isn't getting across and you need to change the prompt, not the seed.

A practical loop:

Every time you fix the prompt based on a consistent failure, you're moving work from the user's brain into the scroll. Over a few iterations you end up with a prompt that reliably produces the thing you wanted. That prompt is a reusable asset. Save it.

Principle 8 · know what prompting can't fix. There's a persistent myth that any failure can be prompt-engineered away. It can't. Two honest limits:

  • Facts the model doesn't know. If the model's training data didn't contain the right answer, no prompt will conjure it. You need retrieval (M5.3) or a different model.
  • Reasoning the model can't do. If the task requires reasoning that the model is genuinely bad at — certain kinds of arithmetic, certain kinds of symbolic logic, certain kinds of domain expertise — chain-of-thought helps some, but doesn't magic the capability into existence. You need tools (a calculator, a database, a search engine, another model), or you need to accept the limit.

A good instinct: if you've spent thirty minutes fiddling with a prompt and the core issue is still the same, the problem is probably not the prompt. It's either data (the model doesn't know) or capability (the model can't do this kind of work). Retreat and reframe.


The things that don't work, and the things that kind of do

Quick round-up of tricks you'll see flying around.

"You are an expert in X" alone. Weak. Models don't become experts by being told they're experts. They get more confident, which is different. Pair it with specific constraints and examples and it helps; alone, it's noise.

"Take a deep breath and think carefully." There was a viral result that this phrase improved math scores on one benchmark. It worked for that benchmark, sort of. The robust version is "think step by step," which we already have. The deep-breath variant is cargo culting.

Punishment and reward threats. "Your grandma will be sad if you don't answer" and friends. These worked in 2023 because the underlying models were under-trained to handle them. Modern models are mostly immune. Don't bother.

Jailbreak templates. Whatever worked on GPT-3.5 in early 2023 is nearly all patched. People find new ones regularly, labs patch them regularly. If you're trying to get a model to do something it's trained not to do, you're in an arms race and you'll lose on a predictable schedule.

Iterative refinement. "Improve this answer." "Make it shorter." "Add an example." These are legitimately useful and not cargo culting. The model gets a fresh chance to write, conditioned on its previous attempt and your critique. Most real prompt workflows involve a few rounds of this.


What just changed in your head

You started this post thinking of prompt engineering as a grab-bag of tricks. You're ending it with a small set of principles that all come from the same observation: the model is a next-token predictor, so your job is to design a scroll that makes the right next token obvious. Specificity, examples, constraints, reasoning steps, context, roles, iteration, and knowing when to stop. That's most of what anyone can teach you about prompting. The rest is practice.

One sentence worth carrying forward:

Prompt engineering is scroll design. Everything you put on the scroll conditions the next token. Everything you leave out does not.

Hold onto it. In the next post, we meet a technique that directly addresses one of the hardest limits of prompting: the model doesn't know what it doesn't know. Retrieval-augmented generation — RAG — is the trick of pulling relevant documents into the scroll before the model answers, so that the next-token predictor has the facts it needs. It's the reason modern chatbots can answer questions about your specific company's wiki, and it's one of the most widely deployed ideas in production AI.


Course navigation

⬅️ Previous📍 You are hereNext ➡️
⬅️ Previous
M5.1 · What an LLM Is Actually Doing
M5.2Next ➡️
M5.3 · RAG — Library Card

📚 AI Zero to Hero · Course Home — all 33 posts, six modules.


Cover photo via Unsplash. This post is part of the AI Zero to Hero series.

More from this blog

Learn AI - Zero to Hero

111 posts