Unsupervised Learning, or Why Patterns Appear on Their Own

Nobody told you where the groups were at the party, but you could see them forming anyway. That's unsupervised learning. Clustering, similarity, and embeddings, told as one story.


You walk into a party where you don't know anyone. For the first ten minutes, it's just a blur of faces. But by the end of the first hour, without anyone explaining anything to you, your brain has quietly sorted the room. There's the loud cluster near the speakers. There's the clump of people talking shop in the kitchen. There's the quiet corner with the two people who've clearly come just for each other. There's a small knot of early-arrivers who already seem to be in charge.

Nobody handed you a seating chart. Nobody labelled anyone. But the structure emerged anyway, and you can feel it.

That is unsupervised learning, more or less. And it powers more of the AI around you than most people realise. In the last post, we met supervised learning, where every example came pre-stapled with the right answer. In this post, we take the answers away. What can a machine still do with a pile of examples if nobody tells it what any of them are?

The answer, it turns out, is "surprisingly much." No math. No code. One party.


The setup: examples with no labels at all

Here's the shape of unsupervised learning in one picture.

Three steps: raw examples go in, the model looks for structure, a map of the data comes out. No grader saying "you got that one wrong." No labels to aim at. The model's entire job is to find regularities that were already there, waiting to be noticed.

This sounds vague, and honestly, it is a little. That's the first thing to know about unsupervised learning: there's no single right answer, because nobody said what right was. Two different unsupervised models looking at the same party might sort the guests into "loud vs quiet" or "knows the host well vs doesn't" or "Slack users vs Discord users," and all three would be defensible. The model finds a structure; the human decides whether it's the useful one.

The honest definition:

Unsupervised learning is what a machine does when it's given a pile of examples with no answers, and asked to say something interesting about how they relate.

Three common flavours of "interesting" keep showing up. Let's walk the party with each one in mind.


Clustering: who's standing with whom

The most intuitive flavour is clustering: literally, "find the groups." You look around the party and your brain draws loose circles around people who seem to belong together. The people near the speakers are one group. The kitchen crowd is another. The couple in the corner is their own tiny group of two.

A clustering algorithm does exactly this, but with dots instead of people. Give it a scatter of examples and it tries to draw circles around the clumps, without ever being told what any clump means. (Some algorithms need a hint about how many clumps to look for; others work that out on their own.) The "shape" it's fitting is literally just a set of groupings. A new example shows up, it gets assigned to whichever clump it's closest to, and that's a useful answer surprisingly often.

Here's the thing worth sitting with: the clustering algorithm never learns the name of any group. It doesn't know "these are the kitchen people." It only knows "these twelve dots tend to be near each other." The naming is entirely up to you, after the fact. You open up the clusters, peek inside, and say "ah, this one is repeat customers, this one is one-time gift buyers, this one is people who churned last quarter." The algorithm found the shape; you supplied the story.
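Optional aside, for readers who do want a few lines of code: here is a minimal sketch of one classic clustering method, k-means, in plain Python. The guest coordinates, the choice of three clumps, and the function names are all made up for illustration, and k-means is one of the methods that has to be told how many clumps to look for.

```python
import math

# Hypothetical (x, y) positions of ten guests in the room.
guests = [
    (1.0, 1.2), (1.5, 0.8), (0.8, 1.9), (1.2, 1.5),   # near the speakers
    (8.0, 1.0), (8.5, 1.6), (7.6, 0.7), (8.2, 1.3),   # in the kitchen
    (5.0, 8.0), (5.3, 8.2),                           # the couple in the corner
]

def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, k, iterations=10):
    """Tiny k-means: returns k centroids (cluster centres)."""
    # Deterministic start: the first point, then repeatedly the point
    # farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iterations):
        # Step 1: every point joins its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Step 2: every centroid moves to the middle of its group.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids

centroids = kmeans(guests, k=3)
labels = [nearest(g, centroids) for g in guests]
print(labels)                             # one cluster number per guest
print(nearest((7.9, 1.1), centroids))     # a late arrival joins the closest clump
```

Notice that the output is only a list of cluster numbers. Nothing in it says "kitchen" or "speakers"; that story still has to come from whoever opens the clusters and looks inside.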

That sounds like a weakness. It's actually the superpower. Clustering finds structure you didn't know to look for.

Where you've met it today. If a streaming service has ever grouped your viewing history into "moods" ("thriller nights," "comfort shows," "documentary kicks"), that's clustering. If a retailer emailed you because you were flagged as part of a "lapsed high-value customer" segment that nobody explicitly hand-wrote, that was clustering too. Customer segmentation, document topic-grouping, genetic lineage analysis, anomaly detection in network traffic: all clustering, all running quietly in the background of systems you use.


Similarity: who's like who

Clustering answers "who are the groups?". A subtler unsupervised move answers a quieter question: which of these things is most like that thing?

Back at the party. You meet someone interesting by the snacks and want to find more people like them. You don't need a full sort of the entire room. You just need, for this one person, a list of "three other guests with the most similar vibe." That's a similarity search, and the machine version is everywhere.

  • The "related products" rail on every shopping site? Similarity search over a pile of products, where the thing you're looking at is the query and the three most similar items come back as the rail.
  • The "people you may know" on a social network? Similarity search over users.
  • "Songs like this one" on music apps? Same move, over songs.
  • Reverse image search, plagiarism detectors, duplicate-file finders, "find more of this face in my photo library": all similarity search, with different kinds of things in the pile.
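Optional aside in code: every item in the list above boils down to "score everything in the pile against the query, keep the top few." The sketch below does that for the party, assuming each guest has already been described by three hypothetical "vibe" scores. The names, the numbers, and the choice of cosine similarity as the measure are all illustrative assumptions.

```python
import math

# Hypothetical vibe scores per guest: (loudness, talks shop, silliness), each 0..1.
guests = {
    "Ana":   (0.9, 0.1, 0.8),
    "Ben":   (0.8, 0.2, 0.9),
    "Chidi": (0.2, 0.9, 0.1),
    "Dara":  (0.3, 0.8, 0.2),
    "Eli":   (0.7, 0.3, 0.7),
    "Fay":   (0.1, 0.2, 0.1),
    "Gus":   (0.8, 0.1, 0.6),
}

def cosine_similarity(a, b):
    """1.0 means 'pointing the same way'; lower means less alike."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def most_similar(query, pile, k=3):
    """Names of the k entries in `pile` most similar to `query` (excluding itself)."""
    scored = [
        (cosine_similarity(pile[query], vec), name)
        for name, vec in pile.items()
        if name != query
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

print(most_similar("Ana", guests))  # ['Gus', 'Ben', 'Eli']
```

Swap the guests for products, songs, or photos and the function does not change. Only the way each thing gets turned into numbers does.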

The useful reframing is this: similarity turns "what is this?" into "what is this near?". You don't need a classifier that can name the thing. You just need a shape of the space in which "nearness" matches your intuition about what counts as similar. Which leads straight into the most important unsupervised trick of all.


Embeddings: turning everything into a point on a map

Here's the move that quietly powers the most impressive AI you use. It's called embedding, and it's the single most important concept in modern machine learning that isn't "neural network." Worth slowing down for.

Back at the party, imagine you could describe every guest with just three numbers: say, loud ↔ quiet, local ↔ out-of-town, serious ↔ silly. Each person becomes a point in a tiny three-dimensional space. The DJ near the speakers is up in the "loud / local / silly" corner. The shy out-of-towner telling earnest stories in the kitchen is down in "quiet / out-of-town / serious." Your new friend by the snacks is somewhere in between.

Now something remarkable happens. Similarity becomes distance. Two people who feel alike end up as nearby points on your three-axis map. Two people who feel very different end up far apart. You haven't labelled anyone. You haven't told the map what "loud" means. You've just picked a few axes and placed everyone in the resulting space.

That is exactly what an embedding does. An embedding is a way of taking messy real things (words, images, songs, users, products, sentences, faces) and turning each one into a point in a space so that similar things end up close together. The space usually has a lot more than three dimensions (hundreds, sometimes thousands), which makes it impossible to draw, but the idea is identical.

Once you have an embedding, almost every other unsupervised move becomes easy. Clustering? Just draw circles around clumps of points. Similarity search? Just find the nearest points. Recommendations? Just look at where a user's embedding sits and suggest things whose embeddings are nearby. Duplicate detection? Points that are almost on top of each other are duplicates.
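Optional aside in code: two of those moves, recommendation and duplicate detection, sketched on top of the same tiny map. The song names and their 2-D coordinates are invented for illustration (real embeddings are learned and far wider), and the duplicate threshold is an arbitrary assumption.

```python
import math
from itertools import combinations

# Made-up 2-D embeddings for a handful of songs.
songs = {
    "lofi beats":        (0.10, 0.90),
    "lofi beats (copy)": (0.11, 0.90),
    "rain sounds":       (0.20, 0.80),
    "stadium rock":      (0.90, 0.20),
    "punk anthem":       (0.95, 0.10),
}

def recommend(user_point, items, k=2):
    """The k items whose embeddings sit closest to the user's point."""
    return sorted(items, key=lambda name: math.dist(user_point, items[name]))[:k]

def near_duplicates(items, threshold=0.05):
    """Pairs of items that are almost on top of each other."""
    return [
        (a, b)
        for a, b in combinations(items, 2)
        if math.dist(items[a], items[b]) < threshold
    ]

# A hypothetical listener whose own embedding sits in the loud corner of the map.
print(recommend((0.85, 0.20), songs))   # ['stadium rock', 'punk anthem']
print(near_duplicates(songs))           # [('lofi beats', 'lofi beats (copy)')]
```

Both functions are a few lines long because the hard work, deciding where each song sits, was already done by the embedding.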

The magical-feeling part is that a well-trained embedding captures meaning you never explicitly taught it. Famously, once word embeddings got good, researchers noticed you could do arithmetic on them: "king" minus "man" plus "woman" ended up very close to "queen." Nobody programmed that relationship. It fell out of the shape the embedding model had learned from reading a lot of text. The axes of the space weren't "royalty" and "gender" (there are hundreds of unnamed axes), but some combination of them behaved like those concepts because it had to, in order to fit the data.

Hold onto this: an embedding is unsupervised learning's way of turning the whole mess of the real world into a single map you can measure distances on. Almost every interesting thing modern AI does (search, recommendation, clustering, retrieval, even the way LLMs internally represent words) is built on top of embeddings. When we hit Module 3, we'll see how neural networks actually produce them. For now, the mental picture of "messy thing → point on a map" is enough.


Why unsupervised matters more every year

Supervised learning is powerful, but it has an expensive bottleneck: someone has to produce the labels. Humans clicking "spam," doctors annotating scans, reviewers tagging content. Labels are slow, costly, and often inconsistent. For every problem where you have labels, there are a hundred where you have data but no answers.

Unsupervised learning gets around that. Give it a pile of unlabelled text, images, transactions, anything, and it will still extract structure. Clusters, similarities, embeddings. This is why it's the workhorse of any setting where data is cheap and labels are not:

  • Pretraining large models. Before an LLM ever sees a labelled example, it's trained on a truly enormous amount of raw text with no labels. That stage is essentially unsupervised, and it's where most of what the model "knows" comes from.
  • Search and retrieval. Nothing in a giant document collection is labelled "relevant to this query." Embeddings make relevance fall out for free.
  • Anomaly detection. You rarely have labelled fraud; you have a pile of transactions, most of which are normal. Unsupervised methods learn the shape of "normal" and flag things that don't fit.
  • Exploratory analysis. Any time a data scientist runs a clustering on a new dataset just to see what's in there, that's unsupervised learning doing the thing it's best at.
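Optional aside in code: the anomaly-detection item in the list above is the easiest to sketch. The version below learns the shape of "normal" as nothing more than a typical value and a typical spread, then flags whatever sits far outside it. The transaction amounts, the threshold of 5, and the median-based approach are assumptions chosen for illustration; production fraud systems are far more elaborate.

```python
import statistics

def flag_anomalies(values, threshold=5.0):
    """Values that sit unusually far from the typical one. No labels needed."""
    centre = statistics.median(values)                          # the typical value
    spread = statistics.median(abs(v - centre) for v in values) # the typical deviation
    if spread == 0:
        # Degenerate case: most values are identical, so anything different stands out.
        return [v for v in values if v != centre]
    return [v for v in values if abs(v - centre) / spread > threshold]

# Hypothetical transaction amounts: mostly small purchases, one very odd one.
amounts = [12.5, 9.9, 14.2, 11.0, 13.3, 10.7, 950.0, 12.1]
print(flag_anomalies(amounts))  # [950.0]
```

The median is used instead of the mean so that the outlier itself does not drag the definition of "normal" toward it.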

The trend of the last few years is pretty clear: the biggest jumps in AI capability have come from unsupervised (or barely-supervised) training on gigantic piles of raw data. That's the story of GPT. That's the story of the big image models. It turns out a model that has spent years of compute just looking at the world without labels builds a shape rich enough that, when you eventually show it a few labelled examples, it can learn almost anything in a handful of tries.


What just changed in your head

You started this post thinking of "unsupervised" as the weaker cousin of the real thing: useful, maybe, but vague. You're ending it with a different picture: a machine walking into a party it's never seen, quietly noticing the groups, the neighbourhoods, the shape of who stands near whom, and producing a map you can then use for a dozen things. No answer key required.

The payoff, if I can name it in one line, is this: unsupervised learning lets you extract structure from the world at a scale no team of human labellers could ever match. That's the whole reason it's suddenly at the heart of modern AI. The limit on supervised learning has always been the patience of the people writing down answers. Unsupervised learning has no such limit. It eats the whole internet and finds the shape that was already there.

In the next post, we meet the third and weirdest flavour: reinforcement learning. No labels, no pretty structure, just an agent taking actions in an environment and learning, slowly and expensively, from rewards and punishments. It's the flavour behind game-playing AIs, robot locomotion, and, increasingly, the final polish on every chatbot you talk to.



📚 AI Zero to Hero · Course Home: all 33 posts, six modules.


Cover photo via Unsplash. This post is part of the AI Zero to Hero series.
