A Short, Honest History of AI

In 1956, a group of researchers gathered for a summer workshop at Dartmouth College. They had a goal and a deadline: figure out how to make machines think, by the end of August.

They didn't quite hit it.

What they did do was coin the phrase "artificial intelligence" and launch a seventy-year rollercoaster of wild optimism, brutal disappointment, and — eventually — the tools we now use every day. The story is weirder and more human than the highlight reel suggests. There were decades when AI research was a career-killing backwater. There were labs that had to rename themselves to get funding. And then, very suddenly, in one very specific year, everything flipped.

This post is that story, in plain English. You'll come out the other side with a sense of why modern AI looks the way it does, which is hard to get from headlines alone.

The timeline, in one picture

Six moments. Two catastrophic collapses. One absurdly specific paper in 2017 that almost nobody outside the field read at the time. Let's walk through them.

1950: Turing asks the question that won't go away

Before there was "AI," there was Alan Turing, a British mathematician who had just helped crack German naval codes during World War II. In 1950, he published a paper called Computing Machinery and Intelligence that opened with a provocation: "Can machines think?"

Turing didn't try to define "think," because he knew that argument could go on forever. Instead, he proposed a game. Put a human and a computer in two rooms. Have a judge type questions to both. If the judge can't reliably tell which is which, then — for all practical purposes — the machine is doing something we'd recognise as intelligent.

This was the Turing Test, and it's been poked at ever since. But the point isn't whether the test is perfect. The point is that Turing moved the question from philosophy ("what is thought, really?") to engineering ("can we build something that behaves like a thinker?"). That shift — from "is it really intelligent" to "does it do the job" — is the attitude the whole field has run on ever since.

1956: Dartmouth and the word "AI"

Six years later, a computer scientist named John McCarthy wanted funding for a summer research workshop. He needed a name for the field that would convince the Rockefeller Foundation to write a check. He picked "artificial intelligence."

A few things to notice about that name. It's ambitious on purpose — it promises intelligence, not "pattern matching" or "statistical inference." It's also deliberately vague. McCarthy later said he chose it partly to avoid fights with other researchers who already had their own labels for their work. "AI" was a big enough tent to put everyone under.

The Dartmouth workshop's proposal famously claimed that "every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it." They thought this would take about two months of work for ten people.

Yeah.

The workshop didn't produce a thinking machine. What it did produce was a label, a community, and a wildly overconfident set of promises that would, about twenty years later, crash into reality.

1974: The first winter

Through the 1960s, AI researchers made real progress on narrow problems. A program called SHRDLU could have sensible-looking conversations about a virtual world made of coloured blocks. Another called ELIZA could do a passable impression of a therapist. There was genuine excitement that general intelligence was maybe a few years away.

Then governments started asking for actual results. The British government commissioned a review (the Lighthill Report) and concluded, in 1973, that AI had drastically overpromised. The American military-funded research agency — DARPA — came to similar conclusions. Funding was cut. Labs closed. Graduate students changed fields. This was the first AI winter, and it lasted most of a decade.

The painful lesson: the problems AI had solved were the ones that looked hard but were actually narrow and formal — chess endgames, block puzzles, simple conversations. The problems that looked easy — understanding a photograph, holding a messy conversation, moving through a cluttered room — turned out to be desperately hard. The early researchers had mistaken the visible tip of intelligence for the whole iceberg.

The 1980s: Expert systems boom (and bust)

AI had a brief comeback in the 1980s with something called expert systems. The idea was simple: take a human expert — a doctor, a geologist, a financial analyst — and spend months interviewing them about how they think. Write down all their rules as a giant "if this, then that" program. Run the program. You'd have, in theory, a bottled expert.

It sort of worked. Companies spent a lot of money on expert systems. Some of them did useful things. But the approach had a fatal problem: the world doesn't fit into neat rules. Real experts use judgment, context, and gut feeling — things that resist being written down as if-then statements. Every expert system became brittle the moment the world stepped outside the rules its authors had anticipated.

By 1987, the expert-systems market collapsed. This was the second AI winter, and it was worse than the first because this time the industry had been selling products. Companies went under. Investors got burned. For years afterwards, calling yourself an "AI researcher" at a cocktail party was a good way to make people politely change the subject.

1997: Deep Blue and the narrow-AI triumph

In May 1997, an IBM computer named Deep Blue beat the reigning world chess champion, Garry Kasparov, in a six-game match. It was front-page news everywhere.

Deep Blue is an interesting inflection point because, strictly speaking, it wasn't doing anything new. It was a hand-coded chess engine with a lot of expensive hardware — basically the endpoint of the old symbolic-AI tradition. It won by looking ahead further than any human could, not by "thinking" the way Kasparov did.

What Deep Blue showed was that raw compute plus clever engineering could crack problems that had seemed to require genuine human insight. It didn't show a path to general intelligence. But it kept the dream alive during the winter — proof that something in this field could still win.

2012: AlexNet and the earthquake

This is the moment that matters most. If you only remember one year from this post, remember 2012.

Every year, researchers competed in an image-recognition contest called ImageNet, which had about a million photos labelled with what they contained (dogs, cats, trucks, and so on). For years, the winners had been using hand-engineered classical computer-vision methods, and each year's improvement was a few fractions of a percent. It was a slow, incremental field.

In 2012, a team from the University of Toronto — Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton — submitted an entry called AlexNet. It was a neural network, a type of model that had been deeply unfashionable for twenty years because it never seemed to work on real problems. They trained it on two gaming GPUs (graphics cards meant for video games) sitting under Alex's desk.

AlexNet didn't just win the contest. It won by a margin so large it was embarrassing. The second-place entry was using the old methods; AlexNet had roughly half its error rate. The whole field had been incrementally polishing a technique, and a completely different approach walked in and beat it by a decade's worth of improvement overnight.

The AlexNet paper triggered the deep learning revolution. Within three years, essentially every serious computer-vision researcher had switched to neural networks. Within five, the same kinds of models had cracked speech recognition, machine translation, and a dozen other problems that had seemed stuck.

Why did it work in 2012 and not 1992? Three things had quietly changed:

Data. ImageNet and the broader internet gave researchers access to millions of labelled examples, not thousands.
Compute. GPUs, built for video games, turned out to be perfect for the math neural networks need. Suddenly a grad student could run experiments that would have required a supercomputer a decade earlier.
A few refined techniques. Small but important tweaks to how networks were built and trained.

None of those three on their own would have been enough. Together, they unlocked a door that had been stuck for forty years. We'll spend all of Module 1.4 on this moment because it's the hinge the whole modern story swings on.

2017: The transformer shows up

Five years after AlexNet, a group of Google researchers published a paper with one of the cheekiest titles in computer science history: "Attention Is All You Need." It introduced a new kind of neural network architecture called the transformer.

The transformer was designed for language. Older language models read a sentence one word at a time, left to right, which made it hard to capture long-range connections. Transformers could look at the whole sentence at once and decide which words mattered most to which — a trick called attention. (We'll spend all of Module 4 on this. For now, just note that "attention" is the piece that makes modern LLMs work.)

In 2017, the transformer paper was an interesting result for specialists. Almost nobody outside the field noticed. But over the next five years, it would turn out to be the most important architectural idea of the century, because it scaled. You could throw bigger and bigger models at more and more data and transformers kept getting better. The old architectures would have plateaued. Transformers didn't.

2022: ChatGPT, and the public notices

OpenAI had been releasing transformer-based language models for a while — GPT-1 in 2018, GPT-2 in 2019, GPT-3 in 2020. Each was better than the last. Each got a little attention from the tech press and a shrug from everyone else.

Then, in November 2022, they put a chat interface on top of a fine-tuned GPT-3.5 and called it ChatGPT. It wasn't a technical leap — the underlying model was mostly stuff they'd already built. What was new was the packaging: a free, fast, friendly chat window that anyone could use.

Within two months, ChatGPT had a hundred million users. It's the fastest-adopted consumer product in history. Students were writing essays with it. Programmers were pasting their code into it. Executives who had ignored every previous AI announcement were suddenly calling emergency meetings.

The public had finally noticed — not because the technology was brand new, but because, for the first time, you could just talk to it.

The throughline

Zoom out from the dates for a second. There's a pattern in this story, and it's worth holding onto:

Symbolic AI (the Dartmouth era through the 1980s) tried to build intelligence by writing down rules. It worked on narrow, formal problems (chess, block puzzles) and collapsed on messy, real-world ones.
Neural networks (the 2012-onwards era) skipped the rules and learned from examples. They worked on messy problems first — images, speech, language — which had been the hardest problems for the old approach.

Seventy years of AI history is basically the story of one approach hitting a wall, sleeping through a winter, and another approach eventually walking through the wall. And the second approach didn't win because it was philosophically superior. It won because data and compute finally caught up to it.

That's the quiet truth the field has been slowly absorbing: sometimes the right idea has been around for decades, waiting for the world to give it enough to eat.

What just clicked

If you started this post thinking AI had been on a steady march forward, that picture should feel wrong now. The real story has two crashes, two long quiet stretches, and then a genuinely sudden acceleration. The technology you use today is barely a teenager.

The other thing worth sitting with: why deep learning won isn't a mystery. It won because it got enough data and enough compute. That's going to matter enormously in the next post — because it sets up a philosophical question the field has been fighting about since Dartmouth, and nobody expected the answer to be "bigger."

Coming up in M1.3 — The three tribes of AI, we'll look at the three philosophical camps that have been arguing for seventy years about how to actually build a mind, and why one of them quietly won while the others are still standing around insisting they have a point.

⬅️ Previous: M1.1 — What AI actually is (and isn't)

⬅️ Previous	📍 You are here	Next ➡️
⬅️ Previous M1.1 · What AI Actually Is	M1.2	Next ➡️ M1.3 · The Three Tribes of AI

📚 AI Zero to Hero · Course Home — all 33 posts, six modules.

Cover photo via Unsplash. This post is part of the AI Zero to Hero series.

A Short, Honest History of AI — From Turing to ChatGPT

The timeline, in one picture

1950: Turing asks the question that won't go away

1956: Dartmouth and the word "AI"

1974: The first winter

The 1980s: Expert systems boom (and bust)

1997: Deep Blue and the narrow-AI triumph

2012: AlexNet and the earthquake

2017: The transformer shows up

2022: ChatGPT, and the public notices

The throughline

What just clicked

Course navigation

Comments (1)

AI Zero to Hero

The Three Tribes of AI — Symbolic, Connectionist, Statistical

More from this blog

A Reading List and Two Habits: Staying Current in Ten Minutes a Week

What to Decide Now, What to Defer, What to Ignore: The AI Action Matrix

The Next 18 Months of AI: A Calibrated Leader's Forecast

Calibrating Your AI Exposure: Upside and Downside in One Matrix

Five AI Capabilities That Matter for Your Business, and Five That Do Not

Command Palette

The timeline, in one picture

1950: Turing asks the question that won't go away

1956: Dartmouth and the word "AI"

1974: The first winter

The 1980s: Expert systems boom (and bust)

1997: Deep Blue and the narrow-AI triumph

2012: AlexNet and the earthquake

2017: The transformer shows up

2022: ChatGPT, and the public notices

The throughline

What just clicked

Course navigation

Comments (1)

AI Zero to Hero

The Three Tribes of AI — Symbolic, Connectionist, Statistical

More from this blog