Pretraining, Finetuning, RLHF — The Three-Act Training Story
How a raw language model becomes ChatGPT. Three acts, three very different kinds of teacher, and one polite assistant who used to know how to swear.
Apr 12, 2026 · 11 min read
Nobody tells the puppy the rules. It just gets a treat when it does the right thing. That's the core idea behind reinforcement learning, the technique that trains game-playing AI and shapes the final training step of most chatbots you talk to.