When we think about how language works, it's clear that a word's meaning doesn't exist in isolation. Words like "chair" or "dog" don't carry meaning by themselves—they're just letters we've grouped together to represent concepts we recognize.
The meaning of a word comes from two sources: the other words in the sentence around it, and all the sentences you have ever encountered before. This dual source of meaning is why large language models (LLMs), powered by (1) attention mechanisms and (2) extensive training on millions (or even trillions) of sentences, perform so impressively.
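For readers who want to peek under the hood, here is a minimal sketch of the scaled dot-product attention at the core of these models (a simplified, single-head NumPy version that omits the learned projections and multiple heads of a real transformer). It computes each word's new representation as a weighted mix of all the words around it, which is exactly "meaning from context":

```python
import numpy as np

def attention(Q, K, V):
    """Single-head scaled dot-product attention.

    Q, K, V: (seq_len, d) arrays with one query, key, and value
    vector per word in the sentence.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # relevance of every word to every other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V  # each word becomes a context-weighted mix of all words

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))      # 5 words, 8-dimensional vectors
print(attention(x, x, x).shape)  # (5, 8): same words, now context-aware
```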
Isn’t it ironic that the meaning of a word comes from everything but itself?
This dual process mirrors how humans understand language. We derive meaning from (1) the context provided and (2) our past experiences.
This is why it's a misconception to dismiss autoregressive LLMs as merely advanced autocompleters. These models do far more than predict the next word from simple statistical likelihood, as basic autocomplete systems do. LLMs don't just mimic understanding; they achieve a form of it.
Autoregressive models generate text one word at a time, predicting each word based on the preceding context.
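In code, the generation loop looks roughly like this (a sketch with hypothetical `model` and `tokenizer` objects and greedy decoding for simplicity; real libraries expose different APIs):

```python
def generate(model, tokenizer, prompt, max_new_tokens=50):
    """Autoregressive decoding: each prediction is appended to the
    context and fed back in to predict the next token."""
    tokens = tokenizer.encode(prompt)
    for _ in range(max_new_tokens):
        # The model scores every vocabulary token, conditioned on ALL tokens so far.
        scores = model(tokens)
        next_token = scores.index(max(scores))  # greedy: take the likeliest token
        tokens.append(next_token)               # the output becomes part of the input
    return tokenizer.decode(tokens)
```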
Unlike LLMs, we think and plan before we speak; we're not autoregressive. We don't wait until we've said a word to start thinking about the next one. We build ideas and concepts first, and the words and sentences follow. This leads to one of the most fascinating capabilities of LLMs, which emerges through Chain-of-Thought (CoT) prompting.
CoT prompting lets these models mimic this planning behaviour: by reasoning through intermediate steps before committing to an answer, they break away from the simplistic word-by-word predictions typical of autoregressive generation. This ability moves LLMs even further from mere autocompletion and closer to a form of intelligence, depending on how you define it.
Instead of jumping straight to an answer one word at a time, they first write out a plan that becomes part of their context, much like how we think before we speak. This is why CoT prompting, especially when combined with examples (few-shot prompting, as we covered in a recent iteration on the essential prompting approaches), makes models so powerful: it mirrors our thinking process while also mimicking our ability to adapt to different situations.
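To make this concrete, here is what a few-shot CoT prompt might look like (an invented toy example, not taken from any particular paper or product):

```python
# A hypothetical few-shot Chain-of-Thought prompt. The worked example shows
# the model HOW to reason, and "Let's think step by step" invites it to
# write its plan into the context before committing to an answer.
prompt = """\
Q: A baker makes 3 trays of 12 muffins and sells 10. How many are left?
A: Let's think step by step.
   3 trays x 12 muffins = 36 muffins.
   36 - 10 sold = 26 muffins left.
   The answer is 26.

Q: A library has 4 shelves of 25 books and lends out 17. How many remain?
A: Let's think step by step.
"""
```

Sent to a model, this prompt elicits the intermediate reasoning steps first, rather than a direct one-word answer.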
🙏 A big thank-you to Geoffrey Hinton for his great keynote talk at the Ai4 conference we attended, which inspired us to write this piece.