What makes a model "large"

LLMs explained: what the "large" refers to, what parameters are, and why bigger isn't always better.

"LLM" is the acronym you'll hear more than any other. Large Language Model. We've now built up every piece needed to actually understand it, so let's pin down what each word means and kill one persistent myth.

Breaking down the three letters

Language is the easy one: these models are trained on an enormous amount of text, and language is what they work with. (Newer models add images and audio, but text is the heart of it.)

Model we covered: a box that takes input and produces output. For an LLM, tokens go in and tokens come out. Underneath, it's a transformer predicting the most likely next token, over and over.
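To make "predicting the most likely next token, over and over" concrete, here is a toy sketch. The lookup table `TOY_SCORES` and the function `generate` are invented for illustration. A real LLM computes its scores with a transformer, not a table, and usually samples among likely tokens instead of always taking the top one.

```python
# Toy stand-in for a model: a made-up table mapping the current token to
# scores for possible next tokens. A real LLM computes these scores with a
# transformer; this table is a hypothetical illustration only.
TOY_SCORES = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"down": 0.9, "up": 0.1},
    "down": {"<end>": 1.0},
}


def generate(start, max_tokens=10):
    """Repeatedly append the highest-scoring next token (greedy choice)."""
    tokens = [start]
    for _ in range(max_tokens):
        options = TOY_SCORES.get(tokens[-1])
        if not options:
            break  # the toy table has no entry for this token
        next_token = max(options, key=options.get)
        if next_token == "<end>":
            break  # the "model" signals that the text is finished
        tokens.append(next_token)
    return tokens


print(generate("the"))  # ['the', 'cat', 'sat', 'down']
```

Each pass through the loop is one prediction: look at what has been written so far, pick a next token, add it, repeat.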

Large refers to the parameters, the tuned dials inside the model (the "weights" we kept mentioning in how a model is built). Modern models have billions or even trillions of them. When you hear "this model has 70 billion parameters," that's 70 billion little dials, tuned during training, that together encode everything the model learned. More parameters means more capacity to capture patterns.
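A quick back-of-the-envelope calculation shows why parameter counts translate into hardware. The sketch below assumes each parameter is stored as a 16-bit number (2 bytes), which is a common choice but not a universal one. The function name `weights_memory_gb` is made up for this example, and it counts only the weights themselves, not the extra memory a model needs while running.

```python
def weights_memory_gb(num_parameters, bytes_per_parameter=2):
    """Rough storage needed for the weights alone, in gigabytes.

    Assumes 2 bytes per parameter (16-bit numbers) unless told otherwise.
    """
    return num_parameters * bytes_per_parameter / 1e9


# A 70-billion-parameter model at 2 bytes per parameter:
print(weights_memory_gb(70e9))  # 140.0

# The same model squeezed to 1 byte per parameter:
print(weights_memory_gb(70e9, bytes_per_parameter=1))  # 70.0
```

Under those assumptions, 70 billion dials need about 140 GB just to sit in memory, which is more than a typical laptop or a single consumer graphics card can hold.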

So an LLM is, quite literally, a very large language-prediction model. That's the whole name.

A note on those numbers

You'll see confident parameter counts thrown around, but be a little skeptical. Only the older models had their sizes officially disclosed: GPT-2 at 1.5 billion parameters, GPT-3 at 175 billion. For most current frontier models (GPT-4 and later, and every Claude model), the labs stopped publishing the numbers. So if someone quotes you the exact size of a 2026 model, they're almost certainly guessing. That silence is itself a useful signal: size stopped being the headline because, as we're about to see, it isn't the whole story.

The myth: bigger is always better

It's tempting to assume the model with the most parameters wins. It often doesn't, and believing the myth will steer you wrong in real conversations.

Bigger models are more capable in the abstract, but they're also slower, more expensive to run, and need more hardware. For tons of real jobs, a smaller, faster, cheaper model is the better choice. Sometimes a smaller model fine-tuned for one specific task beats a giant general one outright. Speed, cost, and privacy (a small model can run somewhere you control) are all real trade-offs, not afterthoughts.

This matters directly in your world: the question is rarely "what's the biggest model," it's "what's the right model for this job and this budget." We dig into exactly that choice in Working with AI.
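One way to picture "the right model for this job and this budget" is as a filter followed by a choice. The sketch below is an illustration, not a recommended procedure. The model names, quality scores, and costs are all invented, and `Candidate` and `pick_model` are hypothetical helpers. In practice the quality score would come from testing each model on your own task.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    quality: float        # score from your own task evaluation, 0 to 1
    cost_per_task: float  # cost to run one task, in any currency


def pick_model(candidates, min_quality, max_cost):
    """Return the cheapest candidate that is good enough and affordable."""
    suitable = [
        c for c in candidates
        if c.quality >= min_quality and c.cost_per_task <= max_cost
    ]
    if not suitable:
        return None  # nothing fits: relax the bar or raise the budget
    return min(suitable, key=lambda c: c.cost_per_task)


# All names and numbers below are made up for illustration.
candidates = [
    Candidate("giant-general", quality=0.93, cost_per_task=0.030),
    Candidate("small-general", quality=0.78, cost_per_task=0.001),
    Candidate("small-fine-tuned", quality=0.91, cost_per_task=0.002),
]

choice = pick_model(candidates, min_quality=0.90, max_cost=0.010)
print(choice.name)  # small-fine-tuned
```

Note that size never appears in the decision. The giant model scores highest here but costs more than the budget allows, so the fine-tuned small one wins.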

Why two models behave differently

If size isn't the whole story, what does make one model better than another? Four things, and you've already met all of them in how a model is built:

The data it was trained on, the mix and quality of what it read.
The fine-tuning, the examples used to teach it how to behave.
The reinforcement and values, how it was coached on what counts as a good answer.
The system prompt and tools wrapped around it when you use it.

That's why a 1.3-billion-parameter model, coached well, once produced answers people preferred over those of a 175-billion-parameter one. Parameters are just the raw size of the motor; these four levers are what tune it for the job. Hold on to this list: it's the foundation for actually choosing a model later.

Quick translation guide: parameters ≈ weights ≈ the model's "size." All three words point at the same thing, the tuned dials from training. And remember: size is only one of four things that make a model what it is.

📝 Practice

Pick two AI models you've used or heard of. Without reaching for parameter counts (you probably can't find the real ones anyway), try to describe how they differ in behavior: is one more careful, more creative, faster, better at code? Those differences come from the four levers above, not just size. Naming them is the skill.
