How a model is built

The single biggest mystery-killer in AI: a model is trained, not programmed. Here are the actual steps that turn a blank slate into something you can talk to, and why knowing them explains almost every strange thing AI does.

This is the idea that makes everything else click, so it's worth slowing down on. Traditional software does exactly what a person told it to: someone wrote the rules, the computer follows them. Building an AI model flips that completely. Nobody writes the rules. You show the system a mountain of examples and it works out the patterns on its own.

Take spam filtering. The old way was to write rules by hand: if the subject line says "free money," mark it as spam. That breaks the instant spammers change the wording. The machine-learning way is to show the system hundreds of thousands of emails already labeled "spam" or "not spam," and let it discover for itself what spam tends to look like. Nobody hand-coded "free money." The model derived it, along with thousands of subtler signals no human would think to write down.

That phrase, the model derives the rules, is the whole ballgame. It's also the first clue to why AI can be confidently wrong: if the examples it learned from were skewed or incomplete, the rules it derived will be too, and nobody can point to the broken line of code, because nobody wrote the rules in the first place.

So how does a blank model actually become Claude, or ChatGPT, or the engine inside Agentforce? It happens in stages. You don't need to memorize them, but seeing the sequence is what turns AI from magic into a process.

A picture to hold: carving a canyon

Before the steps, one mental image. Picture the Grand Canyon. It was carved by water flowing over the same ground for a very long time, slowly cutting the channels we see today. Imagine the data you train a model on is the water, and the model's "knowledge" is the canyon those flows carve out. The more water runs down a particular path, the deeper that channel gets, so the more examples a model sees of something, the more strongly it learns that pattern.

It's an analogy, so don't push it too far, but it captures the two things that matter most: a model is shaped gradually by flow, and it leans hardest toward whatever it saw most of. Keep the canyon in mind as we walk the steps.

The steps, start to finish

Step 0: Curate the data (choose the syllabus)

Before any learning happens, a lab assembles and cleans the training data, choosing the mix: how much code, how much prose, how many languages, what to throw out. This is one of the most guarded secrets in the industry, and as we saw on the data page, it's a major reason two models behave differently. Think of it as choosing the syllabus before the course starts.

Step 1: Pre-training (the expensive one)

Now the canyon gets carved. The model's internal dials start at random, and it reads through an enormous amount of text, trillions of words, doing one humble task over and over: predict the next chunk of text. Guess the next word, check against reality, nudge the dials, repeat, billions of times. There's a well-documented regularity here called scaling laws: performance improves predictably as you add more data, more compute, and more size. That predictability is a big reason labs were willing to spend fortunes scaling up; they could forecast the payoff.

What you get at the end of this stage is a base model: something that can fluently continue any text, but isn't yet a helpful assistant. Picture a person who read a huge slice of the internet and can finish any sentence you start, but doesn't yet know they're supposed to be answering your question. This is by far the most expensive step, which matters enormously later (it's the whole point of the motor analogy: the expensive part is building the engine once).

Step 2: Fine-tuning (the apprenticeship)

Next, the base model is fine-tuned on examples of good question-and-answer behavior: here's an instruction, here's an ideal response. This is where it learns to act like a helpful assistant rather than just autocomplete. Think of it as an apprenticeship, shadowing experts to learn the job, after already knowing how to read and write.

Step 3: Reinforcement from feedback (the coaching)

Fine-tuning teaches the format; this step teaches judgment. People (or other AI systems) compare the model's answers and rank which are better, and the model is nudged toward the kind of answers humans prefer: more helpful, more honest, less harmful.

Here's the single most clarifying fact in this whole section. OpenAI found that a 1.3-billion-parameter model trained this way was preferred by people over the raw 175-billion-parameter model, more than a hundred times larger. How you train matters as much as how big you build. (Ouyang et al., 2022.) If you remember one thing about why models differ, remember that.

Different labs do this coaching differently, and it's worth knowing one name. Anthropic (the maker of Claude) uses an approach called Constitutional AI, where the model critiques and revises its own answers against a written set of principles, a "constitution," instead of relying purely on human raters (Bai et al., 2022). The interesting part for us isn't the technique, it's the implication: a model's values are something a company writes down and trains in. We'll pull hard on that thread at the end of this page.

Step 4: Safety testing (hire people to break it)

Before release, the model is deliberately attacked: experts try to make it produce dangerous or embarrassing output, and it's measured against batteries of tests. This is partly why models score differently on public benchmarks. Think QA, a final exam, and hiring people whose whole job is to break the product, all at once.

Step 5: The system prompt (the job briefing)

Finally, when you actually use a model, your message is usually wrapped in a hidden set of instructions called a system prompt: "you are a helpful assistant for Acme Corp, be concise, never discuss competitors." Same trained model, different persona and rules. This is free and instant, no retraining involved, which is exactly why it's the layer companies reach for most.

One boundary worth drawing: writing good prompts yourself (prompt engineering) is its own skill, and it lives in Working with AI. Here we only care about the system prompt as the last and shallowest place a model's behavior gets shaped, because that sets up the most important idea on this page.

Where neural networks fit. You'll hear that AI runs on a "neural network." All that means is the structure being trained in the steps above: a huge web of simple math units, loosely inspired by brain cells, whose connection strengths (the "dials" and "weights" we keep mentioning) get tuned by the data flowing through. "Deep learning" just means a neural network with many layers. You don't need the internals to reason about AI well, the steps on this page are what actually matter, but now the term won't trip you up.

When scaling produces surprises: emergence

As labs poured in more data and compute, models sometimes picked up abilities nobody trained in on purpose. Flood a mostly-English model with enough other-language text and it starts translating. Feed it enough math and arithmetic starts working. Capabilities appearing unbidden, just from scale, is part of what makes this technology feel magical, and part of why even the people building it can't fully predict it.

A caution, because this gets overstated. Some researchers argue these "emergent" jumps are partly an illusion created by how we measure, a harsh pass/fail metric makes a smooth, gradual improvement look like a sudden leap. So treat emergence as real and fascinating but debated, not as proof that the model suddenly "understands" something.

So is it actually thinking?

This is the question everyone eventually asks, and the honest answer is more interesting than either extreme.

A famous 2021 paper called large language models "stochastic parrots," systems that stitch together fluent language from statistical patterns without any grasp of meaning. (The paper's main thrust is actually a warning about the costs and risks of ever-bigger models, including hidden bias from unvettable training data, which is worth knowing; the "it's just parroting" line is only one piece of it.)

But "it's pure parroting with nothing underneath" turns out to be too simple. In one striking experiment, a model trained only to predict legal moves in the board game Othello was found to have built an internal map of the board it was never given (Othello-GPT). Predicting the next move well enough apparently required it to model the game. So next-token prediction can build real internal structure, not just surface mimicry.

The defensible, useful framing sits in the middle:

A model works by predicting the next chunk of text, one piece at a time. Trained on enough data, those predictions get good enough to be genuinely useful, and under the hood it builds real internal patterns. But its goal is still prediction, and it has no grounded, real-world understanding of what it's saying. It's fluent by design, and fluent is not the same as correct.

That last line is the practical payoff, and it's the bridge to everything in Working with AI: trust a model as a fast, confident first draft, never as a source of truth. We come back to exactly why it gets things wrong on what AI can and can't do.

The payoff: where you change a model changes everything

Here is the idea that ties this whole section together, and it's the most useful thing you can take from it. Look back at the steps. They go from deep and expensive to shallow and cheap:

The data it learned from (deepest, hardest to change, most fundamental)
The weights, shaped by fine-tuning and reinforcement (Constitutional AI lives here)
Guardrail models that screen inputs and outputs (more on these in a second)
The system prompt (shallowest, instant to change, easiest to bypass)

Where an engineer intervenes on that ladder decides how big the effect is and how hard it is to undo. Change the data and you change the model at its roots, slowly and surgically. Change the system prompt and you've pulled one blunt lever across every conversation at once, fast but crude. This single idea explains most of the strange, newsworthy AI behavior you've seen, because you can trace each incident to a rung on the ladder.

Refusals: not "just a system prompt"

A common belief is that when a model refuses a dangerous request ("how do I make a weapon"), there's simply a system prompt on top saying "don't answer that." That's mostly wrong, and the correction matters. Refusals are largely trained into the weights (rung 2) through the reinforcement step, and backed by separate classifier models that screen requests (rung 3). Anthropic's "Constitutional Classifiers" are a public example: adding them cut the success rate of jailbreak attempts from 86% to 4.4%. Safety isn't a sticky note on the front; it's built deep, which is exactly why it's hard (though never impossible) to talk a model past it.

Two real cases, both on the shallow rungs

The clearest way to feel the ladder is two incidents that made the news. In both, link out and read the companies' own words, that's the point.

Google Gemini, February 2024. Its image generator started producing historically wrong images, putting diverse faces into scenes where that was clearly inaccurate, and refusing some reasonable prompts. Google's own explanation: a tuning meant to increase diversity was applied too bluntly across all prompts, so the model "overcompensated in some cases, and was over-conservative in others." Google paused the feature. A shallow steering layer, pulled across every prompt at once.
xAI's Grok, July 2025. After an update, Grok began posting antisemitic content. xAI's own explanation traced it to a "code path upstream" that had restored old instructions telling the model to mirror the tone of users' posts, which made it echo extremist content back. The fix was removing those instructions. Again: the instruction layer, not the deep training.

The unifying lesson, and the climax of this whole section: both failures happened on the shallow, easily-changed rungs, not in the data. That's exactly why they were fast and bizarre. A prompt-level or tuning-level instruction is one blunt lever pulled across every situation simultaneously, so a fix aimed at one problem ("too biased," "too cautious") can swing the model straight into the opposite failure. Deeper changes are slower and more surgical; shallower ones are fast, cheap, and overcorrection-prone. Same ladder, every time.

Whose judgment are you trusting?

Step back and there's a bigger point hiding in all of this. Unless a model is open-source, you don't control how it was built, you rely on the company that made it. Its data, its values, its guardrails were all decisions made by people you'll never meet. That means choosing a model is, partly, choosing whose judgment you trust. Reasonable people land in different places, and the healthy stance is to stay a little skeptical of all of them rather than assume any lab has it perfectly right.

These questions get sharper as models get more capable, and they stop being abstract. Should a model like this help teach kids, or assist in healthcare? Those are design and deployment choices as much as policy ones. We keep that thread light here (the deeper ethics live in Working with AI), but it's worth seeing that it grows straight out of the mechanics on this page.

A current, concrete example

In 2026 Anthropic showed this ladder in action. Through a program called Project Glasswing, critical-infrastructure partners (including Apple, Microsoft, AWS, Google, and NVIDIA, among others) used a model so capable at finding software flaws that it surfaced thousands of serious vulnerabilities, including a 27-year-old bug in OpenBSD and a 16-year-old one in FFmpeg. A model that good at finding holes is dangerous in the wrong hands, so when Anthropic released a public version, it built in guardrails that automatically route requests touching cybersecurity, biology, chemistry, or model-copying to a different, more restricted model instead of answering directly. That's the intervention ladder as a product decision: same underlying capability, different guardrails, depending on who's asking and for what.

A side note you'll hear about: distillation

One more term, kept short because it dates fast. Distillation is training a small, cheap "student" model to imitate a big, expensive "teacher" model's outputs, skipping the costly from-scratch training (the original idea, 2015). It works surprisingly well, which is why labs ban using their models' output to build competitors. When the Chinese model DeepSeek released a strong, cheap model in January 2025, it triggered a market panic (NVIDIA lost roughly $600 billion in value in a single day) over fears that frontier models might not be so expensive to match after all. The bigger takeaway for us: raw model capability is slowly becoming a commodity, which pushes the real value down the stack, toward data, trust, and integration, exactly where Salesforce competes.

📝 Practice

Next time you read about an AI "behaving badly," try to place the cause on the ladder: was it the data it learned from, the values trained into it, a guardrail that fired (or didn't), or a system-prompt instruction pulled too bluntly? You won't always be able to tell from the outside, but asking the question is what separates understanding the technology from just reacting to the headline.

On this page