The AI Stack Explained: LLM Talks, Program Walks

A first-principles breakdown of the entire AI stack — from LLM to Agent in one mental model. An LLM can only output text. Everything else is the program.

March 28, 2026
Harrison Guo
9 min read
AI Engineering System Architecture

LLM, Token, Agent — They're All the Same Thing. (AI Stack Explained)

Watch the full 15-minute video walkthrough with animations.

LLM. Token. Context. Prompt. Function Calling. MCP. Agent. Skill.

You’ve spent months trying to understand these concepts. Here’s something that might surprise you: they’re all the same thing.

An LLM can only do one thing — output text. It can’t browse the web. It can’t query a database. It can’t control your computer. The program around it does all of that. The program reads the text the LLM outputs, takes action on its behalf, and feeds the result back.

LLM talks, program walks. That’s the entire AI stack in four words.

tl;dr — Every AI capability — from chatbots to autonomous agents — is built on one loop: the LLM outputs text, the program reads it and acts, the result feeds back. Understanding this loop makes every AI concept transparent.


Layer 1: The LLM — A Genius That Can Only Play Word Chain

At its core, a large language model is a word prediction machine.

You give it “The capital of France is” — it predicts “Paris.” Then it appends “Paris” to the input and predicts again. Comma. “Which.” “Is.” On and on — until it outputs a stop token.

The LLM predicts one word at a time: predict → append → predict again.

No thinking. No understanding. No consciousness. Just one thing: given the text so far, predict the next word.
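In code, that loop is tiny. Here's a toy sketch where `predict_next_token` stands in for a real model; it's just a lookup table, purely for illustration:

```python
# Toy sketch of the predict -> append -> predict loop.
# predict_next_token stands in for a real model: here it is just
# a lookup table keyed on the text so far, purely for illustration.
CONTINUATIONS = {
    "The capital of France is": "Paris",
    "The capital of France is Paris": ",",
    "The capital of France is Paris ,": "<stop>",
}

def predict_next_token(text: str) -> str:
    return CONTINUATIONS.get(text, "<stop>")

def generate(prompt: str) -> str:
    text = prompt
    while True:
        token = predict_next_token(text)
        if token == "<stop>":          # the model emitted a stop token
            return text
        text = text + " " + token      # append and predict again

print(generate("The capital of France is"))
```

Swap the lookup table for billions of learned weights and you have a real LLM. The loop itself doesn't change.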

But the model’s internals are pure matrix math — it only understands numbers. So there’s a translator: the Tokenizer. It chops text into small chunks called Tokens, maps each to a number, feeds them to the model, and converts the output back to text.

A Token ≠ a word. “helpful” → “help” + “ful” (2 tokens). “unbelievable” → “un” + “believ” + “able” (3 tokens).

Tokens are the atoms of the LLM world. Everything goes in as tokens, everything comes out as tokens.
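To see how a tokenizer chops text, here's a toy greedy longest-match tokenizer. The vocabulary and the IDs are made up; real tokenizers use learned BPE merges, but the idea is the same:

```python
# Toy greedy longest-match tokenizer. Real LLM tokenizers use BPE or
# similar; this vocabulary and these IDs are invented for illustration.
VOCAB = {"help": 1, "ful": 2, "un": 3, "believ": 4, "able": 5}

def tokenize(word: str) -> list[int]:
    ids, i = [], 0
    while i < len(word):
        # try the longest vocabulary entry that matches at position i
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                ids.append(VOCAB[word[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token for {word[i:]!r}")
    return ids

print(tokenize("helpful"))       # the model only ever sees these IDs
print(tokenize("unbelievable"))
```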

The LLM can play word chain. But it has a fatal flaw.


Layer 2: Context — A Genius with No Memory

The LLM has no memory. This isn’t a metaphor — it’s literally a math function. Input in, output out, done. Next call? Knows nothing.

So why does it seem like it remembers your earlier messages?

Because every time you send a message, the program behind the scenes stitches your entire conversation history together and sends it all at once. The LLM doesn’t “remember.” It re-reads everything from scratch. Every single time.

Context = everything placed on the LLM's desk: chat history, system instructions, your question, tool list.

This bundle is called Context — everything the LLM can see at once. Think of it as a desk. Today's largest models fit about 1 million tokens on that desk (~750,000 English words, on the order of the entire seven-book Harry Potter series).
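Here's the trick in miniature. `call_llm` stands in for any stateless completion API; the point is that the program, not the model, owns the history:

```python
# Sketch of how the program fakes "memory": it stores the history itself
# and re-sends the whole thing on every call. call_llm stands in for any
# stateless completion API.
def call_llm(context: str) -> str:
    # a real call would go to a model; here we just echo the context size
    return f"(reply based on {len(context)} characters of context)"

history: list[str] = []

def chat(user_message: str) -> str:
    history.append(f"User: {user_message}")
    context = "\n".join(history)       # the entire desk, rebuilt every turn
    reply = call_llm(context)
    history.append(f"Assistant: {reply}")
    return reply

chat("My name is Harrison.")
print(chat("What's my name?"))  # "remembers" only because history is re-sent
```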

But even with a big desk, dumping a thousand-page manual is impractical. The fix? Only put the relevant pages on the desk. Search ahead of time, find matching chunks, feed only those.

That’s RAG — Retrieval-Augmented Generation. Don’t dump everything. Pick what matters.
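A toy version of that retrieval step, using keyword overlap instead of the embedding search real RAG systems use; the chunks are invented for illustration:

```python
# Toy retrieval: score each chunk by keyword overlap with the question and
# put only the best match on the desk. Real RAG uses embeddings, but the
# shape of the step is the same.
CHUNKS = [
    "Chapter 3: To reset your password, open Settings > Security.",
    "Chapter 7: Billing invoices are emailed on the 1st of each month.",
    "Chapter 9: The API rate limit is 100 requests per minute.",
]

def words(text: str) -> set[str]:
    return {w.strip(".,?:>") for w in text.lower().split()}

def retrieve(question: str, chunks: list[str]) -> str:
    q = words(question)
    return max(chunks, key=lambda c: len(q & words(c)))

question = "How do I reset my password?"
best = retrieve(question, CHUNKS)
print(f"Use this document to answer:\n{best}\n\nQuestion: {question}")
```

The thousand-page manual stays on the shelf; only the matching chunk lands on the desk.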


Layer 3: Prompt — What You Say to the LLM

Don’t overthink “Prompt.” A prompt is just what you say to the LLM. Every message you type is a prompt.

But there are two kinds:

User Prompt (what you type) and System Prompt (rules the developer sets) both go into Context.

User Prompt — what you type. “Write me a sorting algorithm in Python.”

System Prompt — rules the developer sets behind the scenes. “You are a senior Python engineer. Keep answers concise.” You never see this, but the LLM reads it every time.

Both get packed into Context. User Prompt = what to do now. System Prompt = who you are and what rules to follow.
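Packing the two prompt kinds into one context might look like this. The role-based message shape mirrors common chat APIs; `build_context` itself is our own simplification:

```python
# How the two prompt kinds get packed into one Context. The role-based
# message format mirrors common chat APIs; build_context is a
# simplification for illustration, not a real API call.
SYSTEM_PROMPT = "You are a senior Python engineer. Keep answers concise."

def build_context(user_prompt: str, history: list[dict]) -> list[dict]:
    return (
        [{"role": "system", "content": SYSTEM_PROMPT}]  # who you are
        + history                                       # what came before
        + [{"role": "user", "content": user_prompt}]    # what to do now
    )

messages = build_context("Write me a sorting algorithm in Python.", history=[])
for m in messages:
    print(f"{m['role']}: {m['content']}")
```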

The LLM can now predict words, see history, and follow instructions. But it’s still just outputting text.

What comes next is the most important part.


Layer 4: Function Calling — Where Everything Begins

Let’s come back to the fundamental fact:

An LLM can only output text. It can’t browse the internet. It can’t check the weather. It can’t call any API.

So how does it “check the weather”? It doesn’t. The program does.

sequenceDiagram
    participant You
    participant Program
    participant LLM
    participant API as Weather API

    You->>Program: "What's the weather in Tokyo?"
    Program->>LLM: [your question + tool catalog]
    LLM->>Program: {"tool": "get_weather", "args": {"city": "Tokyo"}}
    Note over LLM: LLM's job is done. It just output JSON text.
    Program->>API: GET /weather?city=Tokyo
    API->>Program: {"condition": "Cloudy", "temp": "18°C"}
    Program->>LLM: [original question + tool result]
    LLM->>Program: "It's currently cloudy in Tokyo, around 18°C."
    Program->>You: "It's currently cloudy in Tokyo, around 18°C."

The LLM outputs JSON text. The program parses it, calls the API, and feeds the result back.

The LLM did not call anything. It just output a JSON string. The program parsed that JSON, the program called the API, the program got the result, and the program fed it back.
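The whole round trip fits in a few lines. Both `fake_llm` and `get_weather` here are stand-ins for a real model and a real API:

```python
# Minimal version of the diagram above: the "LLM" returns plain JSON text;
# the program parses it and does the actual work. fake_llm and get_weather
# are stand-ins for a real model and a real weather API.
import json

def fake_llm(context: str) -> str:
    # a real model would decide this; we hard-code the tool-call text
    return '{"tool": "get_weather", "args": {"city": "Tokyo"}}'

def get_weather(city: str) -> dict:
    return {"condition": "Cloudy", "temp": "18°C"}   # pretend API response

TOOLS = {"get_weather": get_weather}

reply = fake_llm("What's the weather in Tokyo?")   # LLM talks: just text
call = json.loads(reply)                           # program parses the JSON
result = TOOLS[call["tool"]](**call["args"])       # program walks: real call
print(result)
```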

That’s all Function Calling is.

I’ll sum it up in four words: LLM talks, program walks.

The LLM only talks — “I want to check the weather.” The program walks — it actually goes and checks. Everything that comes next is built on this loop.


Layer 5: MCP — The Tool Catalog

We’ve got “LLM talks, program walks.” But there’s a practical problem: how does the program know what tools are available?

Imagine you’re a new employee at a company with dozens of internal systems. Nobody gives you a tool directory. MCP (Model Context Protocol) is that directory, in a standard format.

MCP Server provides two things: a tool catalog and an execution endpoint.

An MCP Server provides two things:

  1. Catalog — “What tools do you have?” → returns each tool’s name, description, parameters, and return format
  2. Execution — “Call get_weather with Tokyo” → runs it, returns the result
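A simplified sketch of those two duties. This mimics the shape of an MCP server, not the real wire protocol:

```python
# Simplified sketch of the two MCP server duties: a catalog and an
# execution endpoint. This mimics the shape of MCP, not the real protocol.
def get_weather(city: str) -> str:
    return f"Cloudy, 18°C in {city}"        # stand-in for a real tool

TOOL_REGISTRY = {
    "get_weather": {
        "description": "Get current weather for a city",
        "parameters": {"city": "string"},
        "func": get_weather,
    },
}

def list_tools() -> list[dict]:
    # 1. Catalog: names, descriptions, parameters (no implementation)
    return [
        {"name": n, "description": t["description"], "parameters": t["parameters"]}
        for n, t in TOOL_REGISTRY.items()
    ]

def call_tool(name: str, args: dict) -> str:
    # 2. Execution: run the named tool and return its result
    return TOOL_REGISTRY[name]["func"](**args)

print(list_tools())
print(call_tool("get_weather", {"city": "Tokyo"}))
```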

Before MCP, every platform had its own way of connecting tools. Build for ChatGPT, rewrite for Claude, rewrite for Gemini. Same tool, three times.

MCP unified this: build once, run everywhere. Think USB-C — one cable works for everything.


Layer 6: Agent — The “Talks & Walks” Loop, on Repeat

In Function Calling, the LLM talked once and the program walked once. One round trip. But real problems aren’t that simple:

“What’s the weather here? If it’s raining, find me a nearby umbrella shop.”

That’s multiple steps:

The agent loop: LLM talks → program walks → feed back → repeat until done.

  1. LLM says “I need the location tool” → program executes → returns coordinates
  2. LLM says “Check weather at these coordinates” → program executes → returns “rainy”
  3. LLM says “Search nearby umbrella shops” → program executes → returns results
  4. LLM combines everything → outputs the final answer

Every step is the same loop: talks → walks → feedback → talks again → walks again.
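The four steps above fit in one small loop. The scripted `fake_llm` and the lambda tools are stand-ins; the loop structure is the point:

```python
# The agent loop in miniature: ask the LLM what to do next, execute its
# choice, feed the result back, stop when it answers directly. The scripted
# fake_llm and the lambda tools are stand-ins for illustration.
import json

SCRIPT = [
    '{"tool": "get_location", "args": {}}',
    '{"tool": "get_weather", "args": {"coords": "35.7,139.7"}}',
    '{"tool": "search_shops", "args": {"query": "umbrella"}}',
    '{"answer": "It\'s raining; the nearest umbrella shop is Kasa-ya."}',
]

def fake_llm(context: list[str]) -> str:
    # a real model would decide; we replay a script, one step per tool result
    return SCRIPT[len([m for m in context if m.startswith("result:")])]

TOOLS = {
    "get_location": lambda: "35.7,139.7",
    "get_weather": lambda coords: "rainy",
    "search_shops": lambda query: ["Kasa-ya"],
}

context = ["user: Weather here? If raining, find an umbrella shop."]
while True:
    decision = json.loads(fake_llm(context))      # LLM talks
    if "answer" in decision:                      # done: final answer
        print(decision["answer"])
        break
    result = TOOLS[decision["tool"]](**decision["args"])  # program walks
    context.append(f"result: {result}")           # feed back, loop again
```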

A system that can plan autonomously, execute across multiple steps, and loop until completion — that’s an Agent.

Claude Code, Cursor, and GitHub Copilot all call themselves agents. Under the hood, they’re running this same loop.

But here’s the key insight: getting the location, checking the weather, searching for shops — the program does all of that. None of it requires intelligence. The LLM’s only job? Deciding what to do next.

An “intelligent agent” is actually assembled from parts that require zero intelligence.


Layer 7: Skill — Pre-Written Rules

The agent can plan on its own. But it doesn’t know your rules.

Your team has a deployment checklist — pass all tests, verify env variables, confirm rollback plan, notify on-call. You want the agent to follow this every time. Are you going to type all that out every deploy?

A Skill is those rules written into a document, stored in a fixed location. It’s literally a Markdown file — name, description, steps, rules, format, examples.

Let’s be honest: a Skill is just a prompt that lives in a different place and has a fancier name. But Skills have one clever design — progressive disclosure:

Level 1: scan catalog → Level 2: load instructions → Level 3: follow citations. Load only what's needed.

  • Level 1: Scan names and descriptions (table of contents)
  • Level 2: Load full instructions when matched (open the chapter)
  • Level 3: Load referenced docs/scripts only when needed (check the footnotes)

It’s a tradeoff between token cost and information completeness. Just enough is optimal.


The Big Picture

Let’s zoom out:

All 7 layers building up — from LLM at the base to Skill at the top.

graph TD
    S[Skill] -->|pre-written rules| A
    A[Agent] -->|loop on repeat| M
    M[MCP] -->|tool catalog| FC
    FC[Function Calling] -->|text → action| P
    P[Prompt] -->|instructions| C
    C[Context] -->|everything visible| T
    T[Token] -->|atomic units| L
    L[LLM] -->|outputs text| FC

    style L fill:#333,stroke:#666,color:#fff
    style T fill:#333,stroke:#777,color:#fff
    style C fill:#333,stroke:#888,color:#fff
    style P fill:#333,stroke:#999,color:#fff
    style FC fill:#1a3a5c,stroke:#4da6ff,color:#fff
    style M fill:#3d2200,stroke:#ff9f43,color:#fff
    style A fill:#003333,stroke:#00d2d3,color:#fff
    style S fill:#2d1045,stroke:#9b59b6,color:#fff

What each piece of the program contributes:

  • Function Calling — the program turns text into action
  • MCP — provides the tool catalog
  • Agent — lets the loop run multiple rounds
  • Skill — pre-written rules that guide the LLM
  • RAG — picks relevant info for the desk
  • Memory — stitches history back in

None of these capabilities belong to the LLM itself. They’re all granted by external programs.

The LLM’s sole contribution? Outputting the right text at the right time.


Two Questions That Cut Through Any Buzzword

Next time someone throws a new concept at you — Multi-Agent, Agentic RAG, Orchestration Framework — you only need two questions:

① What text did the LLM output?

② Who read that text and turned it into an actual action?

Answer those two questions, and any concept becomes transparent.

LLM talks, program walks. That loop is how the entire AI world runs.


Try It Yourself

See Function Calling happen live in your terminal:

git clone https://github.com/harrison001/llm-talks-program-walks.git
cd llm-talks-program-walks
pip install openai
export OPENAI_API_KEY=your_key_here
python mouth_speaks_hand_acts.py "What's the weather in Tokyo?"

The terminal labels every step — “This is just TEXT” when the LLM outputs JSON, and “The PROGRAM did this” when the program executes the function. View on GitHub
