The AI Stack Explained: LLM Talks, Program Walks

A first-principles breakdown of the entire AI stack — from LLM to Agent in one mental model. An LLM can only output text. Everything else is the program.

March 28, 2026
Harrison Guo
9 min read
AI Engineering System Architecture

LLM, Token, Agent — They're All the Same Thing. (AI Stack Explained)

Watch the full 15-minute video walkthrough with animations.

LLM. Token. Context. Prompt. Function Calling. MCP. Agent. Skill.

You’ve spent months trying to understand these concepts. Here’s something that might surprise you: they’re all the same thing.

An LLM can only do one thing — output text. It can’t browse the web. It can’t query a database. It can’t control your computer. The program around it does all of that. The program reads the text the LLM outputs, takes action on its behalf, and feeds the result back.

LLM talks, program walks. That’s the entire AI stack in four words.

tl;dr — Every AI capability — from chatbots to autonomous agents — is built on one loop: the LLM outputs text, the program reads it and acts, the result feeds back. Understanding this loop makes every AI concept transparent.


Layer 1: The LLM — A Genius That Can Only Play Word Chain

At its core, a large language model is a word prediction machine.

You give it “The capital of France is” — it predicts “Paris.” Then it appends “Paris” to the input and predicts again. Comma. “Which.” “Is.” On and on — until it outputs a stop token.

The LLM predicts one word at a time: predict → append → predict again.

No thinking. No understanding. No consciousness. Just one thing: given the text so far, predict the next word.
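In code, that loop is tiny. Here's a toy sketch where `predict_next_token` stands in for a real model; it's just a lookup table, purely for illustration:

```python
# Toy sketch of the predict -> append -> predict loop.
# predict_next_token stands in for a real model: here it is just
# a lookup table keyed on the text so far, purely for illustration.
CONTINUATIONS = {
    "The capital of France is": "Paris",
    "The capital of France is Paris": ",",
    "The capital of France is Paris ,": "<stop>",
}

def predict_next_token(text: str) -> str:
    return CONTINUATIONS.get(text, "<stop>")

def generate(prompt: str) -> str:
    text = prompt
    while True:
        token = predict_next_token(text)
        if token == "<stop>":          # the model emitted a stop token
            return text
        text = text + " " + token      # append and predict again

print(generate("The capital of France is"))
```

Swap the lookup table for billions of learned weights and you have a real LLM. The loop itself doesn't change.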

But the model’s internals are pure matrix math — it only understands numbers. So there’s a translator: the Tokenizer. It chops text into small chunks called Tokens, maps each to a number, feeds them to the model, and converts the output back to text.

A Token ≠ a word. “helpful” → “help” + “ful” (2 tokens). “unbelievable” → “un” + “believ” + “able” (3 tokens).

Tokens are the atoms of the LLM world. Everything goes in as tokens, everything comes out as tokens.
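To see how a tokenizer chops text, here's a toy greedy longest-match tokenizer. The vocabulary and the IDs are made up; real tokenizers use learned BPE merges, but the idea is the same:

```python
# Toy greedy longest-match tokenizer. Real LLM tokenizers use BPE or
# similar; this vocabulary and these IDs are invented for illustration.
VOCAB = {"help": 1, "ful": 2, "un": 3, "believ": 4, "able": 5}

def tokenize(word: str) -> list[int]:
    ids, i = [], 0
    while i < len(word):
        # try the longest vocabulary entry that matches at position i
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                ids.append(VOCAB[word[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token for {word[i:]!r}")
    return ids

print(tokenize("helpful"))       # the model only ever sees these IDs
print(tokenize("unbelievable"))
```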

The LLM can play word chain. But it has a fatal flaw.


Layer 2: Context — A Genius with No Memory

The LLM has no memory. This isn’t a metaphor — it’s literally a math function. Input in, output out, done. Next call? Knows nothing.

So why does it seem like it remembers your earlier messages?

Because every time you send a message, the program behind the scenes stitches your entire conversation history together and sends it all at once. The LLM doesn’t “remember.” It re-reads everything from scratch. Every single time.

Context = everything placed on the LLM's desk: chat history, system instructions, your question, tool list.

This bundle is called Context — everything the LLM can see at once. Think of it as a desk. Today's largest models fit about 1 million tokens on that desk (~750,000 English words, on the order of the entire seven-book Harry Potter series).
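Here's the trick in miniature. `call_llm` stands in for any stateless completion API; the point is that the program, not the model, owns the history:

```python
# Sketch of how the program fakes "memory": it stores the history itself
# and re-sends the whole thing on every call. call_llm stands in for any
# stateless completion API.
def call_llm(context: str) -> str:
    # a real call would go to a model; here we just echo the context size
    return f"(reply based on {len(context)} characters of context)"

history: list[str] = []

def chat(user_message: str) -> str:
    history.append(f"User: {user_message}")
    context = "\n".join(history)       # the entire desk, rebuilt every turn
    reply = call_llm(context)
    history.append(f"Assistant: {reply}")
    return reply

chat("My name is Harrison.")
print(chat("What's my name?"))  # "remembers" only because history is re-sent
```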

But even with a big desk, dumping a thousand-page manual is impractical. The fix? Only put the relevant pages on the desk. Search ahead of time, find matching chunks, feed only those.

That’s RAG — Retrieval-Augmented Generation. Don’t dump everything. Pick what matters.
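A toy version of that retrieval step, using keyword overlap instead of the embedding search real RAG systems use; the chunks are invented for illustration:

```python
# Toy retrieval: score each chunk by keyword overlap with the question and
# put only the best match on the desk. Real RAG uses embeddings, but the
# shape of the step is the same.
CHUNKS = [
    "Chapter 3: To reset your password, open Settings > Security.",
    "Chapter 7: Billing invoices are emailed on the 1st of each month.",
    "Chapter 9: The API rate limit is 100 requests per minute.",
]

def words(text: str) -> set[str]:
    return {w.strip(".,?:>") for w in text.lower().split()}

def retrieve(question: str, chunks: list[str]) -> str:
    q = words(question)
    return max(chunks, key=lambda c: len(q & words(c)))

question = "How do I reset my password?"
best = retrieve(question, CHUNKS)
print(f"Use this document to answer:\n{best}\n\nQuestion: {question}")
```

The thousand-page manual stays on the shelf; only the matching chunk lands on the desk.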


Layer 3: Prompt — What You Say to the LLM

Don’t overthink “Prompt.” A prompt is just what you say to the LLM. Every message you type is a prompt.

But there are two kinds:

User Prompt (what you type) and System Prompt (rules the developer sets) both go into Context.

User Prompt — what you type. “Write me a sorting algorithm in Python.”

System Prompt — rules the developer sets behind the scenes. “You are a senior Python engineer. Keep answers concise.” You never see this, but the LLM reads it every time.

Both get packed into Context. User Prompt = what to do now. System Prompt = who you are and what rules to follow.
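Packing the two prompt kinds into one context might look like this. The role-based message shape mirrors common chat APIs; `build_context` itself is our own simplification:

```python
# How the two prompt kinds get packed into one Context. The role-based
# message format mirrors common chat APIs; build_context is a
# simplification for illustration, not a real API call.
SYSTEM_PROMPT = "You are a senior Python engineer. Keep answers concise."

def build_context(user_prompt: str, history: list[dict]) -> list[dict]:
    return (
        [{"role": "system", "content": SYSTEM_PROMPT}]  # who you are
        + history                                       # what came before
        + [{"role": "user", "content": user_prompt}]    # what to do now
    )

messages = build_context("Write me a sorting algorithm in Python.", history=[])
for m in messages:
    print(f"{m['role']}: {m['content']}")
```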

The LLM can now predict words, see history, and follow instructions. But it’s still just outputting text.

What comes next is the most important part.


Layer 4: Function Calling — Where Everything Begins

Let’s come back to the fundamental fact:

An LLM can only output text. It can’t browse the internet. It can’t check the weather. It can’t call any API.

So how does it “check the weather”? It doesn’t. The program does.

sequenceDiagram
    participant You
    participant Program
    participant LLM
    participant API as Weather API

    You->>Program: "What's the weather in Tokyo?"
    Program->>LLM: [your question + tool catalog]
    LLM->>Program: {"tool": "get_weather", "args": {"city": "Tokyo"}}
    Note over LLM: LLM's job is done. It just output JSON text.
    Program->>API: GET /weather?city=Tokyo
    API->>Program: {"condition": "Cloudy", "temp": "18°C"}
    Program->>LLM: [original question + tool result]
    LLM->>Program: "It's currently cloudy in Tokyo, around 18°C."
    Program->>You: "It's currently cloudy in Tokyo, around 18°C."

The LLM outputs JSON text. The program parses it, calls the API, and feeds the result back.

The LLM did not call anything. It just output a JSON string. The program parsed that JSON, the program called the API, the program got the result, and the program fed it back.
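The whole round trip fits in a few lines. Both `fake_llm` and `get_weather` here are stand-ins for a real model and a real API:

```python
# Minimal version of the diagram above: the "LLM" returns plain JSON text;
# the program parses it and does the actual work. fake_llm and get_weather
# are stand-ins for a real model and a real weather API.
import json

def fake_llm(context: str) -> str:
    # a real model would decide this; we hard-code the tool-call text
    return '{"tool": "get_weather", "args": {"city": "Tokyo"}}'

def get_weather(city: str) -> dict:
    return {"condition": "Cloudy", "temp": "18°C"}   # pretend API response

TOOLS = {"get_weather": get_weather}

reply = fake_llm("What's the weather in Tokyo?")   # LLM talks: just text
call = json.loads(reply)                           # program parses the JSON
result = TOOLS[call["tool"]](**call["args"])       # program walks: real call
print(result)
```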

That’s all Function Calling is.

I’ll sum it up in four words: LLM talks, program walks.

The LLM only talks — “I want to check the weather.” The program walks — it actually goes and checks. Everything that comes next is built on this loop.


Layer 5: MCP — The Tool Catalog

We’ve got “LLM talks, program walks.” But there’s a practical problem: how does the program know what tools are available?

Imagine you’re a new employee at a company with dozens of internal systems. Nobody gives you a tool directory. MCP (Model Context Protocol) is that directory, in a standard format.

MCP Server provides two things: a tool catalog and an execution endpoint.

An MCP Server provides two things:

  1. Catalog — “What tools do you have?” → returns each tool’s name, description, parameters, and return format
  2. Execution — “Call get_weather with Tokyo” → runs it, returns the result
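A simplified sketch of those two duties. This mimics the shape of an MCP server, not the real wire protocol:

```python
# Simplified sketch of the two MCP server duties: a catalog and an
# execution endpoint. This mimics the shape of MCP, not the real protocol.
def get_weather(city: str) -> str:
    return f"Cloudy, 18°C in {city}"        # stand-in for a real tool

TOOL_REGISTRY = {
    "get_weather": {
        "description": "Get current weather for a city",
        "parameters": {"city": "string"},
        "func": get_weather,
    },
}

def list_tools() -> list[dict]:
    # 1. Catalog: names, descriptions, parameters (no implementation)
    return [
        {"name": n, "description": t["description"], "parameters": t["parameters"]}
        for n, t in TOOL_REGISTRY.items()
    ]

def call_tool(name: str, args: dict) -> str:
    # 2. Execution: run the named tool and return its result
    return TOOL_REGISTRY[name]["func"](**args)

print(list_tools())
print(call_tool("get_weather", {"city": "Tokyo"}))
```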

Before MCP, every platform had its own way of connecting tools. Build for ChatGPT, rewrite for Claude, rewrite for Gemini. Same tool, three times.

MCP unified this: build once, run everywhere. Think USB-C — one cable works for everything.


Layer 6: Agent — The “Talks & Walks” Loop, on Repeat

In Function Calling, the LLM talked once and the program walked once. One round trip. But real problems aren’t that simple:

“What’s the weather here? If it’s raining, find me a nearby umbrella shop.”

That’s multiple steps:

The agent loop: LLM talks → program walks → feed back → repeat until done.

  1. LLM says “I need the location tool” → program executes → returns coordinates
  2. LLM says “Check weather at these coordinates” → program executes → returns “rainy”
  3. LLM says “Search nearby umbrella shops” → program executes → returns results
  4. LLM combines everything → outputs the final answer

Every step is the same loop: talks → walks → feedback → talks again → walks again.
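The four steps above fit in one small loop. The scripted `fake_llm` and the lambda tools are stand-ins; the loop structure is the point:

```python
# The agent loop in miniature: ask the LLM what to do next, execute its
# choice, feed the result back, stop when it answers directly. The scripted
# fake_llm and the lambda tools are stand-ins for illustration.
import json

SCRIPT = [
    '{"tool": "get_location", "args": {}}',
    '{"tool": "get_weather", "args": {"coords": "35.7,139.7"}}',
    '{"tool": "search_shops", "args": {"query": "umbrella"}}',
    '{"answer": "It\'s raining; the nearest umbrella shop is Kasa-ya."}',
]

def fake_llm(context: list[str]) -> str:
    # a real model would decide; we replay a script, one step per tool result
    return SCRIPT[len([m for m in context if m.startswith("result:")])]

TOOLS = {
    "get_location": lambda: "35.7,139.7",
    "get_weather": lambda coords: "rainy",
    "search_shops": lambda query: ["Kasa-ya"],
}

context = ["user: Weather here? If raining, find an umbrella shop."]
while True:
    decision = json.loads(fake_llm(context))      # LLM talks
    if "answer" in decision:                      # done: final answer
        print(decision["answer"])
        break
    result = TOOLS[decision["tool"]](**decision["args"])  # program walks
    context.append(f"result: {result}")           # feed back, loop again
```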

A system that can plan autonomously, execute across multiple steps, and loop until completion — that’s an Agent.

Claude Code, Cursor, and GitHub Copilot all call themselves agents. Under the hood, they’re running this same loop.

But here’s the key insight: getting the location, checking the weather, searching for shops — the program does all of that. None of it requires intelligence. The LLM’s only job? Deciding what to do next.

An “intelligent agent” is actually assembled from parts that require zero intelligence.


Layer 7: Skill — Pre-Written Rules

The agent can plan on its own. But it doesn’t know your rules.

Your team has a deployment checklist — pass all tests, verify env variables, confirm rollback plan, notify on-call. You want the agent to follow this every time. Are you going to type all that out every deploy?

A Skill is those rules written into a document, stored in a fixed location. It’s literally a Markdown file — name, description, steps, rules, format, examples.

Let’s be honest: a Skill is just a prompt that lives in a different place and has a fancier name. But Skills have one clever design — progressive disclosure:

Level 1: scan catalog → Level 2: load instructions → Level 3: follow citations. Load only what's needed.

  • Level 1: Scan names and descriptions (table of contents)
  • Level 2: Load full instructions when matched (open the chapter)
  • Level 3: Load referenced docs/scripts only when needed (check the footnotes)

It’s a tradeoff between token cost and information completeness. Just enough is optimal.


The Big Picture

Let’s zoom out:

All 7 layers building up — from LLM at the base to Skill at the top.

graph TD
    S[Skill] -->|pre-written rules| A
    A[Agent] -->|loop on repeat| M
    M[MCP] -->|tool catalog| FC
    FC[Function Calling] -->|text → action| P
    P[Prompt] -->|instructions| C
    C[Context] -->|everything visible| T
    T[Token] -->|atomic units| L
    L[LLM] -->|outputs text| FC

    style L fill:#333,stroke:#666,color:#fff
    style T fill:#333,stroke:#777,color:#fff
    style C fill:#333,stroke:#888,color:#fff
    style P fill:#333,stroke:#999,color:#fff
    style FC fill:#1a3a5c,stroke:#4da6ff,color:#fff
    style M fill:#3d2200,stroke:#ff9f43,color:#fff
    style A fill:#003333,stroke:#00d2d3,color:#fff
    style S fill:#2d1045,stroke:#9b59b6,color:#fff

What each piece of the program contributes:

  • Function Calling — the program turns text into action
  • MCP — provides the tool catalog
  • Agent — lets the loop run multiple rounds
  • Skill — pre-written rules that guide the LLM
  • RAG — picks relevant info for the desk
  • Memory — stitches history back in

None of these capabilities belong to the LLM itself. They’re all granted by external programs.

The LLM’s sole contribution? Outputting the right text at the right time.


Two Questions That Cut Through Any Buzzword

Next time someone throws a new concept at you — Multi-Agent, Agentic RAG, Orchestration Framework — you only need two questions:

① What text did the LLM output?

② Who read that text and turned it into an actual action?

Answer those two questions, and any concept becomes transparent.

LLM talks, program walks. That loop is how the entire AI world runs.


Try It Yourself

See Function Calling happen live in your terminal:

git clone https://github.com/harrison001/llm-talks-program-walks.git
cd llm-talks-program-walks
pip install openai
export OPENAI_API_KEY=your_key_here
python mouth_speaks_hand_acts.py "What's the weather in Tokyo?"

The terminal labels every step — “This is just TEXT” when the LLM outputs JSON, and “The PROGRAM did this” when the program executes the function. View on GitHub
