
⬅️ Previous - Language Model Architectures
➡️ Next - When to Finetune or Use RAG
In the previous lesson, you learned that modern language models share one core design — the decoder-only architecture. Now comes the next question: Where do these models actually come from, and which ones should you work with?
The LLM world divides into two major ecosystems: frontier models you access through APIs (like GPT-4 and Claude), and open-weight models you can download and run yourself (like LLaMA and Mistral). Each offers different trade-offs for control, cost, and privacy.
Within those ecosystems, models appear in different training variants — from raw base models and instruction-tuned assistants to reasoning-optimized and domain-specialized versions. Each variant exists for a reason, serving a different role in the LLM development and deployment pipeline.
This lesson gives you a clear map of that landscape — what each variant does, how the ecosystems differ, and where our program focuses as you learn to fine-tune open-weight, decoder-only models built for real-world use.
Every LLM you’ll ever work with lives in one of two ecosystems.
Not technical categories, but philosophies! Two very different ways the AI community builds, releases, and shares its models.
Let’s start with a simple thought experiment.
Imagine you need a car tomorrow. You have two choices:
That’s the difference between frontier models and open-weight models.
In this video, we break down the key differences between frontier LLMs (API access) and open-weight LLMs (download and fine-tune). We’ll walk you through practical examples to help you understand how these models work, their trade-offs, and the best use cases for each.
Frontier models are the ones that make headlines—GPT-4, Claude 3, Gemini, Grok.
They’re massive, closed-weight systems trained by companies with staggering compute budgets and huge proprietary datasets.
You never download these models. You send them requests through an API, and they send back answers from the cloud.
You get instant access to the best performance available, but you trade away control.
Frontier models are ideal when you need to move fast—build a prototype, validate an idea, or serve users without managing infrastructure.
The provider handles scaling, optimization, and safety updates. You just plug into the endpoint and start building.
But that convenience comes with three predictable constraints:
For most people starting out, frontier models feel like magic: high performance, low setup.
But when projects mature—when data privacy, cost, or reproducibility become priorities—teams often look toward open-weight alternatives.
Open-weight models are the downloadable ones—the ones you can actually own.
You can host them locally, fine-tune them for your needs, and even merge or quantize them for efficiency.
Think of names like LLaMA 3, Mistral 7B, Mixtral, Phi-3, Qwen, or DeepSeek.
They form a vibrant ecosystem where progress moves quickly and collaboration happens in the open.
Here’s why developers and researchers love them:
Of course, open-weight models bring new responsibilities: infrastructure setup, GPU management, monitoring, and optimization.
But for teams that want independence and reproducibility, they’re the natural choice.
That’s why, in this certification, we focus primarily on open-weight, decoder-only models—they’re the backbone of modern applied LLM engineering.
The term "frontier" is often used to describe proprietary, closed-source models like those from OpenAI and Claude. These models have traditionally been seen as the state-of-the-art, coming from organizations leading the way in generative AI.
However, the term "frontier" can be a bit misleading. While open-weight models like LLaMA and Mistral used to lag behind, the performance gap is steadily closing. In fact, open-weight models are quickly approaching the capabilities of frontier models.
We continue using the "frontier" vs open-weight labels in this lesson because it's widely used, but it’s important to note that the distinction between them is becoming less clear, and open-weight models may soon represent the true frontier in AI.
Now that you know where models live—frontier or open—let’s talk about how far along they are in their training.
Not all LLMs are built for the same purpose.
Some are blank slates ready for customization; others are polished assistants fine-tuned to follow instructions or reason step-by-step.
It’s helpful to think of these as variants, not steps.
They’re parallel categories in a growing ecosystem, each serving a purpose.
A base model is the LLM in its purest form.
It’s been pretrained on trillions of tokens to learn general language patterns, but it hasn’t been taught how to follow instructions.
Ask it a question, and it will try to continue your text—not necessarily answer you.
User: Explain quantum computing. Base Model: Explain quantum computing. How is it different from a classical computer? Quantum computing is a new type ...
It mimics phrasing, repeats the question, asks its own questions, or plain drifts!
That’s not a bug. It’s behaving exactly as trained: predicting the next word.
When to use base models:
When you want to apply your own instruction tuning or inject specialized domain data from scratch.
Researchers often start here; most practitioners don’t need to.
These are the models most of us use every day.
They start as base models, then undergo instruction tuning—training on thousands of question–answer examples—and sometimes reinforcement learning from human feedback (RLHF).
Now, when you ask something, they don’t just complete your text—they respond to you.
User: Explain quantum computing. Instruct Model: Quantum computing uses quantum bits, or qubits, which can represent multiple states...
Instruct models are ideal starting points for fine-tuning because they already understand conversational intent.
When you fine-tune one, you’re layering domain expertise on top of an already aligned, well-behaved assistant.
That’s exactly what you’ll be doing in this program.
A newer variant focuses on structured reasoning.
Instead of jumping to an answer, these models “show their work” before responding.
They’re trained to generate intermediate reasoning steps—sometimes called chain-of-thought—and can tackle complex tasks in math, code, or logic that stump standard chat models.
User: A train leaves the station at 3 PM going 60 km/h. Another train leaves from the same station at 4 PM going 90 km/h in the same direction. At what time will the faster train catch up? Reasoning Model: <thinking> In one hour head start, the first train travels 60 km. The speed difference is 90 - 60 = 30 km/h. To close a 60 km gap at 30 km/h takes 2 hours. So the faster train catches up 2 hours after 4 PM — that’s 6 PM. </thinking> Answer: 6 PM.
They’re slower and more expensive to run, but much better at analytical accuracy.
For most business applications, instruct models are enough. But when planning and precision is required, reasoning variants shine.
Finally, there are fine-tuned models—LLMs trained on specialized corpora to make them experts in a specific domain: legal documents, clinical notes, financial reports, or programming languages.
These models blend general language fluency with deep subject knowledge.
Think of them as experts who speak both everyday English and the dialect of their field.
User: Draft a clause limiting liability for indirect damages. Legal Model: The parties agree that neither shall be liable for indirect, incidental, or consequential losses...
They outperform general models on their niche but may lose some general-purpose flexibility.
You’ll encounter or even create these in your own fine-tuning projects later in the program.
By now, you’ve seen the full landscape:
two ecosystems—frontier and open-weight—and four functional variants—base, instruct, reasoning, and domain.
In this certification, we’ll work mainly with open-weight, instruct-tuned, decoder-only models such as LLaMA 3, Mistral, and Phi-3.
These strike the ideal balance between accessibility, control, and production realism.
You’ll experiment with frontier APIs (like GPT-4) for quick benchmarking, but your core learning will happen hands-on—with models you can fine-tune, evaluate, and deploy yourself.
That’s the real skill employers value: turning an open-weight model into something useful, reliable, and reproducible.
Now that you understand the LLM landscape—how models are released, specialized, and evolved—you’re ready to tackle the next big question:
When should you actually fine-tune a model?
Not every LLM application needs fine-tuning. In fact, many problems can be solved effectively through prompting or retrieval-based approaches.
In the next lesson, we’ll explore why and when fine-tuning makes sense, what alternatives exist, and how to choose the right path for your specific use case.
That’s where we start turning theory into real design decisions.
⬅️ Previous - Language Model Architectures
➡️ Next - When to Finetune or Use RAG