
⬅️ Previous - When to Fine-Tune or Use RAG
➡️ Next - Choosing the Right LLM
You’ve decided to fine-tune an LLM for your project — great. But before running a single training command, there’s one more set of choices that will define your entire workflow.
These aren’t minor setup details — they determine how much control, transparency, and scalability you’ll have down the line.
This lesson walks you through the three key decision layers in fine-tuning:
1️⃣ model access,
2️⃣ compute environment, and
3️⃣ orchestration approach.
We’ll cover how each choice affects your workflow and results — and in the next lesson, you’ll learn how to choose the actual base model to fine-tune using benchmarks and leaderboards.
The first question is which kind of model you’re fine-tuning.
Frontier models like GPT-4, Claude, and Gemini are fine-tuned through their providers’ APIs.
You upload a dataset (usually in JSONL format), and the provider retrains the model behind the scenes.
You never see or handle the model weights.
It’s simple and scalable — you send your data, get a fine-tuned endpoint, and start making API calls.
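To make that concrete, the uploaded dataset is usually a file with one JSON object per training example. Here is a minimal sketch in the chat-style format that OpenAI's fine-tuning API expects (other providers use similar but not identical schemas, and the example content is purely illustrative):

```jsonl
{"messages": [{"role": "system", "content": "You are a helpful support agent."}, {"role": "user", "content": "How do I reset my password?"}, {"role": "assistant", "content": "Go to Settings > Security and choose 'Reset password'."}]}
{"messages": [{"role": "system", "content": "You are a helpful support agent."}, {"role": "user", "content": "Can I change my billing date?"}, {"role": "assistant", "content": "Yes, open the Billing page and pick a new date under 'Billing cycle'."}]}
```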
But there are trade-offs: you never get access to the weights, you have limited visibility into (and control over) the training process, and both your data and the resulting model live inside the provider's ecosystem.
We’ll look at how to fine-tune frontier models directly through APIs in Week 3, when we cover managed services like OpenAI Fine-Tuning, Gemini Studio, and Anthropic Console.
Open-weight models — LLaMA 3, Mistral, Phi-3, Qwen, DeepSeek — are downloadable and customizable.
You can train them locally or on rented GPUs, modify architectures, and evaluate results however you like.
This freedom brings responsibility: managing compute, tracking experiments, and ensuring reproducibility.
But it also enables transparency and independence, which is why this certification program focuses on open-weight fine-tuning throughout Modules 1 and 2.
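Because you handle the weights yourself, the workflow starts with downloading them. Here is a minimal sketch using the Hugging Face Hub client; the model ID and local path are just examples, and gated models such as Llama require accepting the license and authenticating first:

```python
from huggingface_hub import snapshot_download

# Download the full set of weight files to a local directory.
# (Gated repos require a Hugging Face access token and an accepted license.)
local_dir = snapshot_download(
    repo_id="mistralai/Mistral-7B-v0.1",   # example open-weight model
    local_dir="./models/mistral-7b",       # example local path
)
print(f"Weights saved to {local_dir}")
```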
Once you’ve chosen the model type, the next decision is where your training runs.
The same code works across environments — you just decide whether to execute it on your own hardware or rent GPUs elsewhere.
Running locally means training on your own workstation or internal servers.
It’s ideal for experimentation and small models: iteration is fast, there are no per-hour GPU charges, and your data never leaves your machine.
You’ll explore this setup hands-on in Week 2, when we use Google Colab to fine-tune open-weight models interactively.
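Before committing to a local run, it helps to check what your GPU can actually hold. A quick sketch using PyTorch; the memory threshold is just an illustrative rule of thumb, not a hard requirement:

```python
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")
    # Rough rule of thumb: a 7B model can often be fine-tuned with QLoRA
    # in well under 16 GB, while higher-precision LoRA needs more.
    if vram_gb < 16:
        print("Consider a smaller model, QLoRA, or a cloud GPU.")
else:
    print("No CUDA GPU detected; consider Colab or a rented cloud GPU.")
```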
When your local GPU can’t keep up, you move to the cloud.
Platforms like AWS EC2, RunPod, Vast.ai, Paperspace, and Google Colab Pro+ provide scalable GPU resources on demand.
Here’s the key:
You’re still running the same Hugging Face training scripts — just on remote hardware.
This flexibility lets you start small locally and scale up seamlessly.
Throughout this program and especially in Module 2, you’ll learn how to leverage managed cloud SDKs like AWS SageMaker for distributed training, monitoring, and deployment at scale.
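As a preview of what that looks like, here is a hedged sketch of launching the same training script through SageMaker's Hugging Face estimator. The instance type, framework versions, IAM role, and S3 path are all placeholders you would replace with your own:

```python
from sagemaker.huggingface import HuggingFace

# Point SageMaker at the same training script you run locally.
estimator = HuggingFace(
    entry_point="train.py",           # your existing fine-tuning script
    source_dir="./scripts",
    instance_type="ml.g5.2xlarge",    # illustrative GPU instance
    instance_count=1,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder IAM role
    transformers_version="4.36",      # illustrative framework versions
    pytorch_version="2.1",
    py_version="py310",
    hyperparameters={"epochs": 3, "per_device_train_batch_size": 2},
)

# Training data previously uploaded to S3 (placeholder bucket and path).
estimator.fit({"train": "s3://my-bucket/support_data/"})
```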
Even after you’ve chosen your model and compute setup, there’s still the question of how you orchestrate fine-tuning.
With the custom-code approach, you work directly with foundational libraries such as Transformers, Datasets, PEFT, TRL, and Accelerate.
You control every aspect — from LoRA parameters and learning rates to checkpointing and evaluation logic.
Example:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from trl import SFTTrainer
from peft import LoraConfig

# Load the base model and its tokenizer.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# LoRA adapter configuration: low-rank updates on the attention projections.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.1,
)

# my_dataset: a Hugging Face Dataset of training examples you've already prepared.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=my_dataset,
    peft_config=lora_config,
    max_seq_length=2048,
)

trainer.train()
```
This path is ideal for research and experimentation, when you want fine-grained control or need to integrate new techniques the moment they appear — for instance, QLoRA or adapter fusion, which you’ll explore in Week 3.
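As a preview, QLoRA mostly comes down to loading the frozen base model in 4-bit before attaching the LoRA adapters. Here is a minimal sketch of how that would slot into the example above; the exact quantization settings are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the frozen base model to 4-bit NF4 to cut GPU memory use;
# the LoRA adapters from the earlier example still train in higher precision.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
```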
Managed frameworks abstract orchestration so you can focus on data and configuration.
You write a YAML file or make a simple SDK call, and the platform manages setup, scaling, and logging.
Popular options include Axolotl, AWS SageMaker, and Together.ai.
Example:
```yaml
# Axolotl configuration
base_model: meta-llama/Llama-2-7b-hf

datasets:
  - path: ./support_data.jsonl
    type: completion

adapter: lora
lora_r: 32
lora_alpha: 16

sequence_len: 2048
micro_batch_size: 2
num_epochs: 3
```
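With a config file like this saved as, say, `config.yml`, training is kicked off with a single CLI command; in many Axolotl releases that is `accelerate launch -m axolotl.cli.train config.yml`, though the exact invocation depends on the version you install.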
This approach is perfect for enterprise or production workflows — when you need consistent results and automated scaling.
You’ll gain hands-on experience with Axolotl in Week 3 and Bedrock in Week 5, and learn to evaluate and optimize these models in Week 6.
Each decision — model access, compute setup, and orchestration — shapes your workflow, costs, and long-term flexibility.
| If you value… | Frontier (API) | Open-Weight | Local | Cloud | Custom Code | Managed Framework |
|---|---|---|---|---|---|---|
| Simplicity & Speed | ✅ | | ✅ | ✅ | | ✅ |
| Transparency & Control | | ✅ | ✅ | ✅ | ✅ | |
| Cost Efficiency | | ✅ | ✅ | | ✅ | |
| Scale & Performance | ✅ | ✅ | | ✅ | ✅ | ✅ |
| Flexibility & Experimentation | | ✅ | ✅ | ✅ | ✅ | |
| Reliability & Automation | ✅ | | | ✅ | | ✅ |
| Reproducibility & Auditing | | ✅ | ✅ | | ✅ | ✅ |
Many teams use hybrid approaches — experimenting locally, scaling on cloud GPUs, and deploying through managed services.
You now understand the three layers of fine-tuning decisions —
what kind of model you can access, where it runs, and how it’s orchestrated.
In the next lesson, you’ll learn how to choose your base model wisely using LLM benchmarks and leaderboards — a skill that becomes critical once you start evaluating fine-tuned performance in Week 6.
You’ve laid the groundwork for your fine-tuning journey.
Next comes selecting the right foundation to build on.
⬅️ Previous - When to Fine-Tune or Use RAG
➡️ Next - Choosing the Right LLM