
⬅️ Previous - When to Fine-Tune or Use RAG
➡️ Next - Choosing the Right LLM
You’ve decided to fine-tune an LLM for your project — great. But before running a single training command, there’s one more set of choices that will define your entire workflow.
These aren’t minor setup details — they determine how much control, transparency, and scalability you’ll have down the line.
This lesson walks you through the three key decision layers in fine-tuning:
1️⃣ model access,
2️⃣ compute environment, and
3️⃣ orchestration approach.
We’ll cover how each choice affects your workflow and results — and in the next lesson, you’ll learn how to choose the actual base model to fine-tune using benchmarks and leaderboards.
The first question is which kind of model you’re fine-tuning.
Frontier models like GPT-4, Claude, and Gemini are fine-tuned through their providers’ APIs.
You upload a dataset (usually in JSONL format), and the provider retrains the model behind the scenes.
You never see or handle the model weights.
It’s simple and scalable — you send your data, get a fine-tuned endpoint, and start making API calls.
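To make that concrete, the uploaded dataset is usually a file with one JSON object per training example. Here is a minimal sketch in the chat-style format that OpenAI's fine-tuning API expects (other providers use similar but not identical schemas, and the example content is purely illustrative):

```jsonl
{"messages": [{"role": "system", "content": "You are a helpful support agent."}, {"role": "user", "content": "How do I reset my password?"}, {"role": "assistant", "content": "Go to Settings > Security and choose 'Reset password'."}]}
{"messages": [{"role": "system", "content": "You are a helpful support agent."}, {"role": "user", "content": "Can I change my billing date?"}, {"role": "assistant", "content": "Yes, open the Billing page and pick a new date under 'Billing cycle'."}]}
```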
But there are trade-offs: you never get access to the weights, you have limited visibility into (and control over) the training process, and both your data and the resulting model live inside the provider's ecosystem.
We’ll look at how to fine-tune frontier models directly through APIs in Week 3, when we cover managed services like OpenAI Fine-Tuning, Gemini Studio, and Anthropic Console.
Open-weight models — LLaMA 3, Mistral, Phi-3, Qwen, DeepSeek — are downloadable and customizable.
You can train them locally or on rented GPUs, modify architectures, and evaluate results however you like.
This freedom brings responsibility: managing compute, tracking experiments, and ensuring reproducibility.
But it also enables transparency and independence, which is why this certification program focuses on open-weight fine-tuning throughout Modules 1 and 2.
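Because you handle the weights yourself, the workflow starts with downloading them. Here is a minimal sketch using the Hugging Face Hub client; the model ID and local path are just examples, and gated models such as Llama require accepting the license and authenticating first:

```python
from huggingface_hub import snapshot_download

# Download the full set of weight files to a local directory.
# (Gated repos require a Hugging Face access token and an accepted license.)
local_dir = snapshot_download(
    repo_id="mistralai/Mistral-7B-v0.1",   # example open-weight model
    local_dir="./models/mistral-7b",       # example local path
)
print(f"Weights saved to {local_dir}")
```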
Once you’ve chosen the model type, the next decision is where your training runs.
The same code works across environments — you just decide whether to execute it on your own hardware or rent GPUs elsewhere.
Running locally means training on your own workstation or internal servers.
It’s ideal for experimentation and small models: iteration is fast, there are no per-hour GPU charges, and your data never leaves your machine.
You’ll explore this setup hands-on in Week 2, when we use Google Colab to fine-tune open-weight models interactively.
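Before committing to a local run, it helps to check what your GPU can actually hold. A quick sketch using PyTorch; the memory threshold is just an illustrative rule of thumb, not a hard requirement:

```python
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")
    # Rough rule of thumb: a 7B model can often be fine-tuned with QLoRA
    # in well under 16 GB, while higher-precision LoRA needs more.
    if vram_gb < 16:
        print("Consider a smaller model, QLoRA, or a cloud GPU.")
else:
    print("No CUDA GPU detected; consider Colab or a rented cloud GPU.")
```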
When your local GPU can’t keep up, you move to the cloud.
Platforms like AWS EC2, RunPod, Vast.ai, Paperspace, and Google Colab Pro+ provide scalable GPU resources on demand.
Here’s the key:
You’re still running the same Hugging Face training scripts — just on remote hardware.
This flexibility lets you start small locally and scale up seamlessly.
Throughout this program and especially in Module 2, you’ll learn how to leverage managed cloud SDKs like AWS SageMaker for distributed training, monitoring, and deployment at scale.
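As a preview of what that looks like, here is a hedged sketch of launching the same training script through SageMaker's Hugging Face estimator. The instance type, framework versions, IAM role, and S3 path are all placeholders you would replace with your own:

```python
from sagemaker.huggingface import HuggingFace

# Point SageMaker at the same training script you run locally.
estimator = HuggingFace(
    entry_point="train.py",           # your existing fine-tuning script
    source_dir="./scripts",
    instance_type="ml.g5.2xlarge",    # illustrative GPU instance
    instance_count=1,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder IAM role
    transformers_version="4.36",      # illustrative framework versions
    pytorch_version="2.1",
    py_version="py310",
    hyperparameters={"epochs": 3, "per_device_train_batch_size": 2},
)

# Training data previously uploaded to S3 (placeholder bucket and path).
estimator.fit({"train": "s3://my-bucket/support_data/"})
```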
Even after you’ve chosen your model and compute setup, there’s still the question of how you orchestrate fine-tuning.
With the custom-code approach, you work directly with foundational libraries such as Transformers, Datasets, PEFT, TRL, and Accelerate.
You control every aspect — from LoRA parameters and learning rates to checkpointing and evaluation logic.
Example:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from trl import SFTTrainer
from peft import LoraConfig

# Load the base model and its tokenizer.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# LoRA adapter configuration: low-rank updates on the attention projections.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.1,
)

# my_dataset: a Hugging Face Dataset of training examples you've already prepared.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=my_dataset,
    peft_config=lora_config,
    max_seq_length=2048,
)

trainer.train()
```
This path is ideal for research and experimentation, when you want fine-grained control or need to integrate new techniques the moment they appear — for instance, QLoRA or adapter fusion, which you’ll explore in Week 3.
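As a preview, QLoRA mostly comes down to loading the frozen base model in 4-bit before attaching the LoRA adapters. Here is a minimal sketch of how that would slot into the example above; the exact quantization settings are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the frozen base model to 4-bit NF4 to cut GPU memory use;
# the LoRA adapters from the earlier example still train in higher precision.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
```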
Managed frameworks abstract orchestration so you can focus on data and configuration.
You write a YAML file or make a simple SDK call, and the platform manages setup, scaling, and logging.
Popular options include Axolotl, AWS SageMaker, and Together.ai.
Example:
```yaml
# Axolotl configuration
base_model: meta-llama/Llama-2-7b-hf

datasets:
  - path: ./support_data.jsonl
    type: completion

adapter: lora
lora_r: 32
lora_alpha: 16

sequence_len: 2048
micro_batch_size: 2
num_epochs: 3
```
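With a config file like this saved as, say, `config.yml`, training is kicked off with a single CLI command; in many Axolotl releases that is `accelerate launch -m axolotl.cli.train config.yml`, though the exact invocation depends on the version you install.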
This approach is perfect for enterprise or production workflows — when you need consistent results and automated scaling.
You’ll gain hands-on experience with Axolotl in Week 3 and Bedrock in Week 5, and learn to evaluate and optimize these models in Week 6.
Each decision — model access, compute setup, and orchestration — shapes your workflow, costs, and long-term flexibility.
| If you value… | Frontier (API) | Open-Weight | Local | Cloud | Custom Code | Managed Framework |
|---|---|---|---|---|---|---|
| Simplicity & Speed | ✅ | | ✅ | ✅ | | ✅ |
| Transparency & Control | | ✅ | ✅ | ✅ | ✅ | |
| Cost Efficiency | | ✅ | ✅ | | ✅ | |
| Scale & Performance | ✅ | ✅ | | ✅ | ✅ | ✅ |
| Flexibility & Experimentation | | ✅ | ✅ | ✅ | ✅ | |
| Reliability & Automation | ✅ | | | ✅ | | ✅ |
| Reproducibility & Auditing | | ✅ | ✅ | | ✅ | ✅ |
Many teams use hybrid approaches — experimenting locally, scaling on cloud GPUs, and deploying through managed services.
You now understand the three layers of fine-tuning decisions —
what kind of model you can access, where it runs, and how it’s orchestrated.
In the next lesson, you’ll learn how to choose your base model wisely using LLM benchmarks and leaderboards — a skill that becomes critical once you start evaluating fine-tuned performance in Week 6.
You’ve laid the groundwork for your fine-tuning journey.
Next comes selecting the right foundation to build on.
⬅️ Previous - When to Fine-Tune or Use RAG
➡️ Next - Choosing the Right LLM