LlamaVox – Fine‑Tuning Llama 3.1 (8 B) on Budget Hardware
Abstract
I demonstrate that Meta Llama 3.1 (8 B parameters) can be fine‑tuned on modest, shared‑cluster hardware by combining LoRA adapters, mixed‑precision training, and aggressive cache management.
Using a single H200 or A100 GPU with 16–40 GB of effective memory and a 4‑hour job window, I cut mini‑dataset training loss from 0.116 to 0.023 and reached 98.98 % token‑level accuracy in 18 minutes while storing adapters as small as 81 MB. The open‑source LlamaVox toolkit lowers the cost barrier for researchers and hobbyists who lack multi‑GPU servers.
Key Findings
- LoRA shrinks trainable parameters ≈ 250 ×, yielding adapters < 0.2 GB and slashing VRAM needs.
- 8‑bit base‑model loading + fp16 training enables single‑GPU fine‑tuning without degrading convergence (see the sketch after this list).
- H200 gives a ~1.4 × speed‑up over A100, but both finish a 50 K‑sample run inside an academic 4‑hour job.
- Loss plateaus after 2–3 epochs on Mini and 1 epoch on Medium, signalling rapid convergence under LoRA.
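As a concrete illustration of the first two findings, the sketch below shows an 8‑bit base‑model load with an fp16 LoRA adapter using the `transformers`/`peft`/`bitsandbytes` stack listed under Methodology. The model id, rank, dropout, and target modules are illustrative placeholders, not the exact LlamaVox configuration.

```python
# Sketch: 8-bit base-model load + fp16 LoRA adapter (illustrative settings,
# not the exact LlamaVox configuration).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base = "meta-llama/Llama-3.1-8B"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(base)

model = AutoModelForCausalLM.from_pretrained(
    base,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # bitsandbytes 8-bit weights
    torch_dtype=torch.float16,        # fp16 for the non-quantized modules
    attn_implementation="eager",      # flash-attention disabled for compatibility
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # freezes base weights, enables grad checkpointing

lora = LoraConfig(
    r=8,                                  # ranks 8 and 16 were used in this work
    lora_alpha=16,                        # illustrative scaling factor
    lora_dropout=0.05,                    # illustrative
    target_modules=["q_proj", "v_proj"],  # illustrative target set; the real one may differ
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # shows the large reduction in trainable parameters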
Methodology
Step | Details |
--- | --- |
Model + Adapter | Base: Llama 3.1 8B; LoRA ranks 8 & 16 → 65 K–131 K trainable params (81–161 MB on disk). |
Datasets | 5 K "Mini" (2.2 MB), 50 K "Medium", plus 1 K synthetic dialogue samples. |
Hardware | Northeastern U. Discovery cluster; single H200 (141 GB), A100 (40 GB) or V100 (32 GB); 4 h SLURM limit. |
Training stack | PyTorch 2.6, Transformers 4.53, PEFT 0.16, bitsandbytes 8‑bit load, TRL SFT wrapper; fp16 + grad‑accum. |
Resource hacks | Redirected HF cache to scratch; checkpointed every 10 min; disabled flash‑attention for compatibility (see the sketch after this table). |
Evaluation | Cross‑entropy loss & token accuracy on held‑out split (see the evaluation sketch after Results); wall‑clock comparisons across GPUs. |
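Continuing from the model and tokenizer in the sketch above, the following is a hedged sketch of how the training stack and resource hacks could fit together with TRL's SFT wrapper; all paths, batch sizes, and step counts are placeholders rather than the exact LlamaVox settings.

```python
# Sketch: fp16 + gradient-accumulation SFT run with frequent checkpoints and an
# HF cache redirected to scratch. Continues from `model` and `tokenizer` defined
# in the earlier sketch; paths and hyperparameters are placeholders.
import os

# Point the Hugging Face cache at scratch storage (set before any HF import in a real job).
os.environ["HF_HOME"] = "/scratch/llamavox/hf_cache"  # placeholder scratch path

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

train_data = load_dataset("json", data_files="mini_5k.jsonl", split="train")  # placeholder file

config = SFTConfig(
    output_dir="/scratch/llamavox/checkpoints",  # placeholder
    dataset_text_field="text",       # assumes the JSONL rows carry a "text" field
    per_device_train_batch_size=4,   # illustrative
    gradient_accumulation_steps=8,   # keeps the effective batch size up on one GPU
    fp16=True,                       # mixed-precision training
    num_train_epochs=3,              # loss plateaued after 2-3 epochs on Mini
    save_strategy="steps",
    save_steps=200,                  # frequent checkpoints so a 4 h SLURM kill loses little work
    logging_steps=50,
)

trainer = SFTTrainer(
    model=model,                 # the 8-bit + LoRA model from the earlier sketch
    args=config,
    train_dataset=train_data,
    processing_class=tokenizer,
)
trainer.train()
trainer.save_model("/scratch/llamavox/adapter")  # saves only the small LoRA adapter weights
```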
Results
GPU | Dataset | Wall‑time | Final loss | Token accuracy | Notes |
--- | --- | --- | --- | --- | --- |
H200 141 GB | 5 K Mini | 18 min | 0.0229 | 98.98 % | Fastest; ample VRAM |
A100 40 GB | 5 K Mini | 25 min | 0.0231 | 98.7 % | Best speed / availability mix |
V100 32 GB | 5 K Mini | 40 min | 0.0242 | 98.1 % | Fits the 4 h job limit only on small runs |
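The loss and token-accuracy figures above could be computed along the following lines; this is a generic evaluation sketch, not the exact LlamaVox code, and it assumes `model` from the training sketches plus an `eval_loader` of tokenized held-out batches.

```python
# Sketch: cross-entropy loss and token-level accuracy on a held-out split.
# Assumes `model` from the training sketches and an `eval_loader` yielding tokenized batches.
import torch
import torch.nn.functional as F

@torch.no_grad()
def evaluate(model, eval_loader, device="cuda"):
    model.eval()
    total_loss, total_correct, total_tokens = 0.0, 0, 0
    for batch in eval_loader:
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        logits = model(input_ids=input_ids, attention_mask=attention_mask).logits

        # Shift so each position predicts the next token, as in causal LM training.
        shift_logits = logits[:, :-1, :]
        shift_labels = input_ids[:, 1:]
        mask = attention_mask[:, 1:].bool()  # ignore padding positions

        total_loss += F.cross_entropy(
            shift_logits[mask], shift_labels[mask], reduction="sum"
        ).item()
        total_correct += (shift_logits[mask].argmax(dim=-1) == shift_labels[mask]).sum().item()
        total_tokens += mask.sum().item()

    return total_loss / total_tokens, total_correct / total_tokens  # mean CE loss, token accuracy
```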
Links