LlamaVox – Fine‑Tuning Llama 3.1 (8 B) on Budget Hardware
Abstract
I demonstrate that Meta Llama 3.1 (8 B parameters) can be fine‑tuned on modest, shared‑cluster hardware by combining LoRA adapters, mixed‑precision training, and aggressive cache management.
Using a single H200 or A100 GPU with 16–40 GB of effective memory and a 4‑hour job window, I cut mini‑dataset training loss from 0.116 to 0.023 and reached 98.98 % token‑level accuracy in 18 minutes while storing adapters as small as 81 MB. The open‑source LlamaVox toolkit lowers the cost barrier for researchers and hobbyists who lack multi‑GPU servers.
Key Findings
- LoRA shrinks trainable parameters ≈ 250 ×, yielding adapters < 0.2 GB and slashing VRAM needs.
- 8‑bit base‑model loading + fp16 training enables single‑GPU fine‑tuning without degrading convergence (see the sketch after this list).
- H200 gives a ~1.4 × speed‑up over A100, but both finish a 50 K‑sample run inside an academic 4‑hour job.
- Loss plateaus after 2–3 epochs on Mini and 1 epoch on Medium, signalling rapid convergence under LoRA.
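As a concrete illustration of the first two findings, the sketch below shows an 8‑bit base‑model load with an fp16 LoRA adapter using the `transformers`/`peft`/`bitsandbytes` stack listed under Methodology. The model id, rank, dropout, and target modules are illustrative placeholders, not the exact LlamaVox configuration.

```python
# Sketch: 8-bit base-model load + fp16 LoRA adapter (illustrative settings,
# not the exact LlamaVox configuration).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base = "meta-llama/Llama-3.1-8B"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(base)

model = AutoModelForCausalLM.from_pretrained(
    base,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # bitsandbytes 8-bit weights
    torch_dtype=torch.float16,        # fp16 for the non-quantized modules
    attn_implementation="eager",      # flash-attention disabled for compatibility
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # freezes base weights, enables grad checkpointing

lora = LoraConfig(
    r=8,                                  # ranks 8 and 16 were used in this work
    lora_alpha=16,                        # illustrative scaling factor
    lora_dropout=0.05,                    # illustrative
    target_modules=["q_proj", "v_proj"],  # illustrative target set; the real one may differ
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # shows the large reduction in trainable parameters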
Methodology
Step | Details |
--- | --- |
Model + Adapter | Base: Llama 3.1 8B; LoRA ranks 8 & 16 → 65 K–131 K trainable params (81–161 MB on disk). |
Datasets | 5 K "Mini" (2.2 MB), 50 K "Medium", plus 1 K synthetic dialogue samples. |
Hardware | Northeastern U. Discovery cluster; single H200 (141 GB), A100 (40 GB) or V100 (32 GB); 4 h SLURM limit. |
Training stack | PyTorch 2.6, Transformers 4.53, PEFT 0.16, bitsandbytes 8‑bit load, TRL SFT wrapper; fp16 + grad‑accum. |
Resource hacks | Redirected HF cache to scratch; checkpointed every 10 min; disabled flash‑attention for compatibility (see the sketch after this table). |
Evaluation | Cross‑entropy loss & token accuracy on held‑out split (see the evaluation sketch after Results); wall‑clock comparisons across GPUs. |
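Continuing from the model and tokenizer in the sketch above, the following is a hedged sketch of how the training stack and resource hacks could fit together with TRL's SFT wrapper; all paths, batch sizes, and step counts are placeholders rather than the exact LlamaVox settings.

```python
# Sketch: fp16 + gradient-accumulation SFT run with frequent checkpoints and an
# HF cache redirected to scratch. Continues from `model` and `tokenizer` defined
# in the earlier sketch; paths and hyperparameters are placeholders.
import os

# Point the Hugging Face cache at scratch storage (set before any HF import in a real job).
os.environ["HF_HOME"] = "/scratch/llamavox/hf_cache"  # placeholder scratch path

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

train_data = load_dataset("json", data_files="mini_5k.jsonl", split="train")  # placeholder file

config = SFTConfig(
    output_dir="/scratch/llamavox/checkpoints",  # placeholder
    dataset_text_field="text",       # assumes the JSONL rows carry a "text" field
    per_device_train_batch_size=4,   # illustrative
    gradient_accumulation_steps=8,   # keeps the effective batch size up on one GPU
    fp16=True,                       # mixed-precision training
    num_train_epochs=3,              # loss plateaued after 2-3 epochs on Mini
    save_strategy="steps",
    save_steps=200,                  # frequent checkpoints so a 4 h SLURM kill loses little work
    logging_steps=50,
)

trainer = SFTTrainer(
    model=model,                 # the 8-bit + LoRA model from the earlier sketch
    args=config,
    train_dataset=train_data,
    processing_class=tokenizer,
)
trainer.train()
trainer.save_model("/scratch/llamavox/adapter")  # saves only the small LoRA adapter weights
```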
Results
GPU | Dataset | Wall‑time | Final loss | Token accuracy | Notes |
--- | --- | --- | --- | --- | --- |
H200 141 GB | 5 K Mini | 18 min | 0.0229 | 98.98 % | Fastest; ample VRAM |
A100 40 GB | 5 K Mini | 25 min | 0.0231 | 98.7 % | Best speed / availability mix |
V100 32 GB | 5 K Mini | 40 min | 0.0242 | 98.1 % | Fits the 4 h job limit only on small runs |
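The loss and token-accuracy figures above could be computed along the following lines; this is a generic evaluation sketch, not the exact LlamaVox code, and it assumes `model` from the training sketches plus an `eval_loader` of tokenized held-out batches.

```python
# Sketch: cross-entropy loss and token-level accuracy on a held-out split.
# Assumes `model` from the training sketches and an `eval_loader` yielding tokenized batches.
import torch
import torch.nn.functional as F

@torch.no_grad()
def evaluate(model, eval_loader, device="cuda"):
    model.eval()
    total_loss, total_correct, total_tokens = 0.0, 0, 0
    for batch in eval_loader:
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        logits = model(input_ids=input_ids, attention_mask=attention_mask).logits

        # Shift so each position predicts the next token, as in causal LM training.
        shift_logits = logits[:, :-1, :]
        shift_labels = input_ids[:, 1:]
        mask = attention_mask[:, 1:].bool()  # ignore padding positions

        total_loss += F.cross_entropy(
            shift_logits[mask], shift_labels[mask], reduction="sum"
        ).item()
        total_correct += (shift_logits[mask].argmax(dim=-1) == shift_labels[mask]).sum().item()
        total_tokens += mask.sum().item()

    return total_loss / total_tokens, total_correct / total_tokens  # mean CE loss, token accuracy
```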
Links