
This project demonstrates parameter-efficient fine-tuning of Qwen 2.5 0.5B Instruct for Nigerian news headline generation using QLoRA (Quantized Low-Rank Adaptation). Working within the constraints of a single T4 GPU (16 GB VRAM) on Google Colab's free tier, I achieved improvements across all evaluation metrics: ROUGE-1 rose by 17.13% relative (27.16% → 31.81%), ROUGE-2 by 40.78% (8.23% → 11.59%), and ROUGE-L by 27.88% (22.26% → 28.46%). The fine-tuned model produces higher-quality headlines, with stronger keyword selection and improved contextual understanding of Nigerian news content. With only 1.08M trainable parameters (0.22% of the model), this work shows how resource-constrained practitioners can efficiently adapt modern language models to domain-specific tasks.
Nigerian news content presents unique challenges for automated headline generation. The content spans diverse topics—from local politics and economic policies to cultural events and regional security issues—each requiring contextual awareness that generic models often lack. Standard headline generation models, trained primarily on Western news sources, frequently miss cultural nuances, misinterpret local terminology, and fail to capture the appropriate tone for Nigerian audiences.
Consider this excerpt from our dataset:
"Amidst the worsening insecurity in the country, governors elected on the platform of the Peoples Democratic Party (PDP) on Wednesday..."
A generic model might produce: "Governors Rally to Defend Statehood Amidst Growing Security Concerns"
While grammatically correct, this headline misses the specific political context (PDP governors), the Nigerian security situation, and the characteristic directness of Nigerian news headlines.
The core challenge is adapting a small language model (0.5B parameters) to generate contextually appropriate headlines for Nigerian news while operating under strict resource constraints: a single 16 GB T4 GPU on Google Colab and a training set of only 4,286 articles.
I employ QLoRA fine-tuning on 4,286 Nigerian news articles from AriseTv. QLoRA enables efficient fine-tuning through 4-bit NF4 quantization of the frozen base model, trainable low-rank adapters on selected projection layers, and a memory-efficient paged 8-bit optimizer.
Base Model Selection
I selected Qwen 2.5 0.5B Instruct because its 0.5B parameters fit comfortably on a 16 GB T4 under 4-bit quantization, and because its instruction tuning pairs naturally with the chat-style prompt format used for this task.
Memory Footprint Analysis
Loading the model with rank-8 adapters under this QLoRA configuration consumed approximately 3.73 GB of VRAM (9.3% utilization when measured on a 40 GB GPU). Actual training on the T4 peaked at roughly 12 GB once batch processing and gradient computation overhead were included.
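Figures like these can be checked during a run with PyTorch's CUDA memory counters; a minimal sketch (the tags and call sites are illustrative, not the project's exact instrumentation):

```python
import torch

def report_vram(tag: str) -> None:
    """Print current and peak GPU memory so footprint claims can be checked during a run."""
    current = torch.cuda.memory_allocated() / 1024**3
    peak = torch.cuda.max_memory_allocated() / 1024**3
    total = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"[{tag}] current: {current:.2f} GB | peak: {peak:.2f} GB | device total: {total:.2f} GB")

# Example usage around training:
# report_vram("after model load")   # ~3.7 GB expected for the 4-bit base + adapters
# trainer.train()
# report_vram("after training")     # peak includes activations, gradients, optimizer state
```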
QLoRA Configuration
Quantization:
- Type: NF4 (4-bit NormalFloat)
- Double quantization: Enabled
- Compute dtype: bfloat16

LoRA Parameters:
- Rank (r): 8
- Alpha: 16
- Dropout: 0.05
- Target modules: [q_proj, v_proj]
- Trainable parameters: 1,081,344 (0.22%)
The rank-8 configuration strikes a balance between model capacity and training efficiency. Lower ranks (r=4) showed insufficient capacity for the task, while higher ranks (r=16) increased training time without proportional gains.
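For concreteness, here is a minimal sketch of this setup with Transformers, bitsandbytes, and PEFT. It mirrors the configuration above but is not the project's exact training code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

MODEL_ID = "Qwen/Qwen2.5-0.5B-Instruct"

# 4-bit NF4 quantization with double quantization and bfloat16 compute, as configured above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Rank-8 LoRA adapters on the attention query/value projections
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # ~1.08M trainable parameters (~0.22%)
```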
Source and Composition
Dataset: okite97/news-data (HuggingFace)
Data Format
Each sample consists of:
Example:
Excerpt: "Russia has detected its first case of transmission of
bird flu virus from animals to humans, according to health authorities."
Title: "Russia Registers First Case of Bird Flu in Humans"
Preprocessing
Data was formatted into an instruction-following template:
Generate a concise and engaging headline for the following Nigerian news excerpt.
## News Excerpt:
{excerpt}
## Headline:
{title}
This chat-style formatting leverages Qwen's instruction-tuning while maintaining clear task specification.
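A formatting function along these lines can be mapped over the dataset (a sketch; the 'Excerpt' and 'Title' column names follow the examples above, and the exact chat-template wrapping used in the project code may differ):

```python
INSTRUCTION = "Generate a concise and engaging headline for the following Nigerian news excerpt."

def format_sample(example: dict) -> dict:
    """Fill the instruction template; assumes the dataset exposes 'Excerpt' and 'Title' columns."""
    prompt = (
        f"{INSTRUCTION}\n"
        f"## News Excerpt:\n{example['Excerpt']}\n"
        f"## Headline:\n"
    )
    # During training the reference title is appended so the model learns to complete the headline.
    return {"text": prompt + example["Title"]}

# dataset = dataset.map(format_sample)   # e.g. after datasets.load_dataset("okite97/news-data")
```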
Hyperparameters
| Parameter | Value | Rationale |
|---|---|---|
| Sequence length | 512 | Balance context and memory |
| Batch size | 16 | Maximum stable batch for T4 |
| Gradient accumulation | 2 | Effective batch size: 32 |
| Learning rate | 2e-4 | Standard for LoRA fine-tuning |
| LR scheduler | Cosine | Smooth convergence |
| Warmup steps | 50 | Stabilize early training |
| Max steps | 300 | ~2 epochs at effective batch 32 |
| Optimizer | paged_adamw_8bit | Memory-efficient optimization |
| Precision | bfloat16 | Matches NF4 compute dtype |
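These settings map onto a Transformers TrainingArguments object roughly as follows (a sketch; the output directory is an assumption):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-nigerian-headlines",  # assumed output path
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,   # effective batch size 32
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    max_steps=300,
    max_grad_norm=1.0,
    optim="paged_adamw_8bit",
    bf16=True,
    logging_steps=25,
    save_steps=100,
    save_total_limit=2,
    report_to="wandb",
)
```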
Training Environment
Training ran on a single NVIDIA T4 GPU (16 GB VRAM) in Google Colab, with experiment tracking handled by Weights & Biases.
ROUGE Scores
I evaluate using ROUGE (Recall-Oriented Understudy for Gisting Evaluation), reporting ROUGE-1 (unigram overlap), ROUGE-2 (bigram overlap), and ROUGE-L (longest common subsequence).
ROUGE scores are particularly appropriate for headline generation because they measure how much of the reference headline's wording and phrasing the generated headline recovers, at both the single-word and phrase level.
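Scoring can be reproduced with the ROUGE implementation in the HuggingFace evaluate library; a minimal sketch (whether the reported numbers used stemming is an assumption):

```python
import evaluate

rouge = evaluate.load("rouge")

def score_headlines(predictions: list[str], references: list[str]) -> dict:
    """Compute ROUGE-1/2/L between generated and reference headlines, as percentages."""
    scores = rouge.compute(predictions=predictions, references=references, use_stemmer=True)
    return {k: round(v * 100, 2) for k, v in scores.items() if k in ("rouge1", "rouge2", "rougeL")}

# Example: score_headlines(model_headlines, test_split["Title"])
```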
Evaluation Protocol
Both the base and fine-tuned models generate headlines for the same held-out test split (200 samples, per the configuration in the appendix), and outputs are scored against the reference titles.
Loss Curves
Training artifacts were tracked using Weights & Biases. The run history shows training proceeding stably, with training and validation loss decreasing smoothly as the cosine schedule wound the learning rate down to zero.
The final validation loss of 2.553 represents a 10.9% reduction from the initial loss of 2.868. The consistent decrease in both training and validation loss without divergence indicates healthy learning without overfitting.
Training Metrics Summary
| Metric | Initial | Final | Change |
|---|---|---|---|
| Training Loss | 2.917 | 2.418 | -17.1% |
| Validation Loss | 2.868 | 2.553 | -10.9% |
| Learning Rate | 2e-4 | 0.0 | Cosine decay |
| Grad Norm | Variable | 3.509 | Stable |
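The Change column follows directly from the logged values:

```python
initial_train, final_train = 2.917, 2.418
initial_val, final_val = 2.868, 2.553

print(f"training loss:   {(final_train - initial_train) / initial_train:+.2%}")  # -17.11%
print(f"validation loss: {(final_val - initial_val) / initial_val:+.2%}")        # -10.98%
```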
Zero-shot Performance
The base Qwen 2.5 0.5B Instruct model (without fine-tuning) achieved:
| Metric | Score |
|---|---|
| ROUGE-1 | 27.16% |
| ROUGE-2 | 8.23% |
| ROUGE-L | 22.26% |
Qualitative Analysis
Baseline headlines showed several recurring patterns: verbose phrasing, awkward constructions, and occasional factual contradictions with the excerpt.
Example:
Excerpt: "Lewis Hamilton was gracious in defeat after Red Bull rival
Max Verstappen ended the Briton's quest for an unprecedented eighth..."
Baseline: "Lewis Hamilton's Gracious Victory After Red Bull's Max
Verstappen Seeks Record-Setting Eighth Win"
Issue: Contradictory (mentions "victory" for defeated driver),
overly long, awkward phrasing
Post-Training Performance
After QLoRA fine-tuning, the model achieved:
| Metric | Score | Improvement |
|---|---|---|
| ROUGE-1 | 31.81% | +17.13% |
| ROUGE-2 | 11.59% | +40.78% |
| ROUGE-L | 28.46% | +27.88% |
Comprehensive Results Summary
| Metric | Baseline | Fine-tuned | Improvement |
|---|---|---|---|
| ROUGE-1 | 27.16% | 31.81% | +17.13% |
| ROUGE-2 | 8.23% | 11.59% | +40.78% |
| ROUGE-L | 22.26% | 28.46% | +27.88% |
Statistical Significance
The relative improvements are substantial and consistent across all three metrics:
- ROUGE-1: +17.13%
- ROUGE-2: +40.78%
- ROUGE-L: +27.88%
Visual Comparison
The bar chart visualization clearly demonstrates consistent improvements across all three ROUGE metrics, with the fine-tuned model (shown in green) substantially outperforming the baseline (shown in blue) in every category.
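For reference, headlines like those in the examples below can be generated along these lines (a minimal sketch; the adapter path and decoding settings are assumptions rather than the project's exact inference code):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE = "Qwen/Qwen2.5-0.5B-Instruct"
ADAPTER = "./qlora-headline-adapter"  # hypothetical path to the trained LoRA weights

bnb = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb, device_map="auto")
model = PeftModel.from_pretrained(base_model, ADAPTER)

def generate_headline(excerpt: str) -> str:
    prompt = (
        "Generate a concise and engaging headline for the following Nigerian news excerpt.\n"
        f"## News Excerpt:\n{excerpt}\n## Headline:\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)
    output = model.generate(**inputs, max_new_tokens=32, do_sample=False)
    # Decode only the tokens generated after the prompt
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()
```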
Example 1: Sports News
Excerpt: "Lewis Hamilton was gracious in defeat after Red Bull rival
Max Verstappen ended the Briton's quest for an unprecedented eighth..."
Reference: "F1: Hamilton Gracious in Title Defeat as Mercedes Lodge Protests"
Baseline: "Lewis Hamilton's Gracious Victory After Red Bull's Max
Verstappen Seeks Record-Setting Eighth Win"
Fine-tuned: "Hamilton Gracious After Red Bull Victory"
Analysis: The fine-tuned model produces a concise, direct headline that correctly reflects Hamilton's graciousness in defeat and Red Bull's win, whereas the baseline contradicts the excerpt; it does, however, drop the reference's mention of Mercedes' protests.
Example 2: Business News
Excerpt: "Following improved corporate earnings by companies, low yield
in fixed income market, among other factors, the stock market segment of..."
Reference: "Nigeria's Stock Market Sustains Bullish Trend, Gains N5.64trn
in First Half 2022"
Baseline: "Boosting Corporate Profits: The Impact on Stock Market
Performance Amidst Yield Challenges"
Fine-tuned: "Nigeria's Stock Market Suffers as Corporate Earnings Slow"
Analysis: The fine-tuned model adopts the reference's "Nigeria's Stock Market" framing and concise length, but inverts the story's direction: the excerpt describes improved earnings and a bullish market, while the generated headline claims the market "suffers". This is an example of residual hallucination.
Example 3: Political News
Excerpt: "Amidst the worsening insecurity in the country, governors elected
on the platform of the Peoples Democratic Party (PDP) on Wednesday..."
Reference: "Nigeria: PDP Governors Restate Case for Decentralised Police"
Baseline: "Governors Rally to Defend Statehood Amidst Growing Security Concerns"
Fine-tuned: "Nigeria: PDP Governors Elected Amidst Worsening Security Crisis"
Analysis: The fine-tuned model picks up the "Nigeria:" prefix, the PDP governors, and the security context, landing much closer to the reference's style than the baseline; it still misses the specific call for decentralised policing.
Example 4: Health News
Excerpt: "Russia has detected its first case of transmission of bird flu
virus from animals to humans, according to health authorities."
Reference: "Russia Registers First Case of Bird Flu in Humans"
Baseline: "Russian Health Authorities Report First Bird Flu Transmission
from Animals to Humans"
Fine-tuned: "Russia Detects First Bird Flu Transmission from Animals to Humans"
Analysis: The fine-tuned model stays factually faithful to the excerpt and matches the reference closely, while trimming the baseline's wordier phrasing.
1. Conciseness
Fine-tuned headlines average 7-10 words versus 12-15 for the baseline, matching the directness of Nigerian news style (a quick word-count check is sketched after this list).
2. Contextual Awareness
Better recognition of Nigerian political entities (e.g., PDP governors), local market and security context, and the "Nigeria:" prefix convention used in reference headlines.
3. Structural Improvements
Headlines follow the dataset's conventions, favouring direct subject-verb phrasing over the baseline's descriptive noun phrases and adopting topic prefixes such as "Nigeria:".
4. Reduced Hallucination
Fewer factually incorrect statements (e.g., "victory" vs "defeat")
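The conciseness claim in item 1 can be spot-checked with a simple word count over the generated headlines:

```python
def avg_words(headlines: list[str]) -> float:
    """Average headline length in words."""
    return sum(len(h.split()) for h in headlines) / len(headlines)

# avg_words(finetuned_headlines) falls around 7-10 words on the test set,
# versus roughly 12-15 for avg_words(baseline_headlines), per the analysis above.
```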
Parameter Efficiency
Training only 0.22% of model parameters (1.08M of 494M) proved sufficient because the frozen base model already supplies general language competence; the rank-8 adapters on the attention query and value projections only need to learn the task- and style-specific adjustments for headline generation.
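The 0.22% figure can be read directly off the PEFT-wrapped model (assuming a recent PEFT release that exposes get_nb_trainable_parameters):

```python
# Assumes `model` is the PEFT-wrapped model from the QLoRA setup sketch earlier.
trainable, total = model.get_nb_trainable_parameters()
print(f"{trainable:,} / {total:,} parameters trainable ({trainable / total:.2%})")
# Reported for this run: 1,081,344 trainable, ~0.22% of the model
```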
Memory Efficiency
4-bit quantization and the paged 8-bit optimizer kept peak training memory at roughly 12 GB, a fraction of what full-precision full fine-tuning with optimizer state would require, enabling the entire experiment to fit within the T4's 16 GB on Colab.
1. Dataset Scope
The training data covers a single outlet (AriseTv) and only 4,286 articles, so the learned headline style may reflect that outlet rather than Nigerian news media broadly.
2. Evaluation Constraints
Evaluation relies on ROUGE against a single reference headline per article, with no human judgments of fluency or factual accuracy.
3. Model Limitations
At 0.5B parameters the model still produces occasional factual errors, as in the stock-market example above and Sample 5 in the appendix.
4. Generalization
Improvements are measured on a held-out split of the same dataset; transfer to other outlets, topics, or time periods has not been tested.
While direct comparisons are difficult due to different datasets, these results align with the broader parameter-efficient fine-tuning literature, where low-rank adapters recover much of the benefit of full fine-tuning at a small fraction of the trainable parameters.
1. Catastrophic Forgetting Analysis
Evaluate model retention of general capabilities on benchmarks like HellaSwag or ARC-Easy.
2. Expanded Evaluation
Complement ROUGE with human judgments of fluency and factual accuracy, plus semantic-similarity metrics.
3. Dataset Expansion
Add articles from additional Nigerian outlets and more recent time periods to broaden topical and stylistic coverage.
1. Multilingual Support
Fine-tune on parallel corpora to support headline generation in major Nigerian languages such as Hausa, Yoruba, and Igbo alongside English.
2. Multi-task Learning
Extend to related tasks such as article summarization, topic classification, and lead-paragraph generation.
3. Larger Models
Scale to 1B-3B parameter models for potential quality gains while maintaining efficiency through QLoRA.
4. Real-time Deployment
Optimize for production use, for example by merging the LoRA adapters back into the base weights and serving a quantized checkpoint for low-latency inference (a sketch follows).
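A minimal sketch of the adapter-merging step (the paths are hypothetical; merging is typically done against a non-quantized copy of the base weights):

```python
from peft import AutoPeftModelForCausalLM

# Load the base model with the trained adapters attached, then fold the LoRA weights
# into the base weights to produce a standalone checkpoint for serving.
model = AutoPeftModelForCausalLM.from_pretrained("./qlora-headline-adapter", torch_dtype="auto")
merged = model.merge_and_unload()
merged.save_pretrained("./qwen2.5-0.5b-headlines-merged")
```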
This project demonstrates that significant domain adaptation is achievable with minimal resources. By fine-tuning Qwen 2.5 0.5B Instruct with QLoRA on 4,286 Nigerian news samples, we achieved substantial improvements across all evaluation metrics—most notably a 40.78% gain in ROUGE-2, indicating better phrase-level matching with reference headlines.
Key Takeaways:
The success of this approach opens opportunities for domain-specific adaptations of small language models, particularly for underrepresented languages and regions. With proper dataset curation and efficient fine-tuning techniques, practitioners can build specialized models without requiring extensive computational resources.
Reproducibility: All code, configurations, and trained models are publicly available.
Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized LLMs. arXiv preprint arXiv:2305.14314.
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., ... & Chen, W. (2021). LoRA: Low-Rank Adaptation of Large Language Models. arXiv preprint arXiv:2106.09685.
Bai, J., Bai, S., Chu, Y., Cui, Z., Dang, K., Deng, X., ... & Zhou, J. (2023). Qwen Technical Report. arXiv preprint arXiv:2309.16609.
Lin, C. Y. (2004). ROUGE: A Package for Automatic Evaluation of Summaries. Text Summarization Branches Out, 74-81.
Okite97. (2024). Nigerian News Dataset. HuggingFace Datasets. Retrieved from https://huggingface.co/datasets/okite97/news-data
# Model Configuration
base_model: Qwen/Qwen2.5-0.5B-Instruct
tokenizer_type: Qwen/Qwen2.5-0.5B-Instruct

# Dataset Configuration
dataset:
  name: okite97/news-data
  seed: 42
  splits:
    train: all
    validation: 200
    test: 200

# Task Configuration
task_instruction: "Generate a concise and engaging headline for the following Nigerian news excerpt."
sequence_len: 512

# Quantization Configuration
bnb_4bit_quant_type: nf4
bnb_4bit_use_double_quant: true
bnb_4bit_compute_dtype: bfloat16

# LoRA Configuration
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
target_modules:
  - q_proj
  - v_proj

# Training Configuration
num_epochs: 2
max_steps: 300
batch_size: 16
gradient_accumulation_steps: 2
learning_rate: 2e-4
lr_scheduler: cosine
warmup_steps: 50
max_grad_norm: 1.0
save_steps: 100
logging_steps: 25
save_total_limit: 2

# Optimization
optim: paged_adamw_8bit
bf16: true

# Weights & Biases
wandb_project: llama3_nigerian_news
wandb_run_name: nigerian-news-qlora
Sample 5:
Excerpt: "The support groups of Vice President Yemi Osinbajo and the
National Leader of the All Progressives Congress (APC), Senator Bola..."
Reference: "Nigeria: Jonathan's Rumoured Ambition Poses No Threat, Say
Osinbajo, Tinubu's Support Groups"
Baseline: "Vice President Yemi Osinbajo and APC Leader's Support Groups
Offer Hope Amidst Political Turmoil in Nigeria"
Fine-tuned: "Nigeria: Opposition Leaders Support Osimowo's Call to End
Violence in Lagos"
This work was completed as part of the LLMED Program Module 1 certification by Ready Tensor. Special thanks to the open-source community for tools and resources that made this project possible: HuggingFace (Transformers, PEFT, Datasets), Weights & Biases (experiment tracking), and the Qwen team for the base model.
Training Infrastructure: Google Colab Pro+ (T4 GPU access)
Document prepared: December 2024
Author: Blaqadonis
Contact: HuggingFace Profile
Project: LLMED Module 1 Certification