Jun 18, 2025 · MIT License

VeriNews - Fake News Detector in a Regional Language (Hindi)

BERT
NLP
Streamlit

Shashank Kamble

📰 VeriNews - Hindi Fake News Detection

Detect whether a Hindi news headline is Fake or Real.

Abstract

VeriNews is a lightweight yet high-accuracy system for detecting fake news in Hindi headlines. By fine-tuning DistilBERT on a carefully curated corpus of labeled Hindi news, the model achieves reliable real-time predictions while keeping memory and compute costs low enough for commodity hardware. A Streamlit front-end makes the detector accessible to non-technical users, allowing journalists, fact-checkers, and the public to test headlines instantly and help slow the spread of misinformation.

Introduction

The rapid growth of online news, social media, and messaging apps in India has created fertile ground for misinformation. English-centric fact-checking tools struggle with Hindi, which remains the primary language for millions of readers. VeriNews addresses this gap by:

Focusing exclusively on Hindi — capturing language-specific cues often missed by multilingual models.
Prioritizing speed and deployability — so the tool can run on laptops or cloud free tiers.
Providing an intuitive UI — making ML-powered verification accessible to reporters and the wider public.

Use Case Diagram


Figure 1. Use Case Diagram — VeriNews User Interaction
This diagram illustrates how different users interact with the VeriNews system. Users can input Hindi news headlines through either the Command Line Interface (CLI) or the Streamlit-based Web Interface. The system processes the input and returns a prediction — real or fake.

Project Objectives

High-precision classification
Achieve ≥ 90% F1-score on held-out Hindi headline data while maintaining low false-positive rates.
Low-latency inference
Keep end-to-end prediction time under 300 ms on a single CPU core to enable real-time use.
Resource efficiency
Limit GPU memory footprint to ≤ 2 GB during training; CPU-only inference after deployment.
User-friendly workflow
Provide both CLI and Streamlit interfaces plus clean, reproducible code (requirements, scripts, notebooks).
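
The F1 and false-positive targets above can be checked directly from predictions on the held-out split. The sketch below is a minimal illustration, not the project's evaluation code. It assumes labels are encoded as 1 for fake and 0 for real (a hypothetical convention), and `binary_metrics` is an illustrative name.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1 and false-positive rate for one class.

    Assumption: `positive` marks the "fake" class (hypothetical encoding).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Share of real headlines that were wrongly flagged as fake.
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positive_rate": false_positive_rate,
    }


# Toy labels for illustration only; these are not project results.
metrics = binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
```

Reporting the false-positive rate next to F1 matters here because a real headline wrongly flagged as fake is the costlier mistake for a public-facing tool.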

Workflow of the Project

Figure 2. Workflow Diagram — VeriNews System Pipeline
This diagram outlines the internal processing flow of the VeriNews application. It starts with user input, applies Hindi-specific text preprocessing, passes the data to the DistilBERT-based classifier, and displays the prediction via the chosen interface.

Dataset & Context

Property	Details
Source	Aggregated from multiple Hindi news portals and fact-checking sites (see /dataset/README.md).
Size	40 k headlines (21 k real, 19 k fake)
Time span	2016 – 2023
Labelling process	Cross-verified by human annotators; fake samples validated against certified fact-check portals (AltNews, BOOM, PIB Fact-Check).
Pre-processing	De-duplication, Unicode normalization (NFKC), removal of non-Devanagari punctuation, custom Hindi stop-word list, stemming, and WordPiece tokenization (from transformers library).
Train / Val / Test split	70 % / 15 % / 15 % (stratified by label)
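
A stratified 70 / 15 / 15 split, as listed in the table, keeps the real-to-fake ratio the same in every partition. The following is a minimal standard-library sketch of that idea, not the project's actual splitting code. The function name, the `(headline, label)` row shape, and the seed are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(rows, fractions=(0.70, 0.15, 0.15), seed=42):
    """Split (headline, label) rows into train/val/test, label by label.

    Assumption: `rows` is a list of 2-tuples; the seed is arbitrary.
    """
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = round(len(group) * fractions[0])
        n_val = round(len(group) * fractions[1])
        train.extend(group[:n_train])
        val.extend(group[n_train:n_train + n_val])
        # Whatever remains goes to the test partition.
        test.extend(group[n_train + n_val:])
    return train, val, test


# Toy rows for illustration: 20 "real" and 20 "fake" placeholders.
toy_rows = ([(f"real-{i}", "real") for i in range(20)]
            + [(f"fake-{i}", "fake") for i in range(20)])
train_rows, val_rows, test_rows = stratified_split(toy_rows)
```

Applied to a 40 k corpus, these fractions yield roughly 28 k training, 6 k validation, and 6 k test headlines.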

Model Architecture — DistilBERT

VeriNews uses the distilbert/distilbert-base-multilingual-cased checkpoint of DistilBERT to classify Hindi news headlines as real or fake. DistilBERT is a lighter, faster version of BERT that retains most of its accuracy while reducing model size and inference time, which makes it a good fit for real-time applications.

Key Details:

Base checkpoint: distilbert-base-multilingual-cased, supporting Devanagari script.
Classifier head: A small feed-forward layer on top of DistilBERT for binary classification.
Training: Fine-tuned using cross-entropy loss with AdamW optimizer and early stopping.
Preprocessing: Includes Hindi-specific steps—stop-word removal, stemming, and tokenization.
Performance: Delivers fast, accurate inference on CPU alone, and training stays under 2 GB of GPU memory.

DistilBERT strikes a balance between speed and accuracy, making it well-suited for a public-facing Hindi fake news detector.
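
Early stopping, mentioned under Training above, halts fine-tuning once the validation loss stops improving. The sketch below shows one common way to implement it. The `patience` and `min_delta` values are assumptions, since the source does not state the settings used, and the loss values are made up for illustration rather than measured.

```python
class EarlyStopping:
    """Signal a stop when validation loss fails to improve for `patience` epochs.

    Assumption: patience and min_delta are illustrative defaults,
    not the values used to train VeriNews.
    """

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses, one per epoch (illustrative only).
val_losses = [0.62, 0.48, 0.45, 0.46, 0.47, 0.44]
stopper = EarlyStopping(patience=2)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss):
        stopped_at = epoch  # zero-based index of the epoch that triggered the stop
        break
```

In a real training loop, the model weights from the best epoch would typically be saved and restored when the stop is triggered.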

Language:

Python

Features

Hindi Language Support: Built on a multilingual checkpoint to handle Hindi-language news text.
Efficient Performance: DistilBERT model provides fast and accurate results with reduced computational load.
Custom Hindi Pre-processing: Includes steps like Hindi stop-word removal, stemming, and tokenization.
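
The cleaning steps named in this document (Unicode NFKC normalization, removal of non-Devanagari punctuation, stop-word removal, and de-duplication) might look roughly like the sketch below. It is an illustration under stated assumptions, not the project's code: the stop-word set is a tiny hypothetical sample rather than the custom list, all function names are invented here, and stemming and WordPiece tokenization are left out.

```python
import unicodedata

# Hypothetical sample only; the project's custom stop-word list is not shown here.
HINDI_STOP_WORDS = {"के", "का", "की", "है", "में", "से", "को", "और", "पर", "ने"}


def strip_non_devanagari_punctuation(text):
    """Replace punctuation outside the Devanagari block (U+0900 to U+097F) with spaces."""
    kept = []
    for ch in text:
        is_punct = unicodedata.category(ch).startswith("P")
        in_devanagari = "\u0900" <= ch <= "\u097F"
        kept.append(" " if is_punct and not in_devanagari else ch)
    return "".join(kept)


def clean_headline(text):
    """Normalize, strip foreign punctuation, and drop stop words."""
    text = unicodedata.normalize("NFKC", text)
    text = strip_non_devanagari_punctuation(text)
    tokens = [tok for tok in text.split() if tok not in HINDI_STOP_WORDS]
    return " ".join(tokens)


def deduplicate(headlines):
    """Keep the first headline for each distinct cleaned form."""
    seen, unique = set(), []
    for headline in headlines:
        key = clean_headline(headline)
        if key not in seen:
            seen.add(key)
            unique.append(headline)
    return unique


example = "सरकार ने नई योजना की घोषणा की!!"
cleaned = clean_headline(example)
```

De-duplicating on the cleaned form rather than the raw string catches headlines that differ only in punctuation or spacing.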

🛠️ How to Use:

Step 1: Download the Dataset

Go to the dataset
Download the dataset and store it in the dataset directory of the project.

Step 2: Install Required Dependencies

Install the necessary dependencies:
```
pip install -r requirements.txt
```

Make sure you have Python installed and a virtual environment activated to avoid dependency issues.

Step 3: Run the Model

To start the fake news detection model, run the script.py file and train the model if it has not been trained already.
To run the model in the terminal:
```
python script.py
```
To run it in a browser using Streamlit (UI):
```
streamlit run main.py
```

Implementation/Output:

Run the file:

Figure 3. UI built on Streamlit showing the Homepage.

Enter a news article from the test split:

Figure 4. Snapshot of the dataset.

Fake or Legit?:

Figure 5. User Interface showing output as Fake News
Figure 5 shows the user interface with the output classified as Fake News. Once the user submits a news article, the system analyzes it and flags it as potentially misleading or false. This output helps users identify and avoid spreading unverified or deceptive content.

Figure 6. User Interface showing output as Real News
Figure 6 shows the user interface of the Hindi Fake News Detection System displaying the output as Real News. After the user inputs a news article, the system processes and classifies it, indicating that the content is likely credible. The interface provides a clear result, helping users verify the authenticity of the article.
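
A binary classifier head produces two scores, and a Fake or Real verdict like the ones in Figures 5 and 6 can be derived from them by taking the class with the higher softmax probability. The sketch below is a plain-Python illustration of that step, not the app's code. The index order in `LABELS` is a hypothetical assumption, and the logits are made-up numbers.

```python
import math

# Assumption: index 0 means Real and index 1 means Fake (hypothetical order).
LABELS = ("Real", "Fake")


def softmax(logits):
    """Convert raw scores to probabilities, shifting by the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits):
    """Return the winning label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Made-up logits for illustration only.
label, confidence = decide([0.2, 2.3])
```

Showing the probability alongside the label would let users see how strongly the model leans toward its verdict.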

Conclusion

VeriNews addresses the critical challenge of detecting fake news in Hindi—an underserved area in the misinformation landscape. By combining the efficiency of DistilBERT with Hindi-specific preprocessing and an accessible interface, the system offers both technical rigor and practical usability. Whether used by journalists, researchers, or everyday readers, VeriNews empowers users to verify the credibility of Hindi news headlines quickly and reliably.

Future enhancements could include multilingual expansion and integration with real-time news sources for live detection.