Text Summarization integrated with Web (Django)

Project Documentation: Text Summarization System

  1. Introduction
    1.1 Purpose
    This document provides formal documentation for the Text Summarization System developed using the BART model on a Django web platform. It outlines the project's objectives, system architecture, setup instructions, usage guidelines, and contact information.

1.2 Scope
The project aims to implement a text summarization tool that allows users to input lengthy articles and receive concise summaries. The system is designed for educational and research purposes and is applicable in various domains where information distillation is required.

  2. System Overview
    2.1 System Architecture
    The Text Summarization System consists of two main components:

Model Component: The BART model (facebook/bart-base) fine-tuned on a subset of the CNN/Daily Mail dataset.
Web Application Component: A user interface developed using Django, HTML, and CSS that facilitates user interaction with the summarization model.
2.2 Key Features
Text Summarization: Generates concise summaries from user-provided text.
Web Integration: Provides a user-friendly interface for input and output.
Efficient Training: Trains on a reduced subset (30%) of the dataset to shorten training time.
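
The hand-off between the two components can be pictured as one small function: the web layer validates the submitted text and passes it to the model component. The sketch below is only an illustration under assumptions; the function name `build_summary_context`, the form field name `article`, and the context keys are hypothetical and not taken from the project.

```python
from typing import Callable, Dict, Mapping


def build_summary_context(
    form_data: Mapping[str, str],
    summarize: Callable[[str], str],
) -> Dict[str, str]:
    """Turn submitted form data into the values a results page would display.

    `summarize` stands in for the model component (text in, summary out).
    """
    # "article" is a hypothetical form field name.
    article = form_data.get("article", "").strip()
    if not article:
        # Skip the model call entirely when there is nothing to summarize.
        return {"article": "", "summary": "", "error": "Please enter some text to summarize."}
    return {"article": article, "summary": summarize(article), "error": ""}
```

In a Django view, a function like this could be called with `request.POST` and the returned dictionary passed to the template as context, which keeps the view thin and the model call easy to replace in tests.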

  3. Dataset
    3.1 Dataset Description
    Source: CNN/Daily Mail dataset
    Link: CNN/Daily Mail Dataset
    Size: The model is trained on 30% of the original dataset for efficiency.
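
One way to take a 30% subset is a seeded random sample, sketched below with the standard library only. This is an assumption about the method: the project may instead take the first 30% of rows or use pandas, and the helper name `sample_fraction` and the seed value are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_fraction(rows: Sequence[T], fraction: float = 0.30, seed: int = 42) -> List[T]:
    """Return a reproducible random subset holding `fraction` of `rows`."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = round(len(rows) * fraction)
    # A dedicated Random instance keeps the sample independent of global RNG state.
    return random.Random(seed).sample(list(rows), k)


# Toy stand-in for (article, reference summary) pairs.
pairs = [(f"article {i}", f"summary {i}") for i in range(1000)]
subset = sample_fraction(pairs)
```

Fixing the seed means the same subset is drawn on every run, so training and evaluation numbers stay comparable between experiments.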

  4. Installation
    4.1 System Requirements
    Python 3.8 or later (the pinned torch 2.2.2 release does not support Python 3.7)
    Pip for package management
    4.2 Required Packages
    Install the necessary packages using the following command:

pip install transformers torch==2.2.2 pandas matplotlib numpy seaborn tensorflow django

4.3 Project Structure
The project directory contains the following files and folders:

datamodel_train_test.py: Code for data processing, model training, and testing.
text_summarization/: Folder containing Django application files.
models/: Directory with the saved trained BART model.
4.4 Model and Tokenizer
The trained BART model and tokenizer files are too large to host on GitHub, so the saved model is shared through Google Drive:

Saved Model Link: Google Drive
4.5 Post-Training Setup
After training (or after downloading the saved model), place the model files in the following directory:

text_summarization/text_summarization/models/saved_model
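
Before starting the web application, it can help to confirm that the directory actually contains a saved model. The check below is a minimal sketch and assumes the model was written with the Hugging Face `save_pretrained` method, which produces a `config.json` plus a single weights file; the helper name `missing_model_files` is hypothetical.

```python
from pathlib import Path
from typing import List

# Path from section 4.5, relative to the repository root.
MODEL_DIR = Path("text_summarization/text_summarization/models/saved_model")

# Weight file names written by save_pretrained (assumption: unsharded weights).
WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin", "tf_model.h5")


def missing_model_files(model_dir: Path) -> List[str]:
    """List the expected files that are absent from `model_dir`."""
    missing = []
    if not (model_dir / "config.json").is_file():
        missing.append("config.json")
    # Any one of the known weight formats is enough.
    if not any((model_dir / name).is_file() for name in WEIGHT_FILES):
        missing.append(" or ".join(WEIGHT_FILES))
    return missing
```

Calling `missing_model_files(MODEL_DIR)` from the repository root returns an empty list when the expected files are present.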

  5. Training Procedure
    5.1 Model Details
    Model: BART (facebook/bart-base)
    Epochs: 5
    Optimizer: AdamW with a learning rate of 5e-5
    Loss Monitoring: Tracks training and validation losses for evaluation.
    5.2 Training Instructions
    Run the following script to train the model:

python datamodel_train_test.py
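
The settings in 5.1 can be sketched as follows. This is not the contents of datamodel_train_test.py; the names `LossHistory`, `build_optimizer`, and `fit` are hypothetical, and the per-epoch training and validation passes are left as callables so that only the loss bookkeeping is shown.

```python
from dataclasses import dataclass, field
from typing import Callable, List

EPOCHS = 5            # from section 5.1
LEARNING_RATE = 5e-5  # from section 5.1


@dataclass
class LossHistory:
    """Per-epoch training and validation losses."""
    train: List[float] = field(default_factory=list)
    val: List[float] = field(default_factory=list)

    def best_epoch(self) -> int:
        """1-based index of the epoch with the lowest validation loss."""
        return min(range(len(self.val)), key=self.val.__getitem__) + 1


def build_optimizer(model):
    """Create AdamW with the documented learning rate."""
    # Imported lazily so the loss bookkeeping works without PyTorch installed.
    from torch.optim import AdamW
    return AdamW(model.parameters(), lr=LEARNING_RATE)


def fit(
    train_epoch: Callable[[], float],
    validate: Callable[[], float],
    epochs: int = EPOCHS,
) -> LossHistory:
    """Run `epochs` rounds, recording the mean loss each callable returns."""
    history = LossHistory()
    for epoch in range(1, epochs + 1):
        history.train.append(train_epoch())
        history.val.append(validate())
        print(f"epoch {epoch}: train={history.train[-1]:.4f} val={history.val[-1]:.4f}")
    return history
```

A validation loss that starts rising while the training loss keeps falling is a common sign of overfitting, which is why both curves are recorded side by side.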

  6. Usage
    6.1 Testing the Model
    To generate summaries for new articles, use the interactive function in datamodel_train_test.py.
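
For reference, generating a summary from a saved BART model with the Hugging Face Transformers API typically looks like the sketch below. It is not the interactive function from datamodel_train_test.py: the function names and the generation settings (beam count, length limits) are assumptions chosen for illustration.

```python
def load_summarizer(model_dir: str):
    """Load a saved BART model and tokenizer from `model_dir`."""
    # Imported lazily because transformers is slow to import.
    from transformers import BartForConditionalGeneration, BartTokenizer

    tokenizer = BartTokenizer.from_pretrained(model_dir)
    model = BartForConditionalGeneration.from_pretrained(model_dir)
    model.eval()  # disable dropout for inference
    return model, tokenizer


def summarize(
    text: str,
    model,
    tokenizer,
    max_input_tokens: int = 1024,   # BART's maximum input length
    max_summary_tokens: int = 128,  # assumed summary length limit
    num_beams: int = 4,             # assumed beam search width
) -> str:
    """Return a summary of `text` produced by `model`."""
    if not text.strip():
        raise ValueError("text must not be empty")
    # Truncate long articles so they fit the model's input window.
    inputs = tokenizer(
        text, max_length=max_input_tokens, truncation=True, return_tensors="pt"
    )
    summary_ids = model.generate(
        **inputs,
        num_beams=num_beams,
        max_length=max_summary_tokens,
        early_stopping=True,
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```

Typical use would be `model, tokenizer = load_summarizer("text_summarization/text_summarization/models/saved_model")` followed by `summarize(article_text, model, tokenizer)`.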

6.2 Running the Django Application
Start the Django development server to access the summarization features. Follow these steps:

Navigate to the Django application directory (text_summarization/).
Run the server with the command:

python manage.py runserver

Access the application in your web browser at http://127.0.0.1:8000.

  7. Conclusion
    The Text Summarization System condenses long articles into short summaries using a fine-tuned BART model served through a Django web interface. This documentation covers its architecture, setup, training, and usage for both users and developers.

  8. Contact Information
    GitHub Repository: View on GitHub
    LinkedIn Profile: Muhammad Hasnain Kayani
    Email: muhammadhasnainkayani@gmail.com

  9. References
    BART Model Documentation: Hugging Face Transformers
    CNN/Daily Mail Dataset: Kaggle Dataset
