Project Documentation: Text Summarization System
1.2 Scope
The project aims to implement a text summarization tool that allows users to input lengthy articles and receive concise summaries. The system is designed for educational and research purposes and is applicable in various domains where information distillation is required.
Model Component: The BART model (facebook/bart-base) fine-tuned on a subset of the CNN/Daily Mail dataset.
Web Application Component: A user interface developed using Django, HTML, and CSS that facilitates user interaction with the summarization model.
2.2 Key Features
Text Summarization: Generates concise summaries from user-provided text.
Web Integration: Provides a user-friendly interface for input and output.
Efficient Processing: Trains on a reduced subset (30%) of the dataset to shorten training time and lower compute requirements.
Dataset
3.1 Dataset Description
Source: CNN/Daily Mail dataset
Link: CNN/Daily Mail Dataset
Size: The model is trained on 30% of the original dataset for efficiency.
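The 30% reduction can be illustrated with a short sketch. This is not the project's own preprocessing code: the helper name sample_fraction, the fixed seed, and the use of the standard library instead of pandas are assumptions made for illustration. The column names article and highlights are those used by the CNN/Daily Mail CSV files.

```python
import random


def sample_fraction(rows, fraction=0.3, seed=42):
    """Return a reproducible random subset containing `fraction` of `rows`."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    rows = list(rows)
    subset_size = round(len(rows) * fraction)
    # A dedicated Random instance keeps the draw reproducible without
    # touching the global random state.
    return random.Random(seed).sample(rows, subset_size)


# Toy stand-in for the dataset: each row pairs an article with its summary.
toy_rows = [{"article": f"article {i}", "highlights": f"summary {i}"} for i in range(100)]
toy_subset = sample_fraction(toy_rows)  # keeps 30 of the 100 rows
```

Fixing the seed means the same rows are selected on every run, which keeps training and evaluation results comparable between runs.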
Installation
4.1 System Requirements
Python 3.8 or later (the pinned torch 2.2.2 release does not install on Python 3.7)
Pip for package management
4.2 Required Packages
Install the necessary packages using the following command. Django is listed as well because the web application depends on it:
pip install transformers torch==2.2.2 pandas matplotlib numpy seaborn tensorflow django
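After installation, it can help to confirm which versions pip actually resolved, in particular the pinned torch release. The following is a minimal sketch using only the standard library (importlib.metadata, available from Python 3.8); the helper name installed_versions is illustrative and not part of the project.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Package names taken from the pip command above.
    required = ["transformers", "torch", "pandas", "matplotlib",
                "numpy", "seaborn", "tensorflow"]
    for name, ver in installed_versions(required).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED should be installed before running the training script or the web application.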
4.3 Project Structure
The project directory contains the following files and folders:
datamodel_train_test.py: Code for data processing, model training, and testing.
text_summarization/: Folder containing Django application files.
models/: Directory with the saved trained BART model.
4.4 Model and Tokenizer
The fine-tuned BART model is too large to host on GitHub, so the saved model is provided through Google Drive:
Saved Model Link: Google Drive
4.5 Post-Training Setup
After training the model, or after downloading the saved model from the link above, place it in the following directory:
text_summarization/text_summarization/models/saved_model
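Once the model is in place, it can be loaded from that directory and used to generate a summary. The sketch below is an illustration rather than the project's actual code. It assumes the tokenizer was saved in the same directory as the model with save_pretrained, and the function names and generation settings (beam search with 4 beams, a 128-token summary limit) are hypothetical choices.

```python
from pathlib import Path

# Directory named in the setup step above.
MODEL_DIR = Path("text_summarization/text_summarization/models/saved_model")


def load_summarizer(model_dir=MODEL_DIR):
    """Load the tokenizer and fine-tuned model from a local directory."""
    # Imported here so this module can be imported without loading the
    # heavy transformers library until a model is actually needed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # Assumption: tokenizer files live alongside the model weights.
    tokenizer = AutoTokenizer.from_pretrained(str(model_dir))
    model = AutoModelForSeq2SeqLM.from_pretrained(str(model_dir))
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def summarize(text, tokenizer, model, max_input_tokens=1024, max_summary_tokens=128):
    """Generate a summary of `text` with beam search."""
    # BART accepts at most 1024 input tokens, so longer articles are truncated.
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_tokens,
    )
    summary_ids = model.generate(
        **inputs,
        num_beams=4,
        max_length=max_summary_tokens,
        early_stopping=True,
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```

Typical use is to call load_summarizer() once at startup and then pass the returned tokenizer and model to summarize() for each article, since reloading the model on every request would be slow.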
To process the data and then train and test the model yourself, run the training script:
python datamodel_train_test.py
6.2 Running the Django Application
Start the Django development server to access the summarization features locally. Follow these steps:
Navigate to the Django application directory.
Run the server with the command:
python manage.py runserver
Access the application in your web browser at http://127.0.0.1:8000.
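When text is submitted through the page, a Django view passes it to the model and renders the result. The sketch below shows one way such a view could be structured; it is not the project's actual view. The form field name text, the template name index.html, and the helpers build_context and make_view are all hypothetical.

```python
def build_context(method, form_data, summarize_fn):
    """Build the template context for a summarization request."""
    context = {"text": "", "summary": "", "error": ""}
    if method != "POST":
        return context  # plain page load: show an empty form
    text = form_data.get("text", "").strip()
    context["text"] = text
    if not text:
        context["error"] = "Please enter some text to summarize."
    else:
        context["summary"] = summarize_fn(text)
    return context


def make_view(summarize_fn, template_name="index.html"):
    """Create a Django view that summarizes text submitted via POST."""
    def summarize_view(request):
        # Imported lazily so build_context can be exercised without
        # a configured Django project.
        from django.shortcuts import render

        context = build_context(request.method, request.POST, summarize_fn)
        return render(request, template_name, context)

    return summarize_view
```

Keeping the request handling in build_context separates it from Django's rendering, so the logic can be checked with a stand-in summarization function before the real model is wired in.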
Conclusion
The Text Summarization System condenses long articles into short summaries using a fine-tuned BART model served through a Django web interface. This documentation guides users and developers through installing, training, and running the system.
Contact Information
GitHub Repository: View on GitHub
LinkedIn Profile: Muhammad Hasnain Kayani
Email: muhammadhasnainkayani@gmail.com
References
BART Model Documentation: Hugging Face Transformers
CNN/Daily Mail Dataset: Kaggle Dataset