This project builds Named Entity Recognition (NER) models with BERT, both with and without a Conditional Random Field (CRF) layer, and with RoBERTa. The goal is to identify and classify entities in text, such as names, dates, and other predefined categories. BERT performs token classification, and the optional CRF layer models dependencies between adjacent labels for sequence labeling.
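To illustrate what a CRF layer adds on top of per-token scores, here is a minimal sketch of Viterbi decoding in plain Python. It is not the project's implementation: the tag set, the emission scores (standing in for BERT's per-token outputs), and the transition scores are all made-up values chosen for illustration.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the tag sequence with the highest emission + transition score."""
    if not emissions:
        return []
    # best[t][tag] is the score of the best path that ends in `tag` at position t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    backpointers = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            # Pick the previous tag that maximises path score plus transition score.
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = best[-1][prev] + transitions.get((prev, cur), 0.0) + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical scores: the transition O -> I-PER is heavily penalised.
TAGS = ["O", "B-PER", "I-PER"]
TRANSITIONS = {("O", "I-PER"): -10.0}
EMISSIONS = [
    {"O": 1.0, "B-PER": 0.8, "I-PER": 0.0},
    {"O": 0.2, "B-PER": 0.0, "I-PER": 1.0},
]
print(viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))
```

Picking the best tag for each token independently would give the invalid sequence O, I-PER here. Decoding with the transition penalty yields B-PER, I-PER instead, which is the kind of label consistency a CRF layer is meant to enforce.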
The dataset used in this project is a labeled NER dataset in CSV format, containing words, part-of-speech (POS) tags, and NER tags. The dataset is split into training and testing sets for model training and evaluation.
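As a rough sketch of how such a CSV could be grouped into sentences and split, the example below uses only the standard library. The column names (`sentence_id`, `word`, `pos`, `tag`), the sample rows, and the 80/20 split are assumptions for illustration; the actual dataset's schema and split ratio are not stated here and may differ.

```python
import csv
import io
import random
from collections import defaultdict

# Hypothetical sample rows; the real file's columns and labels may differ.
SAMPLE_CSV = """sentence_id,word,pos,tag
1,John,NNP,B-PER
1,lives,VBZ,O
1,in,IN,O
1,London,NNP,B-LOC
2,He,PRP,O
2,arrived,VBD,O
2,Monday,NNP,B-DATE
"""


def read_sentences(handle):
    """Group token rows into sentences, returned as (words, tags) pairs."""
    grouped = defaultdict(lambda: ([], []))
    for row in csv.DictReader(handle):
        words, tags = grouped[row["sentence_id"]]
        words.append(row["word"])
        tags.append(row["tag"])
    return list(grouped.values())


def train_test_split(sentences, test_fraction=0.2, seed=42):
    """Shuffle whole sentences and split them so no sentence spans both sets."""
    shuffled = sentences[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


sentences = read_sentences(io.StringIO(SAMPLE_CSV))
train, test = train_test_split(sentences)
print(len(train), "training sentences,", len(test), "test sentences")
```

Splitting at the sentence level rather than the row level keeps each sentence's tokens together, which matters because the model labels whole sequences.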
The project is structured as follows:
notebooks: Contains Jupyter notebooks for exploratory data analysis and model training.
requirements: Dependencies needed to run this project.

To run this project, ensure you have Python 3.7 or higher installed. You can install the required packages using the following command:
pip install -r requirements.txt
The models' performance is evaluated using accuracy and F1 score. The results demonstrate the effectiveness of combining BERT with CRF for NER tasks.

This project highlights the capabilities of advanced models like BERT and CRF for named entity recognition. The models can be improved further with additional fine-tuning and by experimenting with different hyperparameters.
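As a rough illustration of the accuracy and F1 metrics mentioned above, the sketch below computes token-level accuracy and entity-level F1 for a single BIO-tagged sequence. Whether the project computes F1 per token or per entity is not stated, so the entity-level choice, the BIO tag scheme, and the example tags are assumptions.

```python
def extract_entities(tags):
    """Return the set of (start, end, type) spans in a BIO tag sequence.

    Stray I- tags that do not continue a span of the same type are ignored.
    """
    entities = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            entities.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return entities


def token_accuracy(gold, pred):
    """Fraction of positions where the predicted tag equals the gold tag."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold) if gold else 0.0


def entity_f1(gold, pred):
    """Harmonic mean of precision and recall over exactly matching spans."""
    gold_spans, pred_spans = extract_entities(gold), extract_entities(pred)
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the prediction truncates the two-token PER entity.
gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "O", "O", "B-LOC"]
print(token_accuracy(gold, pred), entity_f1(gold, pred))
```

In this example three of four tokens are correct, yet only one of the two gold entities is matched exactly. That gap is why F1 is usually reported alongside accuracy for NER, where most tokens carry the O tag.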
Thanks to Hugging Face for providing the transformer models and tokenizers.
https://github.com/muhammadahmedzaheer/Named-Entity-Recognition-using-BERT-and-RoBERTa/tree/main