In today’s fast-paced hiring landscape, HR teams often have to sort through thousands of resumes by hand, a process that is time-consuming and inefficient. This project automates resume classification by assigning each resume to one of four categories, reducing manual effort and improving accuracy.
Below is the count of different resume types in the dataset.
The classification model employs the TF-IDF (Term Frequency-Inverse Document Frequency) technique to transform the text data into a numerical format that can be used by machine learning algorithms.
Term Frequency (TF) measures how often a term appears in a document, normalized by the total number of terms in that document to avoid bias toward longer documents.
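Assuming the common normalized definition (the source describes it but gives no formula), where f_{t,d} is the raw count of term t in document d, term frequency can be written as:

```latex
\mathrm{tf}(t, d) = \frac{f_{t,d}}{\sum_{t' \in d} f_{t',d}}
```

Other variants, such as raw counts or logarithmic scaling of the count, are also in common use.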
Inverse Document Frequency (IDF) measures how informative a term is across the corpus, based on how many documents contain it: it is low for terms that appear in many documents and high for terms that appear in only a few.
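Under the textbook definition (an assumption, since the source gives no formula), with N the number of documents in the corpus D and df(t) the number of documents that contain term t:

```latex
\mathrm{idf}(t, D) = \log \frac{N}{\mathrm{df}(t)}, \qquad
\mathrm{df}(t) = \lvert \{ d \in D : t \in d \} \rvert
```

A term that occurs in every document gets an IDF of log 1 = 0. Libraries often smooth this formula; for example, scikit-learn's TfidfVectorizer uses ln((1 + N) / (1 + df(t))) + 1 by default. Which variant this project uses is not stated.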
The TF-IDF score is the product of TF and IDF, so a term scores highly when it appears often in a given document but rarely across the corpus.
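The TF, IDF, and TF-IDF definitions can be sketched in plain Python to show how each resume becomes a numeric vector. This is a minimal illustration, not the project's code: it assumes the standard score tf(t, d) * idf(t) with an unsmoothed log(N / df(t)), a naive lowercase whitespace tokenizer, and hypothetical function names (`tokenize`, `tf`, `idf`, `tfidf`).

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    # Real pipelines usually also strip punctuation and stop words.
    return text.lower().split()


def tf(tokens: list[str]) -> dict[str, float]:
    # Count of each term divided by the total number of terms in the document.
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


def idf(corpus: list[list[str]]) -> dict[str, float]:
    # log(N / df(t)), where df(t) is the number of documents containing t.
    n_docs = len(corpus)
    doc_freq = Counter(term for tokens in corpus for term in set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    # One sparse vector (term -> score) per document: tf(t, d) * idf(t).
    corpus = [tokenize(doc) for doc in docs]
    idf_scores = idf(corpus)
    return [
        {term: weight * idf_scores[term] for term, weight in tf(tokens).items()}
        for tokens in corpus
    ]


if __name__ == "__main__":
    # Hypothetical toy resumes, not taken from the project's dataset.
    resumes = [
        "python machine learning engineer experience",
        "java backend engineer experience",
        "python data analyst experience",
    ]
    for vector in tfidf(resumes):
        # Print each document's terms from highest to lowest score.
        print(sorted(vector.items(), key=lambda kv: -kv[1]))
```

In this toy corpus, "experience" appears in every resume and scores 0, while "machine" (one resume) outscores "engineer" (two resumes) within the first resume. Dictionaries keep the vectors sparse; a library such as scikit-learn's TfidfVectorizer also smooths IDF and L2-normalizes each row by default, so its numbers will differ from this sketch.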
The results section reports the model's performance, including its accuracy on both the training and test sets.
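The source does not define its accuracy metric. The sketch below assumes the standard definition: the fraction of resumes whose predicted category matches the true one. The category names and label lists are hypothetical placeholders (the four categories are not named here) and are not the project's results.

```python
def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    # Fraction of positions where the predicted label equals the true label.
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


if __name__ == "__main__":
    # Hypothetical labels for illustration only.
    train_true = ["cat_a", "cat_b", "cat_c", "cat_d"]
    train_pred = ["cat_a", "cat_b", "cat_c", "cat_d"]
    test_true = ["cat_a", "cat_b", "cat_c", "cat_d"]
    test_pred = ["cat_a", "cat_b", "cat_a", "cat_d"]
    print(f"train accuracy: {accuracy(train_true, train_pred):.2f}")  # 1.00
    print(f"test accuracy:  {accuracy(test_true, test_pred):.2f}")    # 0.75
```

Reporting both numbers is useful because a large gap between training and test accuracy usually signals overfitting.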
This project demonstrates that machine learning can automate resume classification, significantly reducing manual effort while maintaining high accuracy.