A Named Entity Recognition (NER) model fine-tuned on custom-annotated resume/career data using the RoBERTa architecture. It extracts structured information such as personal details, education, work experience, and skills from unstructured text, making it well suited to resume parsing, HR automation, and document-understanding tasks.
Model architecture: RoBERTa base (roberta-base)
Task: Token Classification (NER)
Fine-tuned on: Annotated resume dataset (custom labels)
Entity types:
Personal details: NAME, CONTACT, EMAIL, LOCATION
Links: LINKEDIN, GITHUB
Work experience: ORG_NAME, JOB_TITLE, START_DATE, END_DATE
Education: DEGREE, FIELD_OF_STUDY, GRADUATION_YEAR, GPA
Skills and other: SKILLS, PROJECT_TITLE, LANGUAGES, OTHER

Model files: config.json, pytorch_model.bin or model.safetensors, tokenizer_config.json, vocab.json, tokenizer.json, special_tokens_map.json, merges.txt

Usage example:

```python
from transformers import RobertaTokenizerFast, RobertaForTokenClassification
import torch

# Load model and tokenizer
model = RobertaForTokenClassification.from_pretrained("venkatasagar/NER-roBERTa-finetuned")
tokenizer = RobertaTokenizerFast.from_pretrained("venkatasagar/NER-roBERTa-finetuned")

# Sample text
text = "John Doe is a software engineer at Google. He graduated with a B.Tech in Computer Science from MIT in 2022."

# Tokenize and predict
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.argmax(outputs.logits, dim=-1)

# Decode results
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
predicted_labels = [model.config.id2label[label_id.item()] for label_id in predictions[0]]

for token, label in zip(tokens, predicted_labels):
    print(f"{token}: {label}")
```
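The snippet above prints one label per sub-word token. If the fine-tuned labels follow the usual B-/I- (BIO) scheme, the same checkpoint can also be loaded through the transformers pipeline API to merge tokens into whole entity spans; the following is a minimal sketch under that assumption.

```python
from transformers import pipeline

# Token-classification pipeline from the same checkpoint; aggregation_strategy="simple"
# groups consecutive B-/I- tokens into single entity spans (assumes BIO-style labels).
ner = pipeline(
    "token-classification",
    model="venkatasagar/NER-roBERTa-finetuned",
    aggregation_strategy="simple",
)

text = "John Doe is a software engineer at Google. He graduated with a B.Tech in Computer Science from MIT in 2022."

for entity in ner(text):
    # Each result carries the grouped label, surface text, confidence score, and character offsets.
    print(f"{entity['entity_group']:>15}  {entity['word']!r}  ({entity['score']:.2f})")
```

If the labels are plain tags without B-/I- prefixes, drop aggregation_strategy and post-process the per-token output instead.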
Tags: NER, transformers, huggingface, token-classification, roberta, resume-parser, nlp, named-entity-recognition, custom-dataset, career-data, information-extraction
This model was trained on a custom-labeled resume dataset containing sections such as education, experience, projects, and skills. The dataset included .txt, .pdf, and .docx files, processed using spaCy and the PyMuPDF/Docx libraries.
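The preprocessing code itself is not published here; the sketch below only illustrates how plain text could be pulled out of .txt, .pdf, and .docx resumes with PyMuPDF (fitz) and python-docx before annotation. The file name sample_resume.pdf is a placeholder.

```python
from pathlib import Path

import fitz  # PyMuPDF
from docx import Document  # python-docx


def extract_text(path: str) -> str:
    """Return plain text from a .txt, .pdf, or .docx resume file."""
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8", errors="ignore")
    if suffix == ".pdf":
        # PyMuPDF: concatenate the text of every page.
        with fitz.open(path) as pdf:
            return "\n".join(page.get_text() for page in pdf)
    if suffix == ".docx":
        # python-docx: join paragraph texts.
        return "\n".join(p.text for p in Document(path).paragraphs)
    raise ValueError(f"Unsupported file type: {suffix}")


print(extract_text("sample_resume.pdf")[:500])  # placeholder file name
```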
If you'd like to access the dataset or contribute, please contact the maintainer.
Model Hub: https://huggingface.co/venkatasagar/NER-roBERTa-finetuned
Feel free to fork the repository and open issues or PRs to enhance the model or pipeline!
Name: Venkata Sagar
Contact: venkatasagar.maddela2004@gmail.com