A Named Entity Recognition (NER) model fine-tuned on custom-annotated resume/career data using the RoBERTa architecture. It extracts structured information such as personal details, education, work experience, and skills from unstructured text, making it well suited to resume parsing, HR automation, and document-understanding tasks.
Model architecture: RoBERTa base (roberta-base)
Task: Token Classification (NER)
Fine-tuned on: Annotated resume dataset (custom labels)
Entity types:
Personal details: NAME, CONTACT, EMAIL, LOCATION
Links: LINKEDIN, GITHUB
Work experience: ORG_NAME, JOB_TITLE, START_DATE, END_DATE
Education: DEGREE, FIELD_OF_STUDY, GRADUATION_YEAR, GPA
Skills and other: SKILLS, PROJECT_TITLE, LANGUAGES, OTHER

Model files: config.json, pytorch_model.bin or model.safetensors, tokenizer_config.json, vocab.json, tokenizer.json, special_tokens_map.json, merges.txt

Usage example:

```python
from transformers import RobertaTokenizerFast, RobertaForTokenClassification
import torch

# Load model and tokenizer
model = RobertaForTokenClassification.from_pretrained("venkatasagar/NER-roBERTa-finetuned")
tokenizer = RobertaTokenizerFast.from_pretrained("venkatasagar/NER-roBERTa-finetuned")

# Sample text
text = "John Doe is a software engineer at Google. He graduated with a B.Tech in Computer Science from MIT in 2022."

# Tokenize and predict
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.argmax(outputs.logits, dim=-1)

# Decode results
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
predicted_labels = [model.config.id2label[label_id.item()] for label_id in predictions[0]]

for token, label in zip(tokens, predicted_labels):
    print(f"{token}: {label}")
```
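The snippet above prints one label per sub-word token. If the fine-tuned labels follow the usual B-/I- (BIO) scheme, the same checkpoint can also be loaded through the transformers pipeline API to merge tokens into whole entity spans; the following is a minimal sketch under that assumption.

```python
from transformers import pipeline

# Token-classification pipeline from the same checkpoint; aggregation_strategy="simple"
# groups consecutive B-/I- tokens into single entity spans (assumes BIO-style labels).
ner = pipeline(
    "token-classification",
    model="venkatasagar/NER-roBERTa-finetuned",
    aggregation_strategy="simple",
)

text = "John Doe is a software engineer at Google. He graduated with a B.Tech in Computer Science from MIT in 2022."

for entity in ner(text):
    # Each result carries the grouped label, surface text, confidence score, and character offsets.
    print(f"{entity['entity_group']:>15}  {entity['word']!r}  ({entity['score']:.2f})")
```

If the labels are plain tags without B-/I- prefixes, drop aggregation_strategy and post-process the per-token output instead.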
Tags: NER, transformers, huggingface, token-classification, roberta, resume-parser, nlp, named-entity-recognition, custom-dataset, career-data, information-extraction
This model was trained on a custom-labeled resume dataset containing sections such as education, experience, projects, and skills. The dataset included .txt, .pdf, and .docx files, processed using spaCy and the PyMuPDF/Docx libraries.
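The preprocessing code itself is not published here; the sketch below only illustrates how plain text could be pulled out of .txt, .pdf, and .docx resumes with PyMuPDF (fitz) and python-docx before annotation. The file name sample_resume.pdf is a placeholder.

```python
from pathlib import Path

import fitz  # PyMuPDF
from docx import Document  # python-docx


def extract_text(path: str) -> str:
    """Return plain text from a .txt, .pdf, or .docx resume file."""
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8", errors="ignore")
    if suffix == ".pdf":
        # PyMuPDF: concatenate the text of every page.
        with fitz.open(path) as pdf:
            return "\n".join(page.get_text() for page in pdf)
    if suffix == ".docx":
        # python-docx: join paragraph texts.
        return "\n".join(p.text for p in Document(path).paragraphs)
    raise ValueError(f"Unsupported file type: {suffix}")


print(extract_text("sample_resume.pdf")[:500])  # placeholder file name
```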
If you'd like to access the dataset or contribute, please contact the maintainer.
Model Hub: https://huggingface.co/venkatasagar/NER-roBERTa-finetuned
Feel free to fork the repository and open issues or PRs to enhance the model or pipeline!
Name: Venkata Sagar
Contact: venkatasagar.maddela2004@gmail.com