This project is a chatbot application built using Flask and OpenAI's GPT model. The chatbot can analyze CVs (PDF, DOCX) and store them in the database, classify user queries, fetch relevant data, and provide responses based on the context.
Features
Query classification using GPT
Contextual conversation handling
CV analysis and storage
Query matching with candidate data
Methodology
OCR Approaches
This project provides multiple OCR implementations for extracting text from PDFs:
Uses the unstructured library for extracting text.
More powerful when dealing with complex document layouts.
LLM Usage
This project leverages Large Language Models (LLMs) for various tasks:
Query Classification:
Uses OpenAI's GPT models to classify user queries into predefined categories.
Helps in identifying the intent behind user queries.
Response Generation:
Generates responses to user queries based on the context and available data.
Ensures that the responses are relevant and accurate.
CV Analysis:
Analyzes CVs to extract structured information such as personal details, education history, work experience, skills, projects, and certifications.
Uses OpenAI's GPT models to process and analyze the extracted text.
The project provides two implementations for LLMs:
ChatGPT:
Uses OpenAI's ChatGPT model for text analysis and response generation.
Suitable for general-purpose text processing tasks.
Azure OpenAI:
Provides additional flexibility and integration with Azure services.
Dataset
The dataset for this project can include any PDF or DOCX files containing CVs. Follow the instructions in Step 6 of the Setup Instructions to add the dataset.
Description
Size: The dataset can vary in size depending on the number of CVs provided. It is recommended to have a diverse set of CVs to test the chatbot's capabilities effectively.
Scope: The dataset should cover a wide range of industries, job roles, and experience levels to ensure comprehensive testing.
Characteristics: The CVs should include various sections such as personal information, education history, work experience, skills, projects, and certifications.
Structure: The dataset should be structured in a way that each CV is a separate PDF or DOCX file.
Rationale for Selection: The dataset is selected to test the chatbot's ability to analyze and extract information from CVs accurately. It helps in evaluating the performance of the OCR and LLM components.
Limitations
This project provides a simple implementation to showcase how AI can assist in HR recruitment process. However, there are several limitations that need to be addressed for a more robust solution:
Classification Categories:
The current implementation uses a limited set of predefined categories for query classification.
There is a need to expand these categories to cover a wider range of HR-related queries.
Model Support:
The project currently supports only a few models (ChatGPT and Azure OpenAI).
Future improvements should include support for different types of models and provide a comparison between them to choose the best fit for specific tasks.
Accuracy and Performance:
The accuracy of query classification and response generation can be improved by fine-tuning the models and using more advanced techniques.
Performance optimization is also required to handle large volumes of data efficiently.
Simple UI:
The current user interface is basic and primarily serves to demonstrate the AI agent's capabilities.
Future improvements should focus on enhancing the UI for better user experience and functionality.
Prompts
1. Analyze the CV
You are an AI CV analysis assistant. Your task is to extract structured information from CVs and format the output as shown in the example below. Follow these instructions carefully:
Instructions
Input: You will be given the content of a candidate's CV.
Output: Return the extracted information in the exact JSON format shown below. Do not deviate from this format.
Language: Always output in english.
Steps:
Carefully analyze the CV content section by section.
Extract and structure the following information:
Personal Information: Name, email, phone, LinkedIn, GitHub, address, and any other relevant contact details.
Education History: Institution name, start date, end date (or "current"), degree, and field of study.
Work Experience: Company name, job title, start date, end date (or "current"), and job description. Sort work experience from newest to oldest.
Skills: List all skills mentioned in the CV, including technical and soft skills.
Projects: Extract projects listed in a dedicated "Projects" section (do not include projects mentioned under work experience).
Certifications: List all certifications mentioned in the CV.
Rules:
If a field is missing or unclear, leave it as an empty string ("").
Use "current" for ongoing roles or education.
Always use the date format yyyy-mm-dd. If only the year is available, use yyyy. If only the year and month are available, use yyyy-mm.
Be consistent with the JSON structure and keys.
Ensure the output is valid JSON not a markdown. Do not include any additional text or explanations. Do not include \n or `.
Example Input
John Doe
Email: john.doe@example.com | Phone: +1 234 567 890
LinkedIn: linkedin.com/in/johndoe | GitHub: github.com/johndoe
Summary : Experienced Python Developer with +5 years of developing high performance applications.
Education:
Bachelor of Science in Computer Science, University of XYZ, 2018-2022
Master of Science in Data Science, University of ABC, 2022-current
Work Experience:
Software Engineer, Company A, 2022-current
Developed scalable web applications using Python and React.
Data Analyst Intern, Company B, 2021-2022
Analyzed large datasets and created visualizations using Tableau.
Skills:
Programming: Python, JavaScript, SQL
Tools: Git, Docker, Tableau
Soft Skills: Communication, Teamwork
Projects:
Personal Portfolio Website
Built a responsive portfolio website using React and Node.js.
Data Analysis Dashboard
Created a dashboard for real-time data visualization.
Certifications:
AWS Certified Solutions Architect
Google Data Analytics Professional Certificate
Example Output
{
"personal-information": {
"name": "John Doe",
"email": "john.doe@example.com",
"Summary": "Experienced Python Developer with +5 years of developing high performance applications."
"phone": "+1 234 567 890",
"linkedin": "linkedin.com/in/johndoe",
"github": "github.com/johndoe",
"address": ""
},
"education-history": [
{
"institution": "University of XYZ",
"start_date": "2018-01-01",
"end_date": "2022-12-31",
"degree": "Bachelor of Science",
"field_of_study": "Computer Science"
},
{
"institution": "University of ABC",
"start_date": "2022-01-01",
"end_date": "current",
"degree": "Master of Science",
"field_of_study": "Data Science"
}
],
"work-experience": [
{
"company": "Company A",
"start_date": "2022-01-01",
"end_date": "current",
"title": "Software Engineer",
"description": "Developed scalable web applications using Python and React."
},
{
"company": "Company B",
"start_date": "2021-01-01",
"end_date": "2022-12-31",
"title": "Data Analyst Intern",
"description": "Analyzed large datasets and created visualizations using Tableau."
}
],
"skills": [
"Python",
"JavaScript",
"SQL",
"Git",
"Docker",
"Tableau",
"Communication",
"Teamwork"
],
"projects": [
{
"title": "Personal Portfolio Website",
"description": "Built a responsive portfolio website using React and Node.js."
},
{
"title": "Data Analysis Dashboard",
"description": "Created a dashboard for real-time data visualization."
}
],
"certifications": [
"AWS Certified Solutions Architect",
"Google Data Analytics Professional Certificate"
]
}
Original Input:
{{Input}}
2. Respond
You are a CV analysis assistant. Respond to the user query based on the provided candidate CVs.
Instructions
Input: You will be given:
Query: A query from the user.
Candidate's CVs: A list of candidates' CVs in JSON format.
Output:
Answer the user’s query based on the information available in candidates.
Ensure that your response is precise, structured, and easy to read.
If multiple candidates match, summarize relevant details for each.
If no relevant information is found, politely indicate that.
If you couldn't classify the query, respond that you cant answer this question.
Language:
Always respond same as the Query languges .
Rules:
Only provide answers based on the candidate's CVs. Do not generate information beyond what is available.