Abstract
This project explores the automation of job application document generation using LangChain and Llama-3.3-70B-Versatile, a powerful large language model (LLM). The workflow involves extracting job descriptions from career pages, structuring them into a standardized JSON format, processing resumes from PDFs, and generating personalized cover letters dynamically. This paper discusses the methodology, implementation, experiments conducted, and results obtained, highlighting the efficiency of AI in professional document generation.
Introduction
Job applications often require tailored cover letters, which can be time-consuming to draft. Automating this process using AI can improve efficiency and ensure better personalization. This project utilizes LangChain’s prompt engineering capabilities, combined with Llama-3.3-70B-Versatile, to extract key information from job descriptions and resumes, structuring them for automated document generation.
Related work
Previous research in AI-assisted job applications has explored the use of LLMs for resume screening and cover letter drafting. Approaches such as BERT-based text summarization, GPT-3 cover letter generation, and template-based automation have been implemented in recruitment systems.
Methodology
The project workflow involves:
Data Extraction: Parsing job descriptions from web pages.
Data Structuring: Converting extracted information into JSON format.
Resume Processing: Extracting text from PDF resumes using PyMuPDF (Fitz).
Cover Letter Generation: Using LangChain prompts with Llama-3.3-70B-Versatile to generate a personalized cover letter.
Before running the script, the pymupdf library should be installed to handle PDF data:
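pip install pymupdf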
Additionally, the API key for langchain_groq should be set up:
import os
os.environ["GROQ_API_KEY"] = "your_actual_api_key"
Script Workflow
Importing Necessary Libraries
The script imports essential libraries for text extraction, natural language processing, and AI-driven text generation:
from langchain_groq import ChatGroq
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser
import fitz # PyMuPDF for PDF parsing
Setting Up the AI Model
The script initializes the ChatGroq model with predefined parameters:
llm = ChatGroq(
    temperature=0,
    groq_api_key='YOUR_API_KEY',
    model_name="llama-3.3-70b-versatile"
)
This model is responsible for generating responses based on prompts. It was chosen for its strong comprehension of complex prompts, although it is not without shortcomings.
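A quick test prompt, taken from the full script below, verifies that the model is reachable and responding:

# Simple sanity check that the Groq-hosted model responds
response = llm.invoke("The first person to land on moon was ...")
print(response.content)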
Job Description Parsing
A job description scraped from a website is processed to extract relevant details in JSON format by feeding it to the ChatGroq model with the following prompt:
"""
{page_data}
The text is from the career's page of a website.
Extract the job title and return it in JSON format with the following keys:
role
, experience
, skills
, key responsibilities
, and description
.
Only return valid JSON.
"""
The model's raw string output is then parsed into a Python dictionary using JsonOutputParser():
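json_parser = JsonOutputParser()
json_res = json_parser.parse(res.content)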
Extracting Text from Resume (PDF)
A function, extract_text_from_pdf(), is defined to read the resume and extract its text content; it is then called on the resume PDF.
def extract_text_from_pdf(file_path):
    with fitz.open(file_path) as doc:
        text = "\n".join([page.get_text("text") for page in doc])
    return text
pdf_path = "/content/WALTER KIPTANUI ROTICH CV.pdf"
pdf_text = extract_text_from_pdf(pdf_path)
Generating the Cover Letter
A structured prompt is used to instruct the AI model on how to generate a cover letter:
prompt_cover_letter = PromptTemplate.from_template(
    """
    ### RESUME:
    {resume_text}

    ### JOB DESCRIPTION:
    {job_data}

    ### TASK:
    Write a compelling, personalized cover letter tailored for the job.
    The letter should:
    - Start with a strong introduction that grabs attention.
    - Highlight my most relevant skills and experience.
    - Clearly explain why I am a perfect fit for the role.
    - End with a professional call-to-action.

    ### COVER LETTER:
    """
)
The AI model is used to generate the cover letter:
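chain_cover_letter = prompt_cover_letter | llm
res = chain_cover_letter.invoke({
    "resume_text": pdf_text,  # the extracted resume text, not the file path
    "job_data": json_res
})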
Saving and Displaying the Cover Letter
Finally, the generated cover letter is printed and saved to a text file:
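cover_letter_text = res.content
print(cover_letter_text)

with open("generated_cover_letter.txt", "w", encoding="utf-8") as file:
    file.write(cover_letter_text)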
Expected Outcome
When the script runs successfully:
The job description is extracted and structured into JSON.
The resume's text is extracted from the PDF.
A well-structured and customized cover letter is generated and saved.
Use Cases
• Automating job application processes.
• Ensuring consistency and professionalism in cover letters.
• Reducing the time spent manually crafting application letters (the primary motivation for this script).
Ongoing Enhancements
• Add more AI models for comparison (e.g., OpenAI's GPT models).
• Implement web scraping in the script to dynamically extract job descriptions (in progress; a sketch of one possible approach follows this list).
• Enhance the formatting of the generated cover letter for improved readability and accuracy.
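A minimal sketch of how the scraping step could work, assuming the requests and beautifulsoup4 packages and a placeholder URL (neither is part of the current script):

import requests
from bs4 import BeautifulSoup

def scrape_job_description(url):
    # Fetch the career page and reduce it to plain text for the {page_data} slot
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    return soup.get_text(separator=" ", strip=True)

# Hypothetical usage; the URL is illustrative only
# text = scrape_job_description("https://example.com/careers/data-scientist")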
The full code is available at github.com/walter-kiptanui/LLM_cover_letter_generator; the complete notebook code is reproduced below.
pip install langchain_groq
from langchain_groq import ChatGroq
llm = ChatGroq(
    temperature=0,
    groq_api_key='YOUR_API_KEY',
    model_name="llama-3.3-70b-versatile"
)

# Quick test that the model responds
response = llm.invoke("The first person to land on moon was ...")
print(response.content)
text = """Data Scientist/Engineer at Greenspoon Kenya View Jobs in Food Services / View Jobs at Greenspoon KenyaPosted: Jan 3, 2025Deadline: Not specified Greenspoon Kenya is the online store for people who care about what they eat. Founded in September 2016, Greenspoon was born of a desire to support artisan producers in Kenya, and at the same time provide increased transparency to consumers around the provenance of their food. We are proud to be the first online retail store specifically geared towards showc... Read more about this company Data Scientist/EngineerJob TypeFull TimeQualificationBA/BSc/HNDExperience 2 yearsLocationNairobiJob FieldData, Business Analysis and AI , ICT / Computer As our Data Scientist/Engineer, you'll architect and implement data solutions that power Greenspoon's next phase of growth. You'll work closely with operational teams (e.g. logistics, finance, marketing) to transform our vast data resources into strategic advantages, helping us make smarter decisions about inventory, customer preferences, delivery optimization, and market trends. Key responsibilities Technical leadership: Design and implement scalable data warehouse and data lake architectures Build and maintain ETL pipelines for processing diverse data sources Develop and deploy machine learning models for demand forecasting, logistics optimizations, and customer behavior analysis Create robust data validation and quality assurance processes Implement data governance frameworks and best practices Business impact: Partner with business teams to identify opportunities for data-driven optimization Develop predictive models for inventory management and supply chain efficiency Create dashboards and visualization tools for business intelligence Analyze customer behavior patterns to enhance personalization and service delivery Support data-driven decision making across all departments Development: Build automated reporting systems and self-service analytics tools Develop APIs and interfaces for internal data access Optimize data storage and processing for cost efficiency Required skills & experience Technical expertise: Strong programming skills in Python, SQL, and data processing frameworks Experience with cloud-based data platforms (AWS, Google Cloud, or Azure) Conversant with ETL technologies (e.g. Airbyte, Postman) Proficiency in data warehousing concepts and technologies (e.g. Snowflake, Azure) Expertise in machine learning frameworks and statistical analysis Knowledge of efficiently managing data visualization tools (e.g. PowerBI, Tableau) Experience in automation software (e.g. PowerAutomate, UIPath) is a plus Experience working with D2C ecommerce platforms (Shopify, WooCommerce) is a plus Experience working with ERPs (e.g. 
Microsoft Dynamics, SAP) is a plus Knowledge of WordPress data structures is a plus Business acumen: Demonstrated ability to translate business requirements into technical solutions Strong project management and stakeholder communication skills Ability to explain complex technical concepts to non-technical audiences Track record of delivering data projects with measurable business impact Background: Degree in computer science, statistics or similar with excellent academic achievement 2+ years of experience working with data and algorithms in some sort Analytical mindset combined with business pragmatism Self-directed with strong problem-solving abilities Excellent communication and collaboration skills Adaptable to changing priorities in a fast-paced environment Passion for sustainability and social impact Method of Application Please email us at hr@greenspoon.co.ke with a cover letter clearly indicating your achievements, experience and expertise. Give examples of business problems you have solved, what the impact was and your approach to solving the problem and which challenges you faced. An analytical test will be part of the hiring procedure."""
from langchain_core.prompts import PromptTemplate

prompt_extract = PromptTemplate.from_template(
    """
    ### SCRAPED TEXT FROM WEBSITE:
    {page_data}
    ### INSTRUCTION:
    The text is from the careers page of a website.
    Your job is to extract the job details and return them in JSON format containing the
    following keys: `role`, `experience`, `skills`, `key responsibilities`, and `description`.
    Only return the valid JSON.
    ### VALID JSON (NO PREAMBLE):
    """
)

chain_extract = prompt_extract | llm
res = chain_extract.invoke(input={'page_data': text})
type(res.content)
from langchain_core.output_parsers import JsonOutputParser

json_parser = JsonOutputParser()
json_res = json_parser.parse(res.content)
json_res
type(json_res)
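For the Greenspoon posting above, json_res resembles a dictionary along these lines (illustrative and abridged; the exact values depend on the model's output):

{
    "role": "Data Scientist/Engineer",
    "experience": "2 years",
    "skills": ["Python", "SQL", "cloud data platforms (AWS, Google Cloud, Azure)",
               "ETL technologies", "machine learning frameworks", "PowerBI", "Tableau"],
    "key responsibilities": [
        "Design and implement scalable data warehouse and data lake architectures",
        "Build and maintain ETL pipelines for processing diverse data sources"
    ],
    "description": "Architect and implement data solutions that power Greenspoon's next phase of growth."
}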
# Note: `pip install fitz` pulls in an unrelated PyPI package, so PyMuPDF must be
# uninstalled and reinstalled cleanly under its correct name:
pip uninstall pymupdf -y
pip install pymupdf --no-cache-dir
import fitz # PyMuPDF for PDF parsing
# Function to extract text from the resume PDF
def extract_text_from_pdf(file_path):
    with fitz.open(file_path) as doc:
        text = "\n".join([page.get_text("text") for page in doc])
    return text

# Example usage:
pdf_path = "/content/WALTER KIPTANUI ROTICH CV.pdf"
pdf_text = extract_text_from_pdf(pdf_path)
print(pdf_text)
# Cover Letter Prompt
prompt_cover_letter = PromptTemplate.from_template(
    """
    ### RESUME:
    {resume_text}

    ### JOB DESCRIPTION:
    {job_data}

    ### TASK:
    Write a compelling, personalized cover letter tailored for the job.
    The letter should:
    - Start with a strong introduction that grabs attention.
    - Highlight my most relevant skills and experience.
    - Clearly explain why I am a perfect fit for the role.
    - End with a professional call-to-action.

    ### COVER LETTER:
    """
)
# Create LangChain pipeline
chain_cover_letter = prompt_cover_letter | llm

# Generate Cover Letter (pass the extracted resume text, not the file path)
res = chain_cover_letter.invoke({
    "resume_text": pdf_text,
    "job_data": json_res
})

# Save and display the cover letter
cover_letter_text = res.content
print(cover_letter_text)

# Optional: Save to a file
with open("generated_cover_letter.txt", "w", encoding="utf-8") as file:
    file.write(cover_letter_text)
Results
These are the results generated by the script:
Dear Hiring Manager,
As a seasoned data professional with a passion for driving business growth through data-driven insights, I was thrilled to discover the Data Scientist/Engineer role at Greenspoon. With a strong foundation in data processing frameworks, cloud-based data platforms, and machine learning frameworks, I am confident that my skills and experience make me an ideal fit for this position. My resume, attached as Walter Kiptanui Rotich CV.pdf, provides a detailed account of my technical expertise and accomplishments.
With over 2 years of experience in working with various data technologies, including Python, SQL, and data warehousing concepts, I possess a unique blend of technical and business acumen that enables me to architect and implement data solutions that drive business impact. My expertise in ETL technologies, such as Airbyte and Postman, has allowed me to streamline data pipelines and optimize data workflows. Additionally, my experience with data visualization tools like PowerBI and Tableau has enabled me to effectively communicate complex data insights to both technical and non-technical stakeholders.
I am particularly drawn to this role at Greenspoon because of the opportunity to work closely with operational teams to transform vast data resources into strategic advantages. My experience in working with cross-functional teams has taught me the importance of collaboration and effective communication in driving business outcomes. I am excited about the prospect of applying my skills and expertise to help Greenspoon make smarter decisions about inventory, customer preferences, delivery optimization, and market trends.
As a technical leader, I have a proven track record of developing and implementing data solutions that drive business growth and improvement. My experience with automation software, such as PowerAutomate and UIPath, has allowed me to automate manual processes and improve operational efficiency. I am confident that my technical expertise, combined with my business acumen and passion for data-driven decision making, make me a perfect fit for this role.
Thank you for considering my application. I would welcome the opportunity to discuss my qualifications further and explore how my skills and experience align with the needs of Greenspoon. Please do not hesitate to contact me at your convenience to arrange a meeting or discussion.
Sincerely,
Walter Kiptanui Rotich
The generated cover letter was evaluated for relevance and coherence. The AI-generated letters generally maintained a degree of personalization. Supplying richer resume details, such as portfolio links, and separating resume sections into dedicated files would help the LLM align job requirements with candidate experience more accurately. The JSON structuring of job details improved automation efficiency, and the LLM effectively translated the structured data into well-articulated documents.
Discussion
The results demonstrate that AI-assisted document generation significantly reduces the time required for job applications while improving personalization. However, challenges remain in handling unstructured job descriptions and refining output formatting. With further refinement and a careful choice of model, the quality of the output should improve further, sparing applicants much of the effort of drafting cover letters.
Conclusion
This project demonstrates the potential of AI to automate job application workflows through structured data extraction and document generation. Applicants can process multiple job descriptions and, together with their CV, generate personalized cover letters suited to each posting, avoiding the tedium of manual drafting.
References
Brown, T., et al. "Language Models are Few-Shot Learners." NeurIPS, 2020.
Vaswani, A., et al. "Attention is All You Need." NeurIPS, 2017.
OpenAI. "GPT-4 Technical Report." 2023.
LangChain Documentation. Available at: https://www.langchain.com
PyMuPDF (Fitz) Documentation. Available at: https://pymupdf.readthedocs.io/en/latest/
Acknowledgements
I extend my gratitude to the LangChain and OpenAI communities for their contributions to AI advancements. Special thanks to my mentor Ken Mbuki and my peers Berclay Manani and Edwin Mutwo for their insights in refining this project.
Appendix
Script Execution Steps:
Install dependencies (pip install langchain_groq pymupdf)
Load job descriptions from web data
Parse resume PDFs using PyMuPDF
Generate structured job descriptions using PromptTemplate
Use LangChain AI pipeline to generate cover letters
Save and present results
Code Repository: https://github.com/walter-kiptanui/LLM_cover_letter_generator
Example Output: Sample extracted JSON and generated cover letter