This project implements two use cases that combine computer vision (CV) and natural language processing (NLP) tasks to create a responsive AI agent. The agent uses LangChain Agents with OpenAI's GPT-3.5 Turbo model to suggest items of clothing based on text and image inputs.
Methodology
There are two primary use cases:
Use Case 1: Find Garments Based on Text + Image
Scenario: A user provides a chat message and an example image of someone wearing denim trousers. The agent suggests similar denim trousers.
Agentic Process:
Parse the user message using GPT-3.5.
Use object detection to identify clothing items in the provided image.
Crop and save the image of the specified item (trousers).
Perform a vector similarity search for the cropped image against the FAISS index.
Return the product IDs of the most similar items.
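The detection and cropping steps above could be sketched as follows. This is a minimal illustration, not the project's code: the `Detection` class, the `select_crop_box` function, and the `min_score` threshold are hypothetical names, and the detector is assumed to return a label, a confidence score, and a pixel bounding box for each item.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Detection:
    """One detected clothing item (hypothetical structure)."""
    label: str
    score: float
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def select_crop_box(
    detections: List[Detection],
    target_label: str,
    image_size: Tuple[int, int],
    min_score: float = 0.5,  # assumed threshold, not taken from the project
) -> Optional[Tuple[int, int, int, int]]:
    """Return the crop box of the most confident detection of target_label."""
    width, height = image_size
    candidates = [
        d for d in detections
        if d.label == target_label and d.score >= min_score
    ]
    if not candidates:
        return None
    best = max(candidates, key=lambda d: d.score)
    x0, y0, x1, y1 = best.box
    # Clamp the box to the image bounds so the crop never reads outside the image.
    x0, y0 = max(0, int(x0)), max(0, int(y0))
    x1, y1 = min(width, int(x1)), min(height, int(y1))
    if x1 <= x0 or y1 <= y0:
        return None
    return (x0, y0, x1, y1)
```

The returned tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so it could be passed straight to a cropping tool built on Pillow.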
Custom LangChain Agent Tools Used:
Object detection tool (uses pretrained fashion object detection weights from Hugging Face)
Image cropping and saving tool
Vector similarity search and image plot tool
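To make the similarity search step concrete, here is a minimal pure-Python sketch of what a nearest-neighbour lookup computes. It is a stand-in for illustration only: the project uses FAISS, and the index type and distance metric it uses are not stated here, so cosine similarity is an assumption. The names `top_k_similar` and `catalogue` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def normalise(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vector]


def top_k_similar(
    query: Sequence[float],
    catalogue: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[str]:
    """Return the product IDs of the k catalogue vectors closest to the query.

    Brute-force cosine similarity; a FAISS index would replace this loop
    for large catalogues.
    """
    q = normalise(query)
    scored = []
    for product_id, vector in catalogue.items():
        similarity = sum(a * b for a, b in zip(q, normalise(vector)))
        scored.append((similarity, product_id))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [product_id for _, product_id in scored[:k]]
```

In FAISS, an inner-product search over L2-normalised vectors (for example with `IndexFlatIP`) yields the same cosine ranking without the Python loop.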
Use Case 2: Suggest Complementary Garments Based on Text + Image
Scenario: A user provides a chat message and an image of a red jacket, asking for items that would go well with it in autumn colours.
Agentic Process:
Parse the user message using GPT-3.5.
Generate a caption for the provided image.
Extract desired colours from the user message.
Perform a similarity search for items with complementary colours, excluding the category of the provided item.
Return the product IDs of items from other categories in the specified colours.
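The colour extraction and category exclusion steps above could look roughly like the sketch below. It is a keyword-based stand-in, not the project's tool: the palette contents, the `KNOWN_COLOURS` set, the function names, and the product record fields (`product_id`, `category`, `colour`) are all assumptions made for illustration.

```python
import re
from typing import Dict, List

# Hypothetical mapping from seasonal terms to colour names.
SEASONAL_PALETTES = {
    "autumn": ["brown", "orange", "mustard", "burgundy"],
}

# Hypothetical set of colour words recognised directly in a message.
KNOWN_COLOURS = {"red", "blue", "green", "brown", "orange", "black", "white"}


def extract_colours(message: str) -> List[str]:
    """Collect colour names mentioned directly or implied by a seasonal term."""
    colours: List[str] = []
    for word in re.findall(r"[a-z]+", message.lower()):
        if word in KNOWN_COLOURS and word not in colours:
            colours.append(word)
        for colour in SEASONAL_PALETTES.get(word, []):
            if colour not in colours:
                colours.append(colour)
    return colours


def complementary_candidates(
    products: List[Dict[str, str]],
    colours: List[str],
    excluded_category: str,
) -> List[str]:
    """Return IDs of products in the wanted colours, skipping one category."""
    return [
        p["product_id"]
        for p in products
        if p["category"] != excluded_category and p["colour"] in colours
    ]
```

A keyword match like this cannot tell the colour of the garment in the image from the colours the user is asking for (a "red jacket" in the message would add red to the list), which is one reason to leave this step to the language model.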
Custom LangChain Agent Tools Used:
Image captioning tool
Colour extraction from text tool
Image and text vector similarity search tools
CSV referencing tool
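As an illustration of what a CSV referencing tool might do, the sketch below looks up product metadata by product ID using Python's standard `csv` module. The column names (`product_id`, `category`, `colour`), the sample rows, and the function name `lookup_products` are hypothetical; the project's actual CSV schema is not described here.

```python
import csv
from typing import Dict, Iterable, List

# Hypothetical catalogue contents, used only to demonstrate the lookup.
SAMPLE_CSV = """product_id,category,colour
P1,trousers,blue
P2,scarf,mustard
P3,boots,brown
"""


def lookup_products(
    csv_lines: Iterable[str],
    product_ids: List[str],
) -> List[Dict[str, str]]:
    """Return the CSV rows for the given product IDs, in the order requested.

    IDs that do not appear in the CSV are skipped.
    """
    wanted = set(product_ids)
    found: Dict[str, Dict[str, str]] = {}
    for row in csv.DictReader(csv_lines):
        if row["product_id"] in wanted:
            found[row["product_id"]] = dict(row)
    return [found[pid] for pid in product_ids if pid in found]
```

`csv.DictReader` accepts any iterable of lines, so the same function works with an open file object or with `SAMPLE_CSV.splitlines()`.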
Results
Workflow
Image Processing:
Preprocess the images to create vector representations.
Message Parsing:
Utilise GPT-3.5 in chat mode to understand user queries.
Decide on the appropriate application (similarity search or complementary colour search).
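Once the model has named an application, its answer still has to be mapped to code. The sketch below shows one way to validate and dispatch that answer. It assumes the model is prompted to reply with one of two labels; the label strings and the names `parse_decision` and `dispatch` are hypothetical and are not taken from the project.

```python
from typing import Callable, Dict

# Hypothetical labels the model is asked to choose between.
APPLICATIONS = ("similarity_search", "complementary_colour_search")


def parse_decision(model_output: str) -> str:
    """Normalise the model's reply and check that it names a known application."""
    label = model_output.strip().lower().replace(" ", "_").replace("-", "_")
    if label not in APPLICATIONS:
        raise ValueError(f"unrecognised application: {model_output!r}")
    return label


def dispatch(
    model_output: str,
    handlers: Dict[str, Callable[[dict], list]],
    request: dict,
) -> list:
    """Run the handler that matches the model's decision."""
    return handlers[parse_decision(model_output)](request)
```

Validating the label before dispatch means a malformed model reply fails with a clear error instead of silently running the wrong search.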
Vector Similarity Search:
Perform searches using FAISS for efficient retrieval of similar items.
Filter results based on the user's specifications.
Results:
Visualise results using Matplotlib.
Return product IDs for similar or complementary items.