VisionScribe: Transforming Images into Detailed Narratives

VisionScribe is a web application that generates detailed captions and narratives from images. Upload an image, and the application returns a caption describing its contents. Captions are produced with the BLIP (Bootstrapping Language-Image Pre-training) model, which adds detail and context to the description.

Features

Image Upload: Users can upload PNG, JPG, or GIF images.
Automatic Caption Generation: The application generates a basic caption and then enhances it with a detailed narrative using the BLIP model.
Feedback System: Users can provide feedback by liking or disliking the generated caption and submitting additional comments.
User Interface: A simple and responsive frontend built with Tailwind CSS and Alpine.js.
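
As an illustration of the upload restriction listed above, here is a minimal sketch of an extension check. The helper name `is_allowed_image` and the `ALLOWED_EXTENSIONS` set are hypothetical and not taken from the project's code; accepting `.jpeg` alongside `.jpg` is also an assumption.

```python
from pathlib import Path

# Assumed allow-list based on the formats named in the Features list.
# ".jpeg" is included as an assumption, since JPG files often use it.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif"}


def is_allowed_image(filename: str) -> bool:
    """Return True if the filename ends in an accepted image extension."""
    # Path.suffix returns the final extension (including the dot), or "" if none.
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS
```

A check like this only looks at the file name, so a server would typically also let Pillow try to open the file before trusting it.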

Tech Stack

Backend: Python, Flask
AI Model: BLIP (Salesforce/blip-image-captioning-large)
Database: SQLite
Frontend: HTML, Tailwind CSS, Alpine.js
Image Processing: Pillow (the maintained fork of PIL, the Python Imaging Library)
Deep Learning Framework: PyTorch
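
For readers who want to see how the model in this stack is typically called, the following is a minimal sketch of captioning with the Hugging Face transformers BLIP classes. The function names `load_blip` and `generate_caption` and the `max_new_tokens` default are assumptions for illustration; the project's own app.py may be organized differently.

```python
def load_blip(model_name="Salesforce/blip-image-captioning-large"):
    """Load the BLIP processor and model (weights are downloaded on first use)."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_name)
    model = BlipForConditionalGeneration.from_pretrained(model_name)
    model.eval()  # inference only
    return processor, model


def generate_caption(image, processor, model, max_new_tokens=50):
    """Caption a PIL image with an already loaded processor/model pair."""
    # The processor converts the image into the tensor inputs the model expects.
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode the first (and only) generated sequence back into text.
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()
```

With Pillow installed, usage would look like `processor, model = load_blip()` followed by `generate_caption(Image.open("photo.jpg").convert("RGB"), processor, model)`, where `photo.jpg` is a placeholder path.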

Requirements

Python 3.8+
pip (Python package manager)
torch
transformers
Flask
Pillow
SQLite (available through Python's built-in sqlite3 module)

Installation

1. Clone the repository

git clone https://github.com/wprashed/visionscribe
cd visionscribe

2. Install dependencies

pip install -r requirements.txt

3. Set up the database

The app uses SQLite to store user feedback. No separate setup is required: the feedback.db file is created automatically the first time the app starts.

4. Start the Flask application

Run the following command to start the Flask server:

python app.py

The server will start, and you can access the web app at http://127.0.0.1:5000.

Usage

Uploading an Image

Navigate to the web interface at http://127.0.0.1:5000.
Click on the "Click to upload" area or drag and drop an image.
The app will process the image and generate a detailed caption.
You can then provide feedback on the caption by either liking or disliking it and submitting additional comments.

Viewing and Providing Feedback

After the caption is generated, you will have the option to:

Like/Dislike: Rate the generated caption.
Copy: Copy the caption to your clipboard.
Submit Feedback: After liking or disliking the caption, you can add a comment, which is saved to the database.

Database

The application uses SQLite to store user feedback, which includes:

id: A unique identifier for each feedback entry.
caption: The generated caption.
liked: Whether the user liked the caption (1 for liked, 0 for disliked).
comment: The feedback comment from the user.
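
The fields above could map onto a SQLite table roughly as follows. This is a sketch under assumptions: the table name `feedback`, the column types, the `CHECK` constraint, and the helper names `init_db` and `save_feedback` are illustrative and not confirmed by the project's code. It uses an in-memory database so it does not touch feedback.db.

```python
import sqlite3


def init_db(conn):
    """Create the feedback table if it does not exist yet (assumed schema)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS feedback (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            caption TEXT NOT NULL,
            liked INTEGER NOT NULL CHECK (liked IN (0, 1)),
            comment TEXT
        )
        """
    )
    conn.commit()


def save_feedback(conn, caption, liked, comment=""):
    """Insert one feedback row and return its id."""
    cur = conn.execute(
        "INSERT INTO feedback (caption, liked, comment) VALUES (?, ?, ?)",
        (caption, int(bool(liked)), comment),  # store liked as 1 or 0
    )
    conn.commit()
    return cur.lastrowid


# Demo against an in-memory database.
conn = sqlite3.connect(":memory:")
init_db(conn)
row_id = save_feedback(conn, "A dog running on a beach", 1, "Accurate")
rows = conn.execute("SELECT id, caption, liked, comment FROM feedback").fetchall()
```

Parameterized queries (the `?` placeholders) keep user-supplied comments from being interpreted as SQL.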

API Endpoints

`POST /`

Uploads an image and returns the generated caption.

Request Body (Form Data):

file: The image file to be uploaded.

Response:

{
  "caption": "Generated caption text here"
}
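
To call this endpoint from a script rather than the browser, a client needs to send the image as multipart form data under the field name `file`. The sketch below builds such a request with only the standard library; the helper name `build_upload_request`, the default content type, and the base URL default are assumptions for illustration.

```python
import urllib.request
import uuid


def build_upload_request(filename, data, content_type="image/png",
                         base_url="http://127.0.0.1:5000"):
    """Build a multipart/form-data POST to / carrying one image in the 'file' field."""
    boundary = uuid.uuid4().hex  # random separator between form parts
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        base_url + "/",
        data=head + data + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

With the server running, passing the returned request to `urllib.request.urlopen` and parsing the body with `json.load` should yield a dictionary with a `caption` key, matching the response shown above.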

`POST /feedback`

Submits user feedback on the caption.

Request Body (JSON):

{
  "caption": "Generated caption text here",
  "liked": 1,
  "comment": "User's comment here"
}

Response:

{
  "success": true
}
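
The feedback call can be scripted the same way. The following sketch builds the JSON request described above using the standard library; the helper name `build_feedback_request` and the base URL default are hypothetical.

```python
import json
import urllib.request


def build_feedback_request(caption, liked, comment="",
                           base_url="http://127.0.0.1:5000"):
    """Build a JSON POST to /feedback with the documented fields."""
    payload = {
        "caption": caption,
        "liked": 1 if liked else 0,  # the API uses 1 for liked, 0 for disliked
        "comment": comment,
    }
    return urllib.request.Request(
        base_url + "/feedback",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` while the server is running should return the `{"success": true}` body shown above.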

Contributing

Feel free to fork the project, submit issues, and create pull requests. Contributions are welcome!

License

This project is licensed under the MIT License; see the LICENSE file for details.

Acknowledgments

The BLIP model is provided by Salesforce for image captioning and has been integrated into this project.
Flask is used to build the backend API.
Tailwind CSS and Alpine.js are used to build the responsive and interactive frontend.
