An image-to-text agent using NLP and the Llama 3.2 11B Vision model.
Acting as an expert English teacher, the agent analyzes the image file, extracts keywords, groups them semantically, and crafts concise sentences demonstrating correct usage.
Extract keywords
List a few sample words on the UI
=> Website
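As a rough illustration of this flow, the sketch below sends an image to the Llama 3.2 11B Vision model through Together's OpenAI-compatible chat completions endpoint and asks for grouped keywords with example sentences. The endpoint URL, model ID, and prompt wording are assumptions for illustration; the project's actual logic lives in agents.py.

import base64
import os

import requests

# Assumptions: Together's OpenAI-compatible endpoint and this model ID.
API_URL = "https://api.together.xyz/v1/chat/completions"
MODEL = "meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo"

def extract_vocabulary(image_path: str) -> str:
    """Ask the vision model for keywords, semantic groups, and example sentences."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    payload = {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "You are an expert English teacher."},
            {"role": "user", "content": [
                {"type": "text", "text": "Extract keywords from this textbook page, "
                                         "group them semantically, and write one concise "
                                         "example sentence per group."},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ]},
        ],
    }
    headers = {"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"}
    response = requests.post(API_URL, json=payload, headers=headers, timeout=60)
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]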
Automate the vocabulary-building process through the following steps:
Image Upload: The user uploads a textbook image to the backend (a minimal Flask sketch follows this list).
Retrieve Key Information: The vision model analyzes the image and extracts keywords.
AI-Powered Vocabulary List: The extracted keywords are grouped semantically and paired with concise example sentences.
User Interaction: The resulting vocabulary list is shown on the UI for the user to review.
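A minimal sketch of the upload step in the Flask backend, assuming a /upload route and an "image" form field; the actual route and response shape in app.py may differ.

import os
from flask import Flask, request, jsonify

app = Flask(__name__)
UPLOAD_DIR = "uploads"

@app.route("/upload", methods=["POST"])  # route name is an assumption
def upload_image():
    file = request.files.get("image")  # form field name is an assumption
    if file is None:
        return jsonify({"error": "no image provided"}), 400
    os.makedirs(UPLOAD_DIR, exist_ok=True)
    path = os.path.join(UPLOAD_DIR, file.filename)
    file.save(path)
    # The saved image is then handed to the vision agent for keyword extraction.
    return jsonify({"status": "uploaded", "path": path})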
Python: Primary programming language (version 3.12)
Flask: Web framework for the backend API
Flask-CORS: A Flask extension for handling Cross-Origin Resource Sharing (CORS), making cross-origin AJAX requests possible (see the sketch after this list)
pipenv: Python package manager
pre-commit: Manages and maintains pre-commit hooks
React: Frontend framework
Vercel: Hosting for the user-facing frontend
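For reference, Flask-CORS lets the React dev server (assumed at http://localhost:3000) call the backend across origins; the snippet below is an illustrative sketch, not the project's exact setup.

from flask import Flask
from flask_cors import CORS

app = Flask(__name__)
# Allow the React frontend to call the backend API during development.
CORS(app, origins=["http://localhost:3000"])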
.
├── __init__.py
├── app.py # Flask application
├── agents.py # Define the AI agents
├── Prompts/ # Store prompt and system context templates
│ ├── System.py
│ ├── User.py
│ └── ...
├── db/ # Database files
│ ├── chroma.sqlite3
│ └── ...
├── sample_textbook_images/ # Sample textbook images for the test
└── uploads/ # Uploaded image files
Install the pipenv package manager:
pip install pipenv
Install dependencies:
pipenv shell
pipenv install -r requirements.txt -v
pip install -r requirements.txt
Set up environment variables:
Create a .env file in the project root and add the following:
TOGETHER_API_KEY=your_together_api_key
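The key can then be read from Python. The snippet below assumes python-dotenv is used to load the .env file, which is a common pattern but an assumption about this project.

import os
from dotenv import load_dotenv  # assumes the python-dotenv package is installed

load_dotenv()  # reads .env from the project root
TOGETHER_API_KEY = os.environ["TOGETHER_API_KEY"]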
Test the AI assistant:
pipenv shell
python main.py
In the terminal, you can trace the process as it analyzes the sample textbook data.
Start the Flask backend:
python -m flask run --debug
The backend will be available at http://localhost:5000.
In a separate terminal, run the React frontend app:
cd frontend
npm start
The frontend will be available at http://localhost:3000.
Call the Flask API from the frontend app to see the result on the user interface.
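You can also exercise the backend without the UI; the snippet below posts a sample image directly to the Flask API. The route, field name, and file path are hypothetical and should be adjusted to the actual endpoint.

import requests

with open("sample_textbook_images/page1.jpg", "rb") as f:  # hypothetical sample file
    resp = requests.post("http://localhost:5000/upload", files={"image": f}, timeout=120)
print(resp.json())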
pipenv install <package>
pipenv uninstall <package>
pipenv run <command>
After adding or removing a package, update requirements.txt accordingly, or run pip freeze > requirements.txt to reflect the changes in dependencies.
To reinstall all the dependencies, delete Pipfile and Pipfile.lock, then run:
pipenv shell
pipenv install -r requirements.txt -v
Install pre-commit hooks:
pipenv run pre-commit install
Run pre-commit checks manually:
pipenv run pre-commit run --all-files
Pre-commit hooks help maintain code quality by running checks for formatting, linting, and other issues before each commit.
To skip the pre-commit hooks, run:
git commit --no-verify -m "your-commit-message"
To modify or add new AI agents, edit the agents.py file. Each agent is defined with a specific role, goal, and set of tools.
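As an illustration only (the actual structure of agents.py is not reproduced here), an agent definition might look like the following sketch:

from dataclasses import dataclass, field

@dataclass
class Agent:
    role: str            # who the agent acts as
    goal: str            # what the agent is trying to achieve
    tools: list = field(default_factory=list)  # tools the agent may call

# Hypothetical example; adapt to the definitions actually used in agents.py.
english_teacher = Agent(
    role="Expert English teacher",
    goal="Extract keywords from textbook images and craft concise example sentences",
    tools=["vision_model", "vector_store"],
)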
To modify or add templated prompts, edit/add files in the Prompts folder.
The prompts apply the chain-of-thought technique as well as role-based prompting.
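As a minimal sketch of what a template in Prompts/System.py could contain, the prompt below combines role-based prompting with a chain-of-thought instruction; the wording is illustrative, not the project's actual prompt.

# Prompts/System.py (illustrative sketch)
SYSTEM_PROMPT = (
    "You are an expert English teacher."                     # role-based prompting
    " Think step by step: first list the keywords you see,"  # chain-of-thought instruction
    " then group them semantically, and finally write one"
    " concise example sentence per group."
)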
The system uses Chroma DB to store and query the uploaded images. To update the knowledge base (an ingestion sketch follows these steps):
Add new images to the uploads/ directory.
Modify the agents.py file to update the ingestion process if necessary.
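A sketch of how the ingestion step could add a record to Chroma; the persistence path matches the db/ folder above, but the collection name and metadata fields are assumptions.

import chromadb

client = chromadb.PersistentClient(path="db")  # db/ holds chroma.sqlite3
collection = client.get_or_create_collection("textbook_images")  # collection name is an assumption

# Store the extracted text keyed by the uploaded file so it can be queried later.
collection.add(
    ids=["uploads/page1.jpg"],  # hypothetical uploaded file
    documents=["keywords and example sentences extracted from the page"],
    metadatas=[{"source": "uploads/page1.jpg"}],
)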
To contribute, create a feature branch (git checkout -b feature/your-amazing-feature)
Commit your changes (git commit -m 'Add your-amazing-feature')
Push to the branch (git push origin feature/your-amazing-feature)
Common issues and solutions:
Make sure the API keys in your .env file are correct and up to date.
Check the output.log file for detailed error messages and stack traces.