Dynamic Data Extractor is a command-line interface (CLI) tool that extracts structured data from text, images, and bulk inputs (a CSV file of text or a folder of images). It uses language models served by Ollama to turn unstructured input into structured output.
This project uses two Ollama models: llama3.2:3b for text and llama3.2-vision:11b for images.
Step 1: Clone the Repository
To get started, clone the repository to your local machine:
git clone https://github.com/iammuhammadnoumankhan/AI-DataParser.git
cd AI-DataParser
Step 2: Create a Virtual Environment
Set up a virtual environment to manage dependencies:
python3 -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
Step 3: Install Dependencies
Install the required dependencies using pip:
pip install -r requirements.txt
Step 4: Configure .env
Edit the .env file with your Ollama server settings so the CLI can connect to your Ollama instance.
I recommend watching this video on my YouTube channel: YouTube
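Before running the CLI, it can help to confirm that the Ollama server is reachable and that the models are pulled. The script below is a minimal sketch and is not part of this project. It assumes the .env file uses a key named `OLLAMA_HOST` (a hypothetical name, so check your .env for the real one) whose value includes the scheme, such as `http://localhost:11434`. It queries Ollama's `GET /api/tags` endpoint, which lists locally available models.

```python
import json
import urllib.request
from pathlib import Path

# Ollama's default listen address, used when the .env has no host entry.
DEFAULT_HOST = "http://localhost:11434"
# Hypothetical key name; the project's real .env keys may differ.
HOST_KEY = "OLLAMA_HOST"


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def list_models(host: str) -> list:
    """Return model names reported by the Ollama server via GET /api/tags."""
    with urllib.request.urlopen(f"{host.rstrip('/')}/api/tags", timeout=5) as resp:
        data = json.load(resp)
    return [model["name"] for model in data.get("models", [])]


if __name__ == "__main__":
    env_path = Path(".env")
    env = parse_env(env_path.read_text()) if env_path.exists() else {}
    host = env.get(HOST_KEY, DEFAULT_HOST)
    try:
        print(f"Connected to {host}; models: {list_models(host)}")
    except (OSError, ValueError) as exc:
        # URLError (connection refused, DNS failure) is a subclass of OSError.
        print(f"Could not reach Ollama at {host}: {exc}")
```

If llama3.2:3b or llama3.2-vision:11b is missing from the printed list, `ollama pull <model>` downloads it.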
Basic Usage
Extracting Data from Text
To extract data from a text input:
python cli.py --text "Your input text"
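To show the general technique behind text extraction, here is a minimal sketch that calls Ollama's REST API directly. It is not this project's internal code. It sends a prompt listing the fields to extract to `POST /api/generate` with `format` set to `"json"`, then parses the JSON the model returns in the `response` field. The `fields` list plays a role similar to the filters mentioned later in this README, but its shape, the prompt wording, and the example field names are assumptions.

```python
import json
import sys
import urllib.request

# Assumptions: Ollama's default address and the text model named in this README.
OLLAMA_URL = "http://localhost:11434/api/generate"
TEXT_MODEL = "llama3.2:3b"


def build_text_payload(text: str, fields: list, model: str = TEXT_MODEL) -> dict:
    """Build an /api/generate request asking for a JSON object with the given keys."""
    prompt = (
        "Extract the following fields from the text and answer only with a JSON "
        f"object using these keys: {', '.join(fields)}.\n\nText:\n{text}"
    )
    # format="json" constrains the output to valid JSON;
    # stream=False returns one response object instead of a stream of chunks.
    return {"model": model, "prompt": prompt, "format": "json", "stream": False}


def extract(payload: dict, url: str = OLLAMA_URL) -> dict:
    """POST the payload and parse the JSON object the model returns in "response"."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return json.loads(body["response"])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical field names for illustration only.
    print(extract(build_text_payload(sys.argv[1], ["name", "age", "city"])))
```

For example, `python sketch.py "John Smith, 42, lives in Berlin."` should print a dictionary with `name`, `age`, and `city` keys. The exact values depend on the model.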
Extracting Data from an Image
To extract data from an image:
python cli.py --image path/to/image.jpg
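Image extraction works the same way, except that the image travels with the request. Ollama's `/api/generate` accepts a list of base64-encoded images in the `images` field for multimodal models. The sketch below is an illustration only and not the project's implementation. The prompt text and the example field names are hypothetical.

```python
import base64
import json
import sys
import urllib.request
from pathlib import Path

# Assumptions: Ollama's default address and the vision model named in this README.
OLLAMA_URL = "http://localhost:11434/api/generate"
VISION_MODEL = "llama3.2-vision:11b"


def build_image_payload(image_path: str, fields: list, model: str = VISION_MODEL) -> dict:
    """Build an /api/generate request that sends one image for JSON extraction."""
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    prompt = (
        "Extract the following fields from the image and answer only with a JSON "
        f"object using these keys: {', '.join(fields)}."
    )
    # Multimodal models read base64-encoded images from the "images" list.
    return {"model": model, "prompt": prompt, "images": [encoded],
            "format": "json", "stream": False}


def send(payload: dict, url: str = OLLAMA_URL) -> dict:
    """POST the payload and parse the JSON object the model returns in "response"."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Vision models can be slow on CPU, hence the generous timeout.
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(json.load(resp)["response"])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical field names, e.g. for a receipt photo.
    print(send(build_image_payload(sys.argv[1], ["store", "date", "total"])))
```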
Extracting Data from Bulk Text Files
To extract data from multiple text files:
python cli.py --bulk-text path/to/data.csv
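This README does not describe the layout of the CSV passed to `--bulk-text`. The sketch below assumes a header row with a column named `text` (a hypothetical name) and shows one way to read each row's text so it can be sent to the model one entry at a time.

```python
import csv
import sys
from typing import Iterable, Iterator, Tuple

# Hypothetical column name; the project's expected CSV layout may differ.
TEXT_COLUMN = "text"


def iter_texts(lines: Iterable[str], column: str = TEXT_COLUMN) -> Iterator[Tuple[int, str]]:
    """Yield (record_number, text) for each non-empty cell in `column`.

    Record numbers count CSV records, with the header as record 1.
    """
    reader = csv.DictReader(lines)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"CSV must have a '{column}' column, got {reader.fieldnames}")
    for record_number, row in enumerate(reader, start=2):
        text = (row.get(column) or "").strip()
        if text:  # skip rows whose text cell is empty
            yield record_number, text


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], newline="", encoding="utf-8") as handle:
        for record_number, text in iter_texts(handle):
            print(record_number, text[:60])
```

Yielding rows lazily keeps memory flat for large files, and the record number makes it easy to trace a bad extraction back to its source row.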
Extracting Data from Bulk Images
To extract data from multiple images in a folder:
python cli.py --bulk-images path/to/image/folder
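For bulk image runs, the first step is collecting the image files in the folder. The sketch below is one way to do that, not the project's code. The set of accepted extensions is an assumption, so adjust it to the formats your vision model handles. The search is not recursive.

```python
import sys
from pathlib import Path

# Assumption: which file extensions count as images.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def find_images(folder: str, extensions: set = IMAGE_EXTENSIONS) -> list:
    """Return image files directly inside `folder`, sorted by path (not recursive)."""
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"{folder} is not a directory")
    # Compare suffixes case-insensitively so "photo.PNG" is also picked up.
    return sorted(
        path for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in extensions
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    for image in find_images(sys.argv[1]):
        print(image)
```

Sorting gives a stable processing order, so repeated runs over the same folder produce results in the same sequence.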
You can also control how results are displayed and exported:
--display: Choose the display format (json/table/none)
--export: Choose the export format (json/csv/none)
To extract structured data from unstructured data, you need to define your own filters. For reference, you can watch my video about this project on YouTube:
YouTube
Filters like:
Thank You!!!