This repo demonstrates image captioning with transformer-based Vision Language Models (VLMs) from Hugging Face, served for inference by NVIDIA Triton Inference Server using its Python backend. The stack is containerized with Docker: a simple jQuery front-end talks to a Python back-end built with Flask, which forwards captioning requests to Triton.
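As a rough illustration of the Flask-to-Triton hop, the sketch below builds a KServe-v2 JSON inference request that ships a raw encoded image to Triton's HTTP endpoint. The model name `vlm_captioner` and the tensor names `IMAGE_BYTES` / `CAPTION` are placeholders, not taken from the repo; the actual names are defined by the repo's Triton model configuration.

```python
import json

TRITON_URL = "http://localhost:8000"   # Triton's default HTTP port
MODEL_NAME = "vlm_captioner"           # hypothetical model name
INPUT_NAME = "IMAGE_BYTES"             # hypothetical input tensor name
OUTPUT_NAME = "CAPTION"                # hypothetical output tensor name


def build_infer_request(image_bytes: bytes) -> dict:
    """Build a KServe-v2 JSON inference request carrying an encoded image.

    The image file's raw bytes are sent as a flat UINT8 tensor; the
    Python-backend model is then free to decode and preprocess them.
    """
    return {
        "inputs": [
            {
                "name": INPUT_NAME,
                "shape": [len(image_bytes)],
                "datatype": "UINT8",
                "data": list(image_bytes),
            }
        ],
        "outputs": [{"name": OUTPUT_NAME}],
    }


def infer_endpoint(model_name: str) -> str:
    """URL of Triton's v2 inference endpoint for a given model."""
    return f"{TRITON_URL}/v2/models/{model_name}/infer"


if __name__ == "__main__":
    # The Flask handler would POST this JSON body to infer_endpoint(MODEL_NAME).
    payload = build_infer_request(b"\x89PNG")  # stand-in for real image bytes
    print(infer_endpoint(MODEL_NAME))
    print(json.dumps(payload)[:80])
```

In the repo's Flask route, the uploaded image's bytes would be passed to a request builder like this and POSTed to Triton; the caption comes back in the response's `outputs` entry.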