Multimodal AI model based on Vision Transformers, implemented in approximately 500 lines of code