For details, see "Powerful SeqAttention for Compact Convolutional Transformer" by Hwasik Jeong and Jongbin Ryu.
As convolutional and transformer architectures have advanced, performance on classification tasks has steadily improved, in some cases even surpassing human-level performance.
During this development, the limitation that transformers are effective only on large datasets was addressed by the Compact Convolutional Transformer (CCT), a hybrid of convolutional and transformer architectures.
CCT opened up the possibility for transformers to perform well on small datasets.
The sequence pooling head used in CCT achieves 76.93% top-1 accuracy on CIFAR-100, but the features it learns were found to lack the diversity needed to push performance further.
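For reference, sequence pooling in CCT replaces the class token with a learned softmax weighting over the encoder's output tokens. The following is a minimal PyTorch sketch of that mechanism; the module and variable names are illustrative and not taken from this repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeqPool(nn.Module):
    """Sequence pooling as introduced in the CCT paper: a learned softmax
    weighting over the token sequence replaces the class token."""
    def __init__(self, embed_dim: int):
        super().__init__()
        self.attention = nn.Linear(embed_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, embed_dim) from the transformer encoder
        weights = F.softmax(self.attention(x), dim=1)       # (B, N, 1)
        pooled = torch.matmul(weights.transpose(1, 2), x)   # (B, 1, D)
        return pooled.squeeze(1)                            # (B, D) -> classifier
```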
This study shows that Gramian attention learns considerably more diverse features than the original sequence pooling, reaching 80.52% top-1 accuracy on CIFAR-100 with 8 heads.
Taking this a step further, this study proposes a new head architecture, 🔥SeqAttention🔥, which is more lightweight than the Gramian attention head while outperforming both sequence pooling and Gramian attention.
This head architecture integrates sequence pooling with an attention mechanism, as illustrated below.
The SeqAttention head in CCT (SeqAttn1-CCT-7/3x1) not only surpasses both the original CCT and CCT with a single Gramian attention head (GA1-CCT-7/3x1), but does so with fewer parameters.
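As a rough illustration of such a head, the sketch below applies a lightweight self-attention step to the encoder tokens before sequence pooling. This is an assumption made for illustration only; the actual SeqAttention design is the one defined in the paper and implemented in this repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeqAttentionHead(nn.Module):
    """Illustrative head: refine encoder tokens with self-attention,
    then pool them with a learned softmax weighting. A sketch of the
    idea only, not the repository's SeqAttention implementation."""
    def __init__(self, embed_dim: int, num_heads: int = 1, num_classes: int = 100):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.pool = nn.Linear(embed_dim, 1)   # sequence-pooling weights
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, embed_dim) from the CCT encoder
        attn_out, _ = self.attn(x, x, x)                 # refine token features
        weights = F.softmax(self.pool(attn_out), dim=1)  # (B, N, 1)
        pooled = torch.matmul(weights.transpose(1, 2), attn_out).squeeze(1)
        return self.fc(pooled)                           # class logits
```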
The results below present the Top-1 and Top-5 accuracy on the CIFAR-100 and ImageNet datasets, along with the total number of parameters.
The datasets are expected to be organized as follows:

```
data/
├── cifar-100-python/
│   ├── meta
│   ├── test    # 10,000 validation images
│   └── train   # 50,000 training images
└── imageNet/
    ├── train   # 1,281,167 training images
    └── val     # 50,000 validation images
```
Training is launched with multi_train.py, for example:

```bash
# Single-GPU training of SeqAttn1-CCT-7/3x1 on CIFAR-100
CUDA_VISIBLE_DEVICES=0 python3 multi_train.py cifar100 -m seqattn1_cct_7_3x1_32

# Multi-GPU training (2 GPUs) of CCT-7/3x1 on ImageNet via torchrun
CUDA_VISIBLE_DEVICES=0,1 torchrun --standalone --nproc-per-node=2 multi_train.py imagenet -m cct_7_3x1_32
```
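The torchrun launch above presumably relies on the standard PyTorch DistributedDataParallel setup driven by the environment variables torchrun exports (e.g. LOCAL_RANK). A minimal sketch of that pattern is shown below; it is a generic illustration, not the repository's actual multi_train.py code.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_distributed(model: torch.nn.Module) -> torch.nn.Module:
    """Wrap a model for multi-GPU training using the env vars set by torchrun."""
    if "LOCAL_RANK" in os.environ:               # launched via torchrun
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)
        model = model.cuda(local_rank)
        model = DDP(model, device_ids=[local_rank])
    else:                                        # plain single-GPU launch
        model = model.cuda()
    return model
```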