Efficient label-free pruning and retraining for Text-VQA Transformers
Document Type
Article
Publication Date
7-1-2024
Abstract
Recent advancements in Scene Text Visual Question Answering (Text-VQA) employ autoregressive Transformers, showing improved performance with larger models and pre-training datasets. Although various pruning frameworks exist to simplify Transformers, many are integrated into the time-consuming training process. Researchers have recently explored post-training pruning techniques, which separate pruning from training and reduce time consumption. Some methods use gradient-based importance scores that rely on labeled data, while others offer retraining-free algorithms that quickly enhance pruned model accuracy. This paper proposes a novel gradient-based importance score that requires only raw, unlabeled data for post-training structured pruning of autoregressive Transformers. Additionally, we introduce a Retraining Strategy (ReSt) for efficient performance restoration of pruned models of arbitrary sizes. We evaluate our approach on the TextVQA and ST-VQA datasets using TAP, TAP†† and SaL‡-Base, all of which use autoregressive Transformers. On TAP and TAP††, our pruning approach achieves up to 60% reduction in size with less than a 2.4% accuracy drop, and the proposed ReSt retraining approach takes only 3 to 34 min, comparable to existing retraining-free techniques. On SaL‡-Base, the proposed method achieves up to 50% parameter reduction with less than a 2.9% accuracy drop, requiring only 1.19 h of retraining with the proposed ReSt approach. The code is publicly accessible at https://github.com/soonchangAI/LFPR.
Keywords
Transformer, Pruning, Scene text visual question answering
Divisions
fsktm
Publication Title
Pattern Recognition Letters
Volume
183
Publisher
Elsevier
Publisher Location
RADARWEG 29, 1043 NX AMSTERDAM, NETHERLANDS