Efficient label-free pruning and retraining for Text-VQA Transformers

Document Type

Article

Publication Date

7-1-2024

Abstract

Recent advancements in Scene Text Visual Question Answering (Text-VQA) employ autoregressive Transformers, showing improved performance with larger models and pre-training datasets. Although various pruning frameworks exist to simplify Transformers, many are integrated into the time-consuming training process. Researchers have recently explored post-training pruning techniques, which separate pruning from training and reduce time consumption. Some methods use gradient-based importance scores that rely on labeled data, while others offer retraining-free algorithms that quickly enhance pruned model accuracy. This paper proposes a novel gradient-based importance score that requires only raw, unlabeled data for post-training structured pruning of autoregressive Transformers. Additionally, we introduce a Retraining Strategy (ReSt) for efficient performance restoration of pruned models of arbitrary sizes. We evaluate our approach on the TextVQA and ST-VQA datasets using TAP, TAP††, and SaL‡-Base, all of which utilize autoregressive Transformers. On TAP and TAP††, our pruning approach achieves up to 60% reduction in size with less than a 2.4% accuracy drop, and the proposed ReSt retraining approach takes only 3 to 34 min, comparable to existing retraining-free techniques. On SaL‡-Base, the proposed method achieves up to 50% parameter reduction with less than a 2.9% accuracy drop, requiring only 1.19 h of retraining using the proposed ReSt approach. The code is publicly accessible at https://github.com/soonchangAI/LFPR.
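The abstract does not spell out how a gradient-based importance score can be computed without labels; the following minimal sketch illustrates one common way such a score could be obtained, and is an assumption rather than the paper's actual formulation. It treats the model's own predictions on raw inputs as pseudo-labels and accumulates |weight x gradient| per structure (e.g., per attention head or neuron group); the function name, the structured_params grouping, and the data loader interface are all hypothetical.

    # Illustrative sketch only (PyTorch), assuming model(inputs) returns logits.
    import torch
    import torch.nn.functional as F

    def labelfree_importance(model, unlabeled_loader, structured_params, device="cpu"):
        """structured_params: hypothetical mapping from a structure name
        (e.g. "layer3.head7") to the list of parameter tensors it owns."""
        model.to(device).eval()
        scores = {name: 0.0 for name in structured_params}
        for inputs in unlabeled_loader:          # raw inputs only, no labels
            inputs = inputs.to(device)
            model.zero_grad()
            logits = model(inputs)
            # Pseudo-labels from the model's own argmax predictions
            pseudo = logits.argmax(dim=-1).detach()
            loss = F.cross_entropy(logits, pseudo)
            loss.backward()
            for name, params in structured_params.items():
                scores[name] += sum(
                    (p.detach() * p.grad).abs().sum().item()
                    for p in params if p.grad is not None
                )
        return scores  # structures with the lowest scores are pruning candidates

How the actual score and the ReSt retraining schedule are defined is detailed in the paper and the linked repository.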

Keywords

Transformer, Pruning, Scene text visual question answering

Divisions

fsktm

Publication Title

Pattern Recognition Letters

Volume

183

Publisher

Elsevier

Publisher Location

Radarweg 29, 1043 NX Amsterdam, Netherlands
