A deep action-oriented video image classification system for text detection and recognition

Document Type

Article

Publication Date

11-1-2021

Abstract

For video images with complex actions, achieving accurate text detection and recognition is very challenging. This paper presents a hybrid model for classifying action-oriented video images that reduces the complexity of the problem to improve text detection and recognition performance. We consider five genre categories, namely concert, cooking, craft, teleshopping and yoga. For classifying action-oriented video images, we use ResNet50 to learn general pixel-distribution-level information, one VGG16 network to learn features of Maximally Stable Extremal Regions, and another VGG16 to learn facial components obtained by a multitask cascaded convolutional network. The approach integrates the outputs of the three above-mentioned models using a fully connected neural network to classify the five action-oriented image classes. We demonstrate the efficacy of the proposed method by testing on our dataset and two other standard datasets, namely the Scene Text Dataset, which contains 10 classes of scene images with text information, and the Stanford 40 Actions dataset, which contains 40 action classes without text information. Our method outperforms related existing work and significantly enhances the class-specific performance of text detection and recognition.
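The fusion step described in the abstract (three branch feature vectors combined by a fully connected classifier over the five genres) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature dimensions, weight initialisation, and stand-in feature vectors are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

CLASSES = ["concert", "cooking", "craft", "teleshopping", "yoga"]

# Hypothetical feature dimensions for the three branches:
# ResNet50 pixel-level features, VGG16 MSER features, VGG16 facial features.
D_PIXEL, D_MSER, D_FACE = 2048, 512, 512

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def fuse_and_classify(f_pixel, f_mser, f_face, W, b):
    """Late fusion: concatenate the three branch features,
    then apply one fully connected layer followed by softmax."""
    fused = np.concatenate([f_pixel, f_mser, f_face])
    return softmax(W @ fused + b)

# Stand-in branch features and randomly initialised FC weights
# (illustration only; in the paper these come from trained networks).
f_pixel = rng.standard_normal(D_PIXEL)
f_mser = rng.standard_normal(D_MSER)
f_face = rng.standard_normal(D_FACE)
W = rng.standard_normal((len(CLASSES), D_PIXEL + D_MSER + D_FACE)) * 0.01
b = np.zeros(len(CLASSES))

probs = fuse_and_classify(f_pixel, f_mser, f_face, W, b)
predicted = CLASSES[int(np.argmax(probs))]
```

The output is a probability distribution over the five genre classes; the predicted genre can then be used to route the image to a class-specific text detection and recognition pipeline.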

Keywords

Deep neural networks, Face detection, Action image classification, Text detection, Text recognition

Divisions

fsktm

Funders

National Natural Science Foundation of China (NSFC) [61672273], FRGS, University of Malaya, Malaysia [FP104-2020], Technology Innovation Hub of Indian Statistical Institute, Kolkata

Publication Title

SN Applied Sciences

Volume

3

Issue

11

Publisher

Springer
