Abstract:
Content-based image retrieval is an active research area because of the wide range of applications built on image and video collections. Text embedded in video can contribute significantly to identifying the contents of multimedia data and facilitates video indexing, retrieval, and analysis. Text tracking is a vital part of the text extraction process: it can speed up text extraction and also improve text localization accuracy in videos. This paper proposes a novel approach for text tracking in videos, combining Maximally Stable Extremal Regions (MSER) and Speeded-Up Robust Features (SURF) descriptors. The proposed method improves the tracking accuracy and efficiency for video text objects. MSER is used for interest point detection because extremal regions are affine invariant, making it well suited to text that undergoes scale, rotation, and translation changes. SURF is chosen as the feature descriptor because of its scale and rotation invariance, speed, and robustness. The proposed technique is evaluated on two datasets: the first was designed to test the tracking methodology as part of this research, and the second is the publicly available YouTube Video Text (YVT) dataset. The method achieves a Multiple Object Tracking Precision (MOTP) of 0.53 and a Multiple Object Tracking Accuracy (MOTA) of 0.48. Analysis on these diverse datasets confirms that the proposed technique yields visible improvements in text tracking in videos.
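At the core of the descriptor-based tracking pipeline described above is matching SURF descriptors between consecutive frames. The sketch below shows only that matching step, using nearest-neighbour search with a distance-ratio test; the toy descriptor vectors are illustrative assumptions, not real SURF output (in practice the descriptors would come from a detector/descriptor library such as OpenCV, and the paper's exact matching strategy may differ):

```python
import math

def match_descriptors(desc_a, desc_b, ratio=0.7):
    """Nearest-neighbour descriptor matching with a distance-ratio test.

    desc_a, desc_b: lists of equal-length feature vectors
                    (e.g. descriptors from frame t and frame t+1).
    Returns a list of (i, j) index pairs where descriptor i in desc_a
    matched descriptor j in desc_b and the best match was clearly
    closer than the second-best (ratio test rejects ambiguous matches).
    """
    matches = []
    for i, da in enumerate(desc_a):
        # Distances from this descriptor to every descriptor in the next frame.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches

# Toy 2-D "descriptors" for two consecutive frames (hypothetical values).
frame_t = [[0.0, 1.0], [5.0, 5.0]]
frame_t1 = [[0.1, 1.0], [9.0, 9.0], [5.0, 5.1]]
print(match_descriptors(frame_t, frame_t1))  # → [(0, 0), (1, 2)]
```

Matched index pairs link a text region in frame t to its position in frame t+1, which is how tracking avoids re-running full text localization on every frame.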
Page(s):
39-47
DOI:
DOI not available
Published:
Journal: Journal of Information Communication Technologies and Robotic Applications, Volume: 12, Issue: 2, Year: 2021
Keywords:
Document Analysis, Text tracking, Caption Text, video retrieval