Abstract:
The rapid growth of digital data with complex content has created processing challenges. The exponential increase in the size of 'Big Data', driven by video, audio, image and textual content, has created several problems that the research community needs to address. A huge amount of digital data is currently generated by various sources. High-quality data require more storage space and consume excessive bandwidth during transmission. To overcome these issues, digital data are stored in compressed form using the different compression algorithms reported in the literature. To analyze these data, traditional schemes first decompress them, which is time-consuming and increases the computational overhead. Compressed-domain image processing techniques, in which complete decompression may not be required, have therefore been adopted. In this work, we apply compressed-domain processing to document images that contain printed text. Our main aim is to identify the similarity and find the equivalence between two or more compressed document images. To achieve this, we first apply JPEG encoding, which generates encoded data. These data are then processed by the proposed line, word and character segmentation scheme. Next, we apply SIFT (Scale-Invariant Feature Transform) to extract features from the segmented compressed-domain data. Finally, a feature matching scheme is applied that uses a brute-force feature matcher and k-nearest neighbor. We have tested this approach on the publicly available PubLayNet, IIIT-AR-13K, and Tobacco-3482 datasets, which contain large-scale document images. The experimental analysis shows the robustness of the proposed approach in identifying the similarity between compressed document images.
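The final matching step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the feature descriptors have already been extracted (the short tuples below are made-up stand-ins for SIFT descriptors), and the ratio test and the 0.75 threshold used to turn matches into a similarity score are assumptions added for illustration, not something the abstract specifies. The function names `knn_match` and `similarity` are hypothetical.

```python
import math


def knn_match(query, train, k=2):
    """Brute-force k-nearest-neighbor matching.

    For every query descriptor, compare against every train descriptor
    (Euclidean distance) and keep the k closest as (train_index, distance).
    """
    matches = []
    for q in query:
        dists = sorted((math.dist(q, t), j) for j, t in enumerate(train))
        matches.append([(j, d) for d, j in dists[:k]])
    return matches


def similarity(query, train, ratio=0.75):
    """Fraction of query descriptors with an unambiguous nearest neighbor.

    Assumption: a match counts as "good" when the nearest distance is
    clearly smaller than the second-nearest (ratio test). The abstract
    does not say how matches are scored.
    """
    if not query or len(train) < 2:
        return 0.0
    good = 0
    for (_, d1), (_, d2) in knn_match(query, train, k=2):
        if d1 < ratio * d2:
            good += 1
    return good / len(query)


# Toy 3-dimensional descriptors standing in for real 128-dimensional SIFT vectors.
doc_a = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
doc_b = [(0.0, 0.1, 1.0), (1.0, 0.0, 0.1), (5.0, 5.0, 5.0)]
score = similarity(doc_a, doc_b)
```

Here two of the three descriptors in `doc_a` have a clear counterpart in `doc_b`, so the score is 2/3. A score near 1 would suggest that the two documents share most of their local features.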
Page(s):
5401-5417
DOI:
DOI not available
Published:
Journal: Journal of Theoretical and Applied Information Technology, Volume: 100, Issue: 17, Year: 2022
Keywords:
JPEG, Compressed Domain, KNN, SIFT, Content Equivalence, Brute force, Document processing, Compressed Document Images (CDI)