Author(s):
1. Sulaiman Khan:
Department of Computer Science, University of Swabi, Pakistan
2. Shah Nazir:
Department of Computer Science, University of Swabi, Pakistan
Abstract:
In artificial intelligence, image-based text identification and analysis play a vital role in text retrieval. Developing an automatic text recognition system is a difficult task in machine learning, and for cursive languages it poses a major challenge to the research community owing to subtle variations in character shapes and the unavailability of a standard dataset. The recognition task is even more challenging for the Pashto language because its character set is larger than those of similar cursive languages (Persian, Urdu, Arabic), with only slight shape differences between characters. This paper aims to address these challenges by developing an optimal optical character recognition (OCR) system to recognise isolated handwritten Pashto characters. The proposed OCR system is developed using a multiple long short-term memory (LSTM) based deep learning model. The applicability of the proposed model is validated against a decision tree (DT) classifier using both a zoning feature extraction technique and invariant moments. An overall accuracy of 89.03% is obtained for the multiple-LSTM-based OCR system, while the DT classifier achieves 72.9% with the zoning feature vector and 74.56% with the invariant-moments feature map. The applicability of the system is further evaluated using the performance metrics of accuracy, F-score, and specificity under varying training and test set splits.
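The zoning technique referred to in the abstract partitions each character image into a grid of zones and uses the per-zone foreground-pixel density as the feature vector fed to the classifier. A minimal sketch of this idea is shown below; the grid size and function name are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def zoning_features(image, zones=(4, 4)):
    """Split a binary character image into a rows x cols grid of zones
    and return the pixel density (mean value) of each zone as a flat
    feature vector. Grid size (4, 4) is an assumed default."""
    rows, cols = zones
    h, w = image.shape
    # Trim the image so it divides evenly into the zone grid.
    image = image[: h - h % rows, : w - w % cols]
    zh, zw = image.shape[0] // rows, image.shape[1] // cols
    features = [
        image[r * zh:(r + 1) * zh, c * zw:(c + 1) * zw].mean()
        for r in range(rows)
        for c in range(cols)
    ]
    return np.asarray(features)

# Toy 8x8 "character": left half filled, right half empty.
img = np.zeros((8, 8))
img[:, :4] = 1.0
print(zoning_features(img, zones=(2, 2)))  # [1. 0. 1. 0.]
```

Such a fixed-length vector (one density value per zone) is what a decision tree classifier can consume directly, regardless of the original image resolution.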
Page(s):
49-58
Published:
Journal: Proceedings of the Pakistan Academy of Sciences: A. Physical and Computational Sciences, Volume: 58, Issue: 2, Year: 2021
Keywords:
deep learning, Pashto, Long Short-Term Memory, Optical Character Recognition, Decision Trees, Invariant Moments, Zoning Technique