Pakistan Science Abstracts
Unicode aided language identification across multiple scripts and heterogeneous data.
Author(s):
1. Farheen Hanif: Department of Computer Science, International Islamic University Islamabad, Pakistan
2. Fouzia Latif: Department of Computer Science, International Islamic University Islamabad, Pakistan
3. M. Sikandar Hayat Khiyal: Department of Computer Science, International Islamic University Islamabad, Pakistan
Abstract:
With the explosive growth of multilingual data on the Internet and in other information and communication fields, the need for effective automated language identifiers has increased further. As more information finds its way into computer systems and the web, categorizing it by manual methods is becoming increasingly infeasible. In this study we discuss the improvements we have achieved over existing language identification methods. A couple of areas that had not been explored before are the inclusion of non-Roman scripts and the active use of Unicode script information to enhance the language detection process.
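As an illustration only, and not the authors' method, the idea of using Unicode script information as a first step can be sketched in Python. The function name `dominant_script` is hypothetical, and the script is approximated by the first word of each character's Unicode name, because the Python standard library does not expose the Unicode Script property directly.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return a rough script label (e.g. 'LATIN', 'ARABIC') for the text.

    Assumption: the first word of a character's Unicode name is used as a
    stand-in for its script. This is a heuristic, not the Script property.
    """
    counts = Counter()
    for ch in text:
        # Skip digits, punctuation, and whitespace, which are shared by scripts.
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    # No alphabetic characters means there is no script evidence at all.
    return counts.most_common(1)[0][0] if counts else None


print(dominant_script("language identification"))
print(dominant_script("اردو"))
```

A script label of this kind narrows the candidate languages (for example, Arabic script suggests Arabic, Urdu, or Persian, among others) before any statistical language model is applied.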
Page(s): 534-540
Published: Journal: Information Technology Journal, Volume: 6, Issue: 4, Year: 2007