Unicode aided language identification across multiple scripts and heterogeneous data | Information Technology Journal, 2007

Author(s):

1. Farheen Hanif: Department of Computer Science, International Islamic University Islamabad, Pakistan

2. Fouzia Latif: Department of Computer Science, International Islamic University Islamabad, Pakistan

3. M. Sikandar Hayat Khiyal: Department of Computer Science, International Islamic University Islamabad, Pakistan

Abstract:

With the rapid growth of multilingual data on the Internet and in other information and communication fields, the need for effective automated language identifiers has increased. As more information finds its way into computer systems and the web, categorizing it manually is becoming increasingly infeasible. In this study we discuss improvements we have achieved over existing language identification methods. Two areas that were not explored before are the inclusion of non-Roman scripts and the active use of Unicode script information to enhance the language detection process.
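
The abstract does not describe the authors' algorithm, so the following is only a minimal sketch of the general idea of using Unicode script information to narrow down candidate languages before any finer-grained identification. It approximates a character's script from the first word of its Unicode character name, which is a heuristic rather than the official Script property. The function names and the SCRIPT_TO_LANGUAGES table are hypothetical and purely illustrative; they are not taken from the paper.

```python
import unicodedata
from collections import Counter

# Hypothetical, deliberately tiny mapping from script label to candidate
# languages. Illustrative only; not from the paper.
SCRIPT_TO_LANGUAGES = {
    "ARABIC": ["Arabic", "Urdu", "Persian"],
    "CYRILLIC": ["Russian", "Ukrainian", "Bulgarian"],
    "GREEK": ["Greek"],
    "HANGUL": ["Korean"],
    "LATIN": ["English", "French", "German"],
}


def script_of(ch):
    """Approximate the script of a letter from the first word of its
    Unicode name (heuristic; not the official Script property)."""
    if not ch.isalpha():
        return None  # skip digits, punctuation, whitespace
    try:
        return unicodedata.name(ch).split()[0]
    except ValueError:
        return None  # character has no name in the database


def script_histogram(text):
    """Count how many letters of the text fall into each script label."""
    return Counter(s for s in map(script_of, text) if s is not None)


def dominant_script(text):
    """Return the most frequent script label, or None if there are no letters."""
    counts = script_histogram(text)
    return counts.most_common(1)[0][0] if counts else None


def candidate_languages(text):
    """Narrow the language search space using the dominant script."""
    return SCRIPT_TO_LANGUAGES.get(dominant_script(text), [])
```

Under these assumptions, a script with a single entry in the table (Greek here) resolves the language immediately, while a script shared by several languages (Arabic, Cyrillic, Latin) only shrinks the candidate set, which a statistical identifier would then have to disambiguate.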

Page(s): 534-540

Published: Journal: Information Technology Journal, Volume: 6, Issue: 4, Year: 2007