Pakistan Science Abstracts
Article details & metrics
No Detail Found!!
An approach based on iterative learning algorithm for Chinese text hierarchy feature extraction without lexicon.
Author(s):
1. Shaohua Jiang: Faculty of infrastructure engineering, Dalian University of Technology, Liaoning, China
Abstract:
A great deal of information included in Chinese text is invaluable asset for further text mining, but the difference between Chinese and the western languages imposes restrictions on further utilization of Chinese text. No distinction indication between words by using spaces is one of the major differences between Chinese, also some other Asian languages, such as Japanese, Thai, etc., and Western languages. Chinese segmentation and features extraction is essential in Chinese natural language processing because it is a precondition for further Chinese text information retrieval and knowledge discovery. Maximum matching and frequency statistics (MMFS) segmentation method based on length descending and string frequency statistics is an effective segmentation and extraction method for Chinese words and phrases, but there are still some shorter words and phrases included in the longer ones extracted by MMFS can’t be obtained. In order to solve this problem, this paper presents a novel Chinese hierarchy feature extraction method combined MMFS with iterative learning algorithm. This method can extract hierarchy feature according to morphology with no need for lexicon support, no need for acquiring the probability between words in advance and no need for Chinese character index. Experimental results confirm the efficiency of this statistical method in extracting Chinese hierarchy feature. This method is also beneficial to feature extraction for other Asian languages similar to Chinese.
Page(s): 214-221
DOI: DOI not available
Published: Journal: Journal of Theoretical and Applied Information Technology, Volume: 49, Issue: 1, Year: 2013
Keywords:
Keywords are not available for this article.
References:
References are not available for this document.
Citations
Citations are not available for this document.
0

Citations

0

Downloads

3

Views