Pakistan Science Abstracts
An improved text classification method based on the Gini index.
Author(s):
1. Xiaoqiang Jia: School of Mathematics and Information Science, Weinan Normal University, Weinan 714000, Shaanxi, China
2. Jiangyan Sun: Modern Education Technology Center, Xi’an International University, Xi’an, Shaanxi, 710077, China
Abstract:
In text classification, the purity form of the Gini index can be used to weight features. The greater the purity value, the more information the attribute carries and the stronger its ability to distinguish categories. However, applying the Gini purity formula directly to feature weighting gives poor classification results. One of the main reasons is that rare words that appear in only one category and in no others cannot be strictly differentiated. To solve this problem, an improved feature weighting method based on the Gini index is proposed. It introduces the approximation quality of a feature term within the categories and adjusts the term weight according to the term's category-distinguishing ability. Compared with feature weighting by the purity formula alone, the method resolves the above problem and effectively improves text classification performance. Experiments verify the feasibility of the proposed method.
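The rare-word problem described in the abstract can be illustrated with a minimal sketch. It assumes the commonly used purity form of the Gini index, Gini(t) = sum over categories of P(c_i | t)^2. The abstract does not state the exact formula, so this form, the function name `gini_purity`, and the sample counts are all illustrative assumptions. The sketch does not implement the authors' improved weighting, which the abstract does not specify.

```python
def gini_purity(doc_counts_by_category):
    """Purity of a term: sum over categories of P(c_i | t)^2.

    doc_counts_by_category maps a category name to the number of
    documents in that category that contain the term (assumed input).
    """
    total = sum(doc_counts_by_category.values())
    if total == 0:
        return 0.0  # the term never occurs, so it carries no purity
    return sum((n / total) ** 2 for n in doc_counts_by_category.values())


# Hypothetical document counts for three terms across three categories.
frequent_term = {"sports": 120, "finance": 0, "politics": 0}
rare_term = {"sports": 1, "finance": 0, "politics": 0}
spread_term = {"sports": 40, "finance": 40, "politics": 40}

purity_frequent = gini_purity(frequent_term)
purity_rare = gini_purity(rare_term)
purity_spread = gini_purity(spread_term)

print(purity_frequent, purity_rare, purity_spread)
```

Under this assumed formula, a term seen in 120 documents of one category and a term seen in a single document of that category both reach the maximum purity, so purity alone cannot separate a reliable indicator from a rare word. The term spread evenly over three categories scores about one third.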
Page(s): 267-273
DOI: not available
Published: Journal: Journal of Theoretical and Applied Information Technology, Volume: 43, Issue: 2, Year: 2012