Pakistan Science Abstracts
Article details & metrics
Text Clusters Labeling using WordNet and Term Frequency-Inverse Document Frequency
Author(s):
1. Syed Muhammad Saqlain: International Islamic University, DCS&SE, Islamabad, Pakistan
2. Asif Nawaz: International Islamic University, DCS&SE, Islamabad, Pakistan
3. Imran Khan: International Islamic University, DCS&SE, Islamabad, Pakistan
4. Faiz Ali Shah: University of Tartu, Ulikooli Tartu., Estonia
5. Muhammad Usman Ashraf: International Islamic University, DCS&SE, Islamabad, Pakistan
Abstract:
Cluster labeling is the process of assigning appropriate, descriptive titles to clusters of text documents. A suitable label not only explains the central theme of a particular cluster but also differentiates it efficiently from other clusters. In this paper, we propose a cluster labeling technique that assigns a generic label to a cluster; the label may or may not appear in the text of the cluster's documents. The technique finds the theme of the documents and designates it as the label. We use term frequency-inverse document frequency (tf-idf) as the baseline, with the term frequency calculation refined using a thesaurus. WordNet serves as an external resource for generating hypernyms of the K terms with the highest tf-idf scores. The hypernyms with the highest frequency are then taken as the label of the cluster. The paper provides the details of the datasets used for experimentation and comparative results against existing methods, which reflect the meaningful outcome of our technique.
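The pipeline described in the abstract (tf-idf scoring, selection of the K highest-scoring terms, hypernym lookup, and a frequency vote) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each cluster is treated as a single document when computing idf, it omits the thesaurus-based refinement of term frequency, and it replaces WordNet with a small hand-written table, `TOY_HYPERNYMS`, which is hypothetical data for illustration only. The function names `tf_idf` and `label_cluster` are likewise assumptions.

```python
import math
from collections import Counter

# Hypothetical stand-in for a WordNet hypernym lookup (not real WordNet data).
TOY_HYPERNYMS = {
    "dog": ["canine", "animal"],
    "cat": ["feline", "animal"],
    "horse": ["equine", "animal"],
    "car": ["vehicle"],
    "truck": ["vehicle"],
    "bus": ["vehicle"],
}


def tf_idf(clusters):
    """Return one {term: tf-idf} dict per cluster.

    Assumption: each cluster is treated as one document for the idf term.
    """
    counts = [
        Counter(tok for doc in cluster for tok in doc.lower().split())
        for cluster in clusters
    ]
    n = len(clusters)
    # Document frequency: number of clusters in which each term occurs.
    df = Counter(term for c in counts for term in c)
    scores = []
    for c in counts:
        total = sum(c.values())
        scores.append(
            {t: (f / total) * math.log(n / df[t]) for t, f in c.items()}
        )
    return scores


def label_cluster(scores, k=3, hypernyms=TOY_HYPERNYMS):
    """Pick the most frequent hypernym among the k highest tf-idf terms."""
    # Ties are broken alphabetically so the result is deterministic.
    top_terms = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    votes = Counter(h for t in top_terms for h in hypernyms.get(t, []))
    if not votes:
        # Fall back to the best-scoring term when no hypernym is known.
        return top_terms[0] if top_terms else None
    return min(votes, key=lambda h: (-votes[h], h))


clusters = [
    ["the dog chased the cat", "a horse and a dog"],
    ["the car passed the truck", "a bus and a car"],
]
labels = [label_cluster(s) for s in tf_idf(clusters)]
print(labels)
```

In this toy run the chosen labels are words that never occur in the cluster text, which mirrors the abstract's point that a generic label need not be part of the documents themselves. Words shared by every cluster receive an idf of zero, so they drop out of the top-K selection without an explicit stop-word list.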
Page(s): 281-291
DOI: DOI not available
Published: Journal: Proceedings of the Pakistan Academy of Sciences: A. Physical and Computational Sciences, Volume: 53, Issue: 3, Year: 2016
Keywords:
Clustering, thesaurus, WordNet, cluster labeling