Pakistan Science Abstracts
Article details & metrics
W-rank: A keyphrase extraction method for webpage based on linguistics and DOM-base features
Author(s):
1. Himat Shah: School of Computing, University of Eastern Finland, Joensuu, Finland
2. Shafique Ahmed Awan: Department of CS & IT, Benazir Bhutto Shaheed University, Karachi, Sindh, Pakistan
3. Anwar Ali Sathio: Department of CS & IT, Benazir Bhutto Shaheed University, Karachi, Sindh, Pakistan
4. Asadullah Burdi: Institute of Mathematics and Computer Science (IMCS), University of Sindh Jamshoro, Sindh, Pakistan
Abstract:
This paper addresses the problem of automatic keyphrase extraction from webpage text. Our method, which we call W-rank, is unsupervised. First, we extract the text of a webpage and tokenize it into three candidate lists: unigrams, bigrams, and noun phrases. Then we assign a score to each candidate based on its appearance in linguistic and DOM-based feature sets. In the final step, we rank the candidates by score, select the top 5 keyphrases from each list, and combine them into the final set of keyphrases for the webpage. We focus on the relevancy of the keyphrases to the page content, using linguistic features. We compare our method with other methods using precision, recall, and F-score. The experimental results show that W-rank improves on the performance of our previous method, D-rank, and outperforms other state-of-the-art methods.
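The pipeline described in the abstract (candidate lists, feature-based scoring, top-k selection per list) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The tag weights, the stopword list, and the names `TAG_WEIGHTS`, `STOPWORDS`, `TextCollector`, and `extract_keyphrases` are all assumptions made for illustration, since the abstract does not specify the actual features or weights. The noun-phrase list is left out because it would need a part-of-speech tagger, and a simple stopword filter stands in for the linguistic features.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical DOM-based weights; the abstract does not give the real values.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "h2": 1.5}
# Hypothetical stand-in for the linguistic features: a tiny stopword filter.
STOPWORDS = {"a", "an", "the", "of", "and", "for", "in", "on", "to", "is", "with"}
SKIPPED_TAGS = {"script", "style"}


class TextCollector(HTMLParser):
    """Collects (enclosing_tag, text) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to the matching open tag (tolerates unclosed tags).
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        tag = self.stack[-1] if self.stack else ""
        if data.strip() and tag not in SKIPPED_TAGS:
            self.chunks.append((tag, data))


def ngrams(words, n):
    """Return the n-grams of `words` that contain no stopword."""
    grams = (words[i:i + n] for i in range(len(words) - n + 1))
    return [" ".join(g) for g in grams if not any(w in STOPWORDS for w in g)]


def extract_keyphrases(html, top_k=5):
    """Score unigram and bigram candidates, then combine the top_k of each list."""
    parser = TextCollector()
    parser.feed(html)
    scores = {1: Counter(), 2: Counter()}
    for tag, text in parser.chunks:
        weight = TAG_WEIGHTS.get(tag, 1.0)  # plain body text counts once
        words = re.findall(r"[a-z][a-z0-9-]*", text.lower())
        for n, counter in scores.items():
            for candidate in ngrams(words, n):
                counter[candidate] += weight
    keyphrases = []
    for counter in scores.values():
        keyphrases += [phrase for phrase, _ in counter.most_common(top_k)]
    return keyphrases
```

Calling `extract_keyphrases(page_html)` returns up to `top_k` unigrams followed by up to `top_k` bigrams. A candidate that appears in a heavily weighted element such as the title ranks above one that appears only in body text.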
Page(s): 217-228
DOI: DOI not available
Published: Journal: VAWKUM Transactions on Computer Sciences, Volume: 11, Issue: 1, Year: 2023
Keywords:
Scoring, Wrank, Automatic keyphrase extraction, relevancy, webpage text, unsupervised method