W-rank: A keyphrase extraction method for webpage based on linguistics and DOM-base features | [VAWKUM Transactions on Computer Sciences • 2023]

Author(s):

1. Himat Shah: School of Computing, University of Eastern Finland, Joensuu, Finland

2. Shafique Ahmed Awan: Department of CS & IT, Benazir Bhutto Shaheed University, Karachi, Sindh, Pakistan

3. Anwar Ali Sathio: Department of CS & IT, Benazir Bhutto Shaheed University, Karachi, Sindh, Pakistan

4. Asadullah Burdi: Institute of Mathematics and Computer Science (IMCS), University of Sindh Jamshoro, Sindh, Pakistan

Abstract:

This paper addresses the problem of automatic keyphrase extraction from webpage text. Our method, which we call W-rank, is unsupervised. First, we extract the text of a webpage and tokenize it into three candidate lists: unigrams, bigrams, and noun phrases. We then assign a score to each candidate based on its appearance in linguistic and DOM-based feature sets. In the final step, we rank the candidates by score, select the top 5 keyphrases from each list, and combine them into the final set of keyphrases for the webpage. We place particular emphasis on the relevancy of the keyphrases to the page content, using linguistic features. We compare our method with other methods using precision, recall, and F-score. The experimental results show that W-rank improves on our previous method, D-rank, and outperforms other state-of-the-art methods.
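The pipeline described in the abstract (extract text, build candidate lists, score, keep the top 5 of each list) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tag weights in `TAG_WEIGHTS`, the stopword list, and the scoring rule (tag-weighted frequency) are assumptions made for illustration, since the abstract does not specify the actual linguistic or DOM-based features. The noun-phrase list is omitted because it would need a part-of-speech tagger. The helper `precision_recall_f1` shows the standard definitions of the evaluation metrics the abstract names.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights for DOM-based features; the abstract gives no values.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "h2": 1.5}
# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "of", "and", "for", "in", "on", "to", "is"}


class TextWithTags(HTMLParser):
    """Collect (tag, text) runs so each text run remembers its enclosing tag."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.runs = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, discarding unclosed children.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        tag = self.stack[-1] if self.stack else ""
        if tag in ("script", "style") or not data.strip():
            return
        self.runs.append((tag, data))


def candidates(text):
    """Return (unigrams, bigrams) with stopwords filtered out."""
    words = re.findall(r"[a-z]+", text.lower())
    unigrams = [w for w in words if w not in STOPWORDS]
    bigrams = [
        f"{a} {b}"
        for a, b in zip(words, words[1:])
        if a not in STOPWORDS and b not in STOPWORDS
    ]
    return unigrams, bigrams


def extract_keyphrases(html, top_k=5):
    """Score candidates by tag-weighted frequency and keep top_k per list."""
    parser = TextWithTags()
    parser.feed(html)
    uni_scores, bi_scores = Counter(), Counter()
    for tag, text in parser.runs:
        weight = TAG_WEIGHTS.get(tag, 1.0)
        unigrams, bigrams = candidates(text)
        for u in unigrams:
            uni_scores[u] += weight
        for b in bigrams:
            bi_scores[b] += weight
    top_uni = [p for p, _ in uni_scores.most_common(top_k)]
    top_bi = [p for p, _ in bi_scores.most_common(top_k)]
    return top_uni + top_bi


def precision_recall_f1(predicted, gold):
    """Standard set-based precision, recall, and F1 against gold keyphrases."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1
```

Calling `extract_keyphrases(page_html)` on a page's HTML string returns the combined unigram and bigram keyphrases, and `precision_recall_f1(predicted, gold)` compares them with a manually annotated list. Any real comparison with W-rank would require the features and weights defined in the paper itself.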

Page(s): 217-228

DOI: not available

Published: Journal: VAWKUM Transactions on Computer Sciences, Volume: 11, Issue: 1, Year: 2023

Keywords:

Scoring, Wrank, Automatic keyphrase extraction, relevancy, webpage text, unsupervised method

