An Efficient Implementation of Apriori Algorithm Based on Hadoop-MapReduce Model | [International Journal of Reviews in Computing • 2012]

Author(s):

1. Othman Yahya: Faculty of Computers and Information, Cairo University, Cairo, Egypt

2. Osman Hegazy: Faculty of Computers and Information, Cairo University, Cairo, Egypt

3. Ehab Ezat: Faculty of Computers and Information, Cairo University, Cairo, Egypt

Abstract:

Finding frequent itemsets is one of the most important fields of data mining. The Apriori algorithm is the most established algorithm for finding frequent itemsets in a transactional dataset; however, it needs to scan the dataset many times and to generate many candidate itemsets. Unfortunately, when the dataset is huge, both memory use and computational cost can become very expensive. In addition, a single processor's memory and CPU resources are limited, which makes the algorithm inefficient. Parallel and distributed computing are effective strategies for accelerating algorithm performance. In this paper, we have implemented an efficient MapReduce Apriori algorithm (MRApriori) based on the Hadoop-MapReduce model, which needs only two phases (MapReduce jobs) to find all frequent k-itemsets, and compared it with two existing algorithms that need either one or k phases (where k is the maximum length of the frequent itemsets) to find the same frequent k-itemsets. Experimental results showed that the proposed MRApriori algorithm outperforms the other two algorithms.
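
The abstract contrasts algorithms by how many passes (MapReduce jobs) they need. As an illustration only, the sketch below runs in plain Python on one machine rather than on Hadoop. It shows classic level-wise Apriori, which rescans the data once per itemset length k, next to a two-pass partition scheme in the style of the SON algorithm, where each pass is shaped like one MapReduce job. The abstract does not describe how MRApriori divides work between its two jobs, so the SON-style scheme, the function names `apriori` and `two_phase`, the `num_splits` parameter and the toy basket data are all assumptions, not the authors' implementation.

```python
from collections import Counter
from itertools import combinations


def apriori(transactions, min_count):
    """Classic level-wise Apriori: one counting pass over the data per length k."""
    transactions = [frozenset(t) for t in transactions]
    # Pass 1: count single items and keep those meeting the support threshold.
    counts = Counter(item for t in transactions for item in t)
    frequent = {frozenset([i]): n for i, n in counts.items() if n >= min_count}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join: merge pairs of frequent (k-1)-itemsets whose union has k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        # Pass k: scan the whole dataset again to count the surviving candidates.
        counts = Counter(c for t in transactions for c in candidates if c <= t)
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        result.update(frequent)
        k += 1
    return result


def two_phase(transactions, min_count, num_splits):
    """Hypothetical SON-style scheme: two passes, each shaped like one MapReduce job."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    if n == 0:
        return {}
    splits = [transactions[i::num_splits] for i in range(num_splits)]

    # Job 1: each "mapper" runs Apriori on its split with the threshold scaled to
    # the split size; the "reducer" takes the union of locally frequent itemsets.
    # By pigeonhole, a globally frequent itemset is locally frequent in some split.
    candidates = set()
    for split in splits:
        local_min = min_count * len(split) / n
        candidates |= set(apriori(split, local_min))

    # Job 2: "mappers" emit (candidate, 1) for each candidate a transaction
    # contains; the "reducer" sums the counts and applies the global threshold.
    counts = Counter(c for t in transactions for c in candidates if c <= t)
    return {c: cnt for c, cnt in counts.items() if cnt >= min_count}


# Hypothetical toy market-basket data with an absolute support threshold of 3.
baskets = [
    ["bread", "milk"],
    ["bread", "diaper", "beer", "eggs"],
    ["milk", "diaper", "beer", "cola"],
    ["bread", "milk", "diaper", "beer"],
    ["bread", "milk", "diaper", "cola"],
]
print(len(apriori(baskets, 3)))  # → 8
print(two_phase(baskets, 3, num_splits=2) == apriori(baskets, 3))  # → True
```

On this data the level-wise version scans the baskets three times (lengths 1, 2 and 3), while the partition scheme always scans them twice. The cost of the fixed pass count is that job 1 can emit locally frequent itemsets that turn out to be globally infrequent, and job 2 must count them anyway.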

Page(s): 59-67

DOI: not available

Published in: International Journal of Reviews in Computing, Volume 12, Issue 0, 2012
