Abstract:
According to a survey conducted in 2021, users share about 4 petabytes of data on Facebook daily. The exponential growth of data (called big data) plays a vital role in machine learning, the Internet of Things (IoT), and business intelligence applications. Owing to this rapid growth, research on big data programming models has gained considerable interest over the past decade. Today, many programming paradigms exist for handling big data, and selecting an appropriate model is critical to a project's success. This study analyzes big data programming models such as MapReduce, Directed Acyclic Graph (DAG), Message Passing Interface (MPI), Bulk Synchronous Parallel (BSP), and SQL-like models. We conduct a comparative study of distributed and parallel big data programming models and categorize them into three classes: traditional data processing, graph-based processing, and query-based processing models. Furthermore, we evaluate these models based on their performance, data processing, storage, fault tolerance, supported languages, and machine learning support. We highlight the benchmarks used for big data programming models, along with their characteristics. Finally, we discuss the models' challenges and suggest future directions for the research community.
Keywords: Big Data, Distributed Computing, Directed Acyclic Graph, Parallel Computing, Programming Models, SQL-like, Message Passing Interface, MapReduce, Bulk Synchronous Parallel