The healthcare industry has generated large amounts of data, and analyzing this data has emerged as an important problem in recent years. The MapReduce programming model has been used successfully for big data analytics. However, data skew invariably occurs in big data analytics and seriously affects efficiency. To overcome the data skew problem in MapReduce, we previously proposed a data processing algorithm called Partition Tuning-based Skew Handling (PTSH). In contrast to the one-stage partitioning strategy used in the traditional MapReduce model, PTSH uses a two-stage strategy and a partition tuning method that disperses key-value pairs into virtual partitions and recombines the partitions when data skew occurs. The robustness and efficiency of the proposed algorithm were tested on a wide variety of simulated datasets and real healthcare datasets. The results showed that the PTSH algorithm handles data skew in MapReduce efficiently and improves the performance of MapReduce jobs in comparison with native Hadoop, Closer, and locality-aware and fairness-aware key partitioning (LEEN). We also found that the time needed for rule extraction can be reduced significantly by adopting the PTSH algorithm, since it is more suitable for association rule mining (ARM) on healthcare data.
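The two-stage idea described above can be illustrated with a small sketch. This is not the authors' PTSH implementation, since the abstract gives no algorithmic details. It assumes that stage one hashes keys into more virtual partitions than there are reducers, and that stage two recombines them with a greedy largest-first assignment to the least-loaded reducer. The function names `virtual_partition` and `tune_partitions` and the use of CRC32 as the hash are hypothetical choices made for illustration only.

```python
import heapq
import zlib
from collections import Counter


def virtual_partition(key, num_virtual):
    """Stage 1 (assumed): hash a key into one of many virtual partitions."""
    # CRC32 is used only because it is deterministic across runs.
    return zlib.crc32(key.encode("utf-8")) % num_virtual


def tune_partitions(pairs, num_reducers, num_virtual):
    """Stage 2 (assumed): recombine virtual partitions into reducer partitions.

    Returns (assignment, reducer_loads), where assignment maps each
    non-empty virtual partition to a reducer index.
    """
    # Count how many key-value pairs land in each virtual partition.
    vp_loads = Counter(virtual_partition(k, num_virtual) for k, _ in pairs)

    # Min-heap of (current load, reducer index): the root is the least-loaded reducer.
    heap = [(0, r) for r in range(num_reducers)]
    heapq.heapify(heap)

    assignment = {}
    # Place the largest virtual partitions first (greedy bin packing heuristic).
    for vp, size in sorted(vp_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        load, reducer = heapq.heappop(heap)
        assignment[vp] = reducer
        heapq.heappush(heap, (load + size, reducer))

    reducer_loads = [0] * num_reducers
    for vp, reducer in assignment.items():
        reducer_loads[reducer] += vp_loads[vp]
    return assignment, reducer_loads


if __name__ == "__main__":
    # Skewed toy input: one hot key, one warm key, many rare keys.
    pairs = [("a", 1)] * 50 + [("b", 1)] * 10 + [(f"k{i}", 1) for i in range(40)]
    assignment, loads = tune_partitions(pairs, num_reducers=4, num_virtual=16)
    print("reducer loads:", loads)
```

Because every pair with the same key falls into the same virtual partition, this sketch cannot split a single hot key across reducers; it only balances how whole virtual partitions are grouped.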
from #AlexandrosSfakianakis via Alexandros G.Sfakianakis on Inoreader http://ift.tt/2ogdLft