Saturday, 27 May 2017

Distributed stream join under workload variance

Abstract

Flexible and self-adaptive stream join processing plays an important role in parallel shared-nothing environments. The join-matrix model is a high-performance model that is resilient to data skew and supports arbitrary join predicates, because it uses random tuple distribution as its routing policy. To maximize system throughput and minimize network communication cost, a scalable partitioning scheme on the matrix is critical. In this paper, we present a novel, flexible, and adaptive scheme partitioning model for the stream join operator, which ensures high throughput with economical resource usage by allocating resources on demand. Specifically, a lightweight scheme generator, which requires a sample of each stream's volume and the processing resource quota of each physical machine, generates a join scheme; a migration plan generator then decides how to migrate data among machines so as to minimize migration cost while ensuring correctness. We conduct extensive experiments on different kinds of join workloads, and the evaluation shows that the approach is highly competitive with baseline systems on both benchmark data and real data.
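To make the join-matrix idea concrete, here is a minimal Python sketch. It is not the paper's algorithm, since the abstract does not give one. It assumes a simple cost model in which each cell of a rows x cols matrix stores |R|/rows + |S|/cols tuples and must stay within one per-machine quota. The names `choose_scheme` and `route`, and the choice of mapping stream R to rows and stream S to columns, are hypothetical.

```python
import random


def choose_scheme(r_size, s_size, quota):
    """Pick (rows, cols) with the fewest cells such that every cell's load
    r_size/rows + s_size/cols stays within quota (assumed cost model)."""
    best = None
    for rows in range(1, r_size + 1):
        # Integer form of: s_size / cols <= quota - r_size / rows
        denom = quota * rows - r_size
        if denom <= 0:
            continue  # R's share alone already fills the quota
        cols = max(1, -(-s_size * rows // denom))  # ceiling division
        if best is None or rows * cols < best[0] * best[1]:
            best = (rows, cols)
    if best is None:
        raise ValueError("quota too small for any scheme")
    return best


def route(stream, rows, cols, rng=random):
    """Random routing: an R tuple is replicated across one random row,
    an S tuple across one random column, so any R/S pair meets in
    exactly one cell regardless of the join predicate."""
    if stream == "R":
        row = rng.randrange(rows)
        return [(row, c) for c in range(cols)]
    col = rng.randrange(cols)
    return [(r, col) for r in range(rows)]


if __name__ == "__main__":
    rows, cols = choose_scheme(100, 400, 100)
    print(rows, cols)
    print(route("R", rows, cols), route("S", rows, cols))
```

Because routing ignores tuple values, skewed keys do not overload any single cell. The cost is replication: each R tuple is stored `cols` times and each S tuple `rows` times, which is why the choice of matrix shape matters.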



from #AlexandrosSfakianakis via Alexandros G.Sfakianakis on Inoreader http://ift.tt/2jW6P7P
