The Lattice Boltzmann Method (LBM) is a powerful numerical method for simulating fluid flow. Its data-parallel nature makes it a promising candidate for parallel implementation on a GPU. The LBM, however, is heavily data-intensive and memory-bound. In particular, moving data to adjacent cells in the streaming phase incurs many uncoalesced memory accesses on the GPU, which degrades overall performance. Furthermore, the main LBM computation kernels use a large number of registers per thread, which limits the thread parallelism available at run time because the number of registers on the GPU is fixed. In this paper, we develop a high-performance parallelization of the LBM on a GPU by minimizing the overhead of uncoalesced memory accesses while improving cache locality through tiling combined with a data layout change. We also aggressively reduce register usage in the LBM kernels to increase run-time thread parallelism. Experimental results on the Nvidia Tesla K20 GPU show that our approach delivers a throughput of 1210.63 Million Lattice Updates Per Second (MLUPS).
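To make the two bottlenecks named in the abstract more concrete, here is a minimal Python sketch. It is not the authors' implementation. The abstract does not say which lattice model or which data layouts are used, so the D3Q19 lattice (19 distribution values per cell) and the array-of-structures versus structure-of-arrays comparison are assumptions chosen for illustration. The per-multiprocessor limits (65536 registers, 2048 resident threads) are assumed figures for a Kepler-class GPU such as the K20, and the register bound ignores allocation granularity and other occupancy limits such as shared memory. All function names are hypothetical.

```python
# Illustrative sketch only; names and parameters are assumptions, not the paper's code.

Q = 19  # assumed D3Q19 lattice: 19 distribution values per cell


def aos_index(cell, q, n_cells, n_dirs=Q):
    """Array of structures: the n_dirs values of one cell are stored contiguously."""
    return cell * n_dirs + q


def soa_index(cell, q, n_cells, n_dirs=Q):
    """Structure of arrays: one contiguous array of n_cells values per direction."""
    return q * n_cells + cell


def neighbour_thread_stride(index_fn, q, n_cells):
    """Element distance between the addresses touched by two adjacent threads,
    assuming one thread per cell and both threads accessing direction q.
    A stride of 1 is what allows a warp's accesses to coalesce."""
    return index_fn(1, q, n_cells) - index_fn(0, q, n_cells)


def max_threads_by_registers(regs_per_thread,
                             regs_per_sm=65536,        # assumed Kepler-class limit
                             max_threads_per_sm=2048):  # assumed Kepler-class limit
    """Upper bound on resident threads per multiprocessor imposed by the
    register file alone (ignores allocation granularity and other limits)."""
    return min(max_threads_per_sm, regs_per_sm // regs_per_thread)


if __name__ == "__main__":
    n_cells = 64 ** 3
    print("AoS stride:", neighbour_thread_stride(aos_index, 3, n_cells))
    print("SoA stride:", neighbour_thread_stride(soa_index, 3, n_cells))
    for regs in (32, 64, 128):
        print(regs, "registers/thread ->", max_threads_by_registers(regs), "threads")
```

Under these assumptions, adjacent threads are 19 elements apart in the array-of-structures layout and 1 element apart in the structure-of-arrays layout, and doubling the registers used per thread from 64 to 128 halves the register-limited thread count. This is the kind of trade-off that a layout change and register reduction aim to improve.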
from #AlexandrosSfakianakis via Alexandros G.Sfakianakis on Inoreader http://ift.tt/2iZGIcv