The Lattice Boltzmann Method (LBM) is a powerful numerical method for simulating fluid flow. Because it is data-parallel by nature, it is a promising candidate for parallel implementation on a GPU. The LBM, however, is heavily data intensive and memory bound. In particular, moving data to adjacent cells in the streaming phase incurs many uncoalesced memory accesses on the GPU, which degrades overall performance. Furthermore, the main computation kernels of the LBM use a large number of registers per thread. Since the number of registers on the GPU is fixed, this limits the thread parallelism available at run time. In this paper, we develop a high-performance parallelization of the LBM on a GPU that minimizes the overhead of uncoalesced memory accesses while improving cache locality through tiling combined with a data layout change. We also aggressively reduce the register usage of the LBM kernels in order to increase run-time thread parallelism. Experimental results on the Nvidia Tesla K20 GPU show that our approach delivers a throughput of 1210.63 Million Lattice Updates Per Second (MLUPS).
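To make the coalescing point concrete, here is a minimal Python sketch of how a data layout choice changes the memory stride between neighbouring GPU threads. The abstract does not say which lattice model or layout the authors use, so the D3Q19 model (`Q = 19`), the array-of-structures and structure-of-arrays index functions, and every name below are assumptions for illustration, not the paper's method. The `mlups` helper simply applies the usual definition of the metric: lattice cell updates per second, divided by one million.

```python
# Illustrative sketch only: Q = 19 (D3Q19) and both layouts are assumptions,
# not details taken from the abstract.
Q = 19  # number of distribution values stored per lattice cell


def aos_index(cell, q, n_cells):
    """Array of structures: all Q values of one cell are stored together."""
    return cell * Q + q


def soa_index(cell, q, n_cells):
    """Structure of arrays: one contiguous array per direction q."""
    return q * n_cells + cell


def thread_stride(index_fn, q, n_cells):
    """Distance in elements between the addresses touched by two threads
    that handle neighbouring cells and read the same direction q."""
    return index_fn(1, q, n_cells) - index_fn(0, q, n_cells)


def mlups(nx, ny, nz, steps, seconds):
    """Million lattice updates per second for an nx * ny * nz grid."""
    return nx * ny * nz * steps / seconds / 1e6


if __name__ == "__main__":
    n_cells = 128 * 128 * 128
    print("AoS stride:", thread_stride(aos_index, 3, n_cells))
    print("SoA stride:", thread_stride(soa_index, 3, n_cells))
    print("MLUPS:", mlups(128, 128, 128, 100, 1.0))
```

With a stride of 1, consecutive threads touch consecutive elements, which is the pattern GPU hardware can merge into few memory transactions. A stride of `Q` spreads the same reads across many more transactions. Streaming complicates this further because each direction reads or writes at a shifted cell position, which is presumably where the tiling described in the abstract comes in.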