Sequential optimization and shared and distributed memory parallelization in clusters : n-body/particle simulation
Tinetti, Fernando Gustavo
Sequential optimization and shared and distributed memory parallelization in clusters : n-body/particle simulation - 1 file (532 KB)
File format: PDF. -- This document is intellectual production of the Facultad de Informática - UNLP (BIPA Collection/Library)
The particle-particle method for N-body problems is one of the most commonly used methods in computer-driven physics simulation. These algorithms are, in general, very simple to design and code, and highly parallelizable. In this article, we present the most important approaches for applying three performance improvement areas to these algorithms when executed on high performance computing (HPC) clusters: 1) sequential optimization (a single core in a node of the cluster), 2) shared memory parallelism (in a single node with multiple CPUs available, i.e., as a multiprocessor), and 3) distributed memory parallelism (across the whole cluster). For each improvement area we present the techniques employed and the performance gain obtained. We also show how some (sequential/classical) code optimizations are almost essential for obtaining at least acceptable parallel performance and scalability.
DIF-M6647
PARALLEL COMPUTING
OPTIMIZATION
CODE GENERATION