Blocked all-pairs shortest paths algorithm on Intel Xeon Phi KNL processor : a case study

By: Rucci, Enzo

Material type: Article

Description: 1 file (1.4 MB)

Summary: Manycore processors are consolidating in the HPC community as a way of improving performance while maintaining power efficiency. Knights Landing is the recently released second generation of the Intel Xeon Phi architecture. While optimizing applications on CPUs, GPUs and the first Xeon Phi generation has been widely studied in recent years, the new features of Knights Landing processors require revisiting the programming and optimization techniques for these devices. In this work, we select the Floyd-Warshall algorithm as a representative case study of graph and memory-bound applications. Starting from the default serial version, we show how data-, thread- and compiler-level optimizations help the parallel implementation reach 338 GFLOPS.
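
For readers unfamiliar with the algorithm named in the summary, the following is a minimal Python sketch of a serial Floyd-Warshall baseline and a blocked (tiled) variant. It is not the authors' implementation, which this record does not reproduce; the function names (`floyd_warshall`, `blocked_floyd_warshall`, `_relax_block`) and the block-size parameter `bs` are hypothetical, and none of the data-, thread- or compiler-level optimizations mentioned above are shown.

```python
import math

INF = math.inf  # marks "no edge" in the distance matrix


def floyd_warshall(dist):
    """Serial baseline: classic triple loop over a copy of `dist`."""
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def _relax_block(d, i0, j0, k0, bs, n):
    """Relax the tile starting at (i0, j0) through intermediates k0..k0+bs-1."""
    for k in range(k0, min(k0 + bs, n)):
        for i in range(i0, min(i0 + bs, n)):
            for j in range(j0, min(j0 + bs, n)):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]


def blocked_floyd_warshall(dist, bs=2):
    """Blocked variant: three phases per round (assumes no negative cycles)."""
    n = len(dist)
    d = [row[:] for row in dist]
    starts = range(0, n, bs)
    for k0 in starts:
        # Phase 1: the diagonal tile depends only on itself.
        _relax_block(d, k0, k0, k0, bs, n)
        # Phase 2: tiles sharing a row or column with the diagonal tile.
        for b0 in starts:
            if b0 != k0:
                _relax_block(d, k0, b0, k0, bs, n)
                _relax_block(d, b0, k0, k0, bs, n)
        # Phase 3: all remaining tiles, using the phase 2 results.
        for i0 in starts:
            if i0 == k0:
                continue
            for j0 in starts:
                if j0 != k0:
                    _relax_block(d, i0, j0, k0, bs, n)
    return d
```

Both functions take a square list-of-lists distance matrix with zeros on the diagonal and `INF` for missing edges, and return a new matrix of shortest-path distances. The blocked form is shown only because tiling improves data locality for memory-bound kernels; the tiles within phases 2 and 3 are independent of each other, which is what typically makes this formulation amenable to thread-level parallelism.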


Holdings
Item type	Home library	Collection	Call number	URL	Status	Date due	Barcode
Book chapter	Biblioteca de la Facultad de Informática	Digital library	A0926	Link to resource	Online resource

PDF file format. -- This document is an intellectual production of the Facultad de Informática - UNLP (Colección BIPA/Biblioteca)

Congreso Argentino de Ciencias de la Computación (23rd : 2017 : La Plata, Argentina)