Fast Dynamic Memory Allocator for Massively Parallel Architectures


Dynamic memory allocation in massively parallel systems often suffers from drastic performance decreases due to the required global synchronization. This is especially true when many allocation or deallocation requests occur in parallel. We propose a method to alleviate this problem by making use of the SIMD parallelism found in most current massively parallel hardware. More specifically, we propose a hybrid dynamic memory allocator operating at the SIMD-parallel warp level. Using additional constraints that can be fulfilled for a large class of practically relevant algorithms and hardware systems, we are able to significantly speed up dynamic allocation. We present and evaluate a prototypical implementation for modern CUDA-enabled graphics cards, achieving an overall speedup of up to several orders of magnitude.
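To illustrate why handling requests at warp granularity reduces global synchronization, the following is a minimal sketch of warp-aggregated bump allocation. It is not the FDGMalloc implementation or its API. The names `g_heap_offset`, `warp_alloc`, and `demo_kernel` are hypothetical, the sketch only reserves offsets and never frees them, and it assumes that all 32 lanes of a warp call the function together, that blocks are one-dimensional with a size that is a multiple of 32, and that the `*_sync` shuffle intrinsics of CUDA 9 or later are available.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical global bump pointer (in bytes); stands in for a real heap.
__device__ unsigned int g_heap_offset = 0;

// Assumption: all 32 lanes of the warp call this function together.
// Returns the byte offset reserved for the calling lane.
__device__ unsigned int warp_alloc(unsigned int bytes)
{
    const unsigned int full_mask = 0xffffffffu;
    const int lane = threadIdx.x & 31;

    // Inclusive prefix sum of the request sizes across the warp.
    unsigned int scan = bytes;
    for (int delta = 1; delta < 32; delta <<= 1) {
        unsigned int neighbor = __shfl_up_sync(full_mask, scan, delta);
        if (lane >= delta) scan += neighbor;
    }

    // Lane 31 holds the warp total and reserves it with a single atomic,
    // instead of one atomic per thread.
    unsigned int base = 0;
    if (lane == 31) base = atomicAdd(&g_heap_offset, scan);
    base = __shfl_sync(full_mask, base, 31);

    // The exclusive prefix is this lane's offset inside the warp's chunk.
    return base + (scan - bytes);
}

// Each lane requests a different size: 4 bytes times (lane index + 1).
__global__ void demo_kernel(unsigned int *offsets)
{
    const unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    const unsigned int bytes = 4u * ((threadIdx.x & 31u) + 1u);
    offsets[tid] = warp_alloc(bytes);
}

int main()
{
    const int blocks = 2, threads = 64;  // four warps in total
    unsigned int *d_offsets = nullptr;
    cudaMalloc(&d_offsets, blocks * threads * sizeof(unsigned int));

    demo_kernel<<<blocks, threads>>>(d_offsets);
    cudaDeviceSynchronize();

    // Read back how many bytes were reserved by all warps.
    unsigned int used = 0;
    cudaMemcpyFromSymbol(&used, g_heap_offset, sizeof(used));
    printf("bytes reserved: %u\n", used);

    cudaFree(d_offsets);
    return 0;
}
```

In this sketch each warp issues one `atomicAdd` regardless of how many lanes request memory, so contention on the shared counter drops by up to a factor of 32 compared with per-thread atomics.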


Fast Dynamic Memory Allocator for Massively Parallel Architectures [Paper]
Sven Widmer, Dominik Wodniok, Nicolas Weber, and Michael Goesele
In: Proceedings of the Sixth Workshop on General Purpose Processing Using GPUs, Houston, USA, 2013

Source Code

FDGMalloc source code (Version 1.1), CUDA implementation [Source]
Version   Date             Description
1.1       May 13, 2015     Fixed minor bug.
1.0       March 11, 2013   Initial release.