Efficient heap-manager for heavy churn, tiny allocs?

https://stackoverflow.com/questions/228081

03-07-2019
|

Question

I'm looking for ideas for a heap-manager to handle a very specific situation: Lots and lots of very small allocations, ranging from 12 to 64 bytes each. Anything bigger, I will pass on to the regular heap-manager, so only tiny blocks need be catered for. Only 4-byte alignment is needed.

My main concerns are

Overhead. The regular libc heap will typically round up an allocation to a multiple of 16 bytes, then add another 16 byte header - this means over 50% overhead on a 20-byte allocation, which sucks.
Performance

One helpful aspect is that Lua (which is the user of this heap) will tell you the size of the block it's freeing when it calls free() - this may enable certain optimisations.

I'll post my current approach, which works ok, but I'd like to improve on it if at all possible. Any ideas?

Solution

It is possible to build a heap manager that is very efficient for objects that are all the same size. You could create one of these heaps for each size of object that you need, or if you don't mind using a bit of space, create one for 16 byte objects, one for 32, and one for 64. The maximum overhead would be 31 bytes for a 33 byte allocation (which would go on the 64 blocksize heap).

OTHER TIPS

To expand on what Greg Hewgill says, one way to do an ultra-efficient fixed-size heap is:

Split a big buffer into nodes. Node size must be at least sizeof(void*).
String them together into a singly-linked list (the "free list"), using the first sizeof(void*) bytes of each free node as a link pointer. Allocated nodes will not need a link pointer, so per-node overhead is 0.
Allocate by removing the head of the list and returning it (2 loads, 1 store).
Free by inserting at the head of the list (1 load, 2 stores).

Obviously step 3 also has to check if the list's empty, and if so do a bunch of work getting a new big buffer (or fail).

Even more efficient, as Greg D and hazzen say, is to allocate by incrementing or decrementing a pointer (1 load, 1 store), and not offer a way to free a single node at all.

Edit: In both cases, free can deal with the complication "anything bigger I pass on the regular heap-manager" by the helpful fact that you get the size back in the call to free. Otherwise you'd be looking at either a flag (overhead probably 4 bytes per node) or else a lookup in some kind of record of the buffer(s) you've used.

The answer may depend on the lifetime patterns for these objects. If the objects are all instantiated as you proceed, and then all removed in one fell swoop, it may make sense to create a very simple heap manager that allocates memory by simply incrementing a pointer. Then, when you're done, blow away the entire heap.

Raymond Chen made an interesting post that may help to inspire you. :)

I like onebyones answer.

You might also consider The buddy system for your sets of fixed size heaps.

If a bunch of memory is allocated, used, and freed before moving on to the next round of allocation, I'd suggest using the simplest allocator possible:

typedef struct _allocator {
    void* buffer;
    int start;
    int max;
} allocator;

void init_allocator(size_t size, allocator* alloc) {
    alloc->buffer = malloc(size);
    alloc->start = 0;
    alloc->max = size;
}

void* allocator_malloc(allocator* alloc, size_t amount) {
    if (alloc->max - alloc->start < 0) return NULL;
    void* mem = alloc->buffer + alloc->start;
    alloc->start += bytes;
    return mem;
}

void allocator_free(allocator* alloc) {
    alloc->start = 0;
}

I use a mostly O(1) Small Block Memory Manager (SBMM). Basically it works this way:

1) It allocates larger SuperBlocks from the OS and tracks the Start+End Addresses as a range. The size of the SuperBlock is adjustable but 1MB makes a pretty good size.

2) The SuperBlocks are broken into Blocks (also adjustable in size... 4K-64K is good depending on your app). Each of these Blocks handles allocations of a specific size and stores all the items in the Block as a singly linked list. When you allocate a SuperBlock, you make a linked list of Free Blocks.

3) Allocating an Item means A) Checking to see if there is a Block with Free Items handling that size - and if not, allocating a new Block from the SuperBlocks. B) Removing the Item from the Block's Free List.

4) Freeing an Item by address means A) Finding SuperBlock containing address(*) B) Finding Block in SuperBlock (substract SuperBlock start address and divide by Block size) C) Pushing Item back on the Block's Free Item list.

As I stated, this SBMM is very fast as it runs with O(1) performance(*). In the version I have implemented, I use an AtomicSList (similar to SLIST in Windows) so that it is not only O(1) performance but also ThreadSafe and LockFree in the implementation. You could actually implement the algorithm using Win32 SLIST if you wanted to.

Interestingly, the algorithm for allocating Blocks from the SuperBlocks or Items from the Blocks result in nearly identicaly code (they're both O(1) allocations off a Free List).

(*) The SuperBlocks are arranged in a rangemap with O(1) average performance (but a potential O(Lg N) for worstcase where N is the number of SuperBlocks). The width of the rangemap depends on knowing roughly how much memory you're going to need in order to get the O(1) performance. If you overshoot, you'll waste a bit of memory but still get O(1) performance. If you undershoot, you'll approach O(Lg N) performance but the N is for the SuperBlock count -- not the item count. Since the SuperBlock count is very low compared to the Item count (by about 20 binary orders of magnitude in my code), it is not as critical as the rest of the allocator.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow