|
| 1 | +## Custom memory allocator |
| 2 | + |
| 3 | +LibSass comes with a custom memory allocator to improve performance. |
| 4 | +First included in LibSass 3.6 and currently disabled by default. |
| 5 | +Needs to be enabled by defining `SASS_CUSTOM_ALLOCATOR`. |
| 6 | + |
| 7 | +### Overview |
| 8 | + |
| 9 | +The allocator is a pool/arena allocator with a free-list on top. The |
| 10 | +memory usage pattern of LibSass fits this implementation very well. |
| 11 | +Every compilation tends to accumulate memory and only releasing some |
| 12 | +items from time to time, but the overall memory consumption will mostly |
| 13 | +grow until the compilation is finished. This helps us to keep the |
| 14 | +implementation as simple as possible, since we don't need to release |
| 15 | +much memory back to the system and can re-use it instead. |
| 16 | + |
| 17 | +### Arenas |
| 18 | + |
| 19 | +Each arena is allocated in a (compile time) fixed size. Every allocation |
| 20 | +request is served from the current arena. We basically slice up the |
| 21 | +arena into different sized chunks. Arenas are never returned to the |
| 22 | +system until the whole compilation is finished. |
| 23 | + |
| 24 | +### Slices |
| 25 | + |
| 26 | +A memory slice is a part of an arena. Once the system requests a sized |
| 27 | +memory chunk we check the current arena if there is enough space to |
| 28 | +hold it. If not a new arena is allocated. Then we return a pointer |
| 29 | +into that arena and mark the space as being used. Each slice also |
| 30 | +has a header which is invisible to the requester as it lies before |
| 31 | +the pointer address that we returned. This is used for book-keeping. |
| 32 | + |
| 33 | +### Free-lists (or buckets) |
| 34 | + |
| 35 | +Once a memory slice is returned to the allocator it will not be released. |
| 36 | +It will instead be put on the free list. We keep a fixed number of free lists, |
| 37 | +one for every possible chunk size. Since chunk sizes are memory aligned, we |
| 38 | +can get the free-list index (aka `bucket`) very quickly (`size/alignment`). |
| 39 | +For further readings see https://en.wikipedia.org/wiki/Free_list. |
| 40 | + |
| 41 | +### Chunk-sizes |
| 42 | + |
| 43 | +Since arenas are of fixed size we need to make sure that only small |
| 44 | +enough chunks get served from it. This also helps to keep implementation |
| 45 | +simple, as we can statically declare some structures for book-keeping. |
| 46 | +Allocations that are too big to be tracked on a free-list will be patched |
| 47 | +directly to malloc and free. This is the case when the bucket index would |
| 48 | +be bigger than `SassAllocatorBuckets`. |
| 49 | + |
| 50 | +### Thread-safety |
| 51 | + |
| 52 | +This implementation is not thread-safe by design. Making it thread-safe |
| 53 | +would certainly be possible, but it would come at a (performance) price. |
| 54 | +Also it is not needed given the memory usage pattern of LibSass. Instead |
| 55 | +we should make sure that memory pools are local to each thread. |
| 56 | + |
| 57 | +### Implementation obstacles |
| 58 | + |
| 59 | +Since memory allocation is a core part of C++ itself, we get into various |
| 60 | +difficult territories. This has specially proven true in regard of static |
| 61 | +variable initialization and destruction order. E.g when we have a static |
| 62 | +string with custom allocator. It might be that it is initialized before |
| 63 | +the thread local memory pool. On the other hand it's also possible that |
| 64 | +the memory pool is destroyed before another static string wants to give |
| 65 | +back its memory to the pool. I tried hard to work around those issues. |
| 66 | +Mainly by only using thead local POD (plain old data) objects. |
| 67 | + |
| 68 | +See https://isocpp.org/wiki/faq/ctors#static-init-order |
| 69 | + |
| 70 | +### Performance gains |
| 71 | + |
| 72 | +My tests indicate that the custom allocator brings around 15% performance |
| 73 | +enhancement for complex cases (I used the bolt-bench to measure it). Once |
| 74 | +more get optimized, the custom allocator can bring up to 30% improvement. |
| 75 | +This comes at a small cost of a few percent of overall memory usage. This |
| 76 | +can be tweaked, but the sweet spot for now seems to be: |
| 77 | + |
| 78 | +```c |
| 79 | +// How many buckets should we have for the free-list |
| 80 | +// Determines when allocations go directly to malloc/free |
| 81 | +// For maximum size of managed items multiply by alignment |
| 82 | +#define SassAllocatorBuckets 512 |
| 83 | + |
| 84 | +// The size of the memory pool arenas in bytes. |
| 85 | +#define SassAllocatorArenaSize (1024 * 256) |
| 86 | +``` |
| 87 | +
|
| 88 | +These can be found in `settings.hpp`. |
| 89 | +
|
| 90 | +### Memory overhead |
| 91 | +
|
| 92 | +Both settings `SassAllocatorBuckets` and `SassAllocatorArenaSize` need |
| 93 | +to be set in relation to each other. Assuming the memory alignment on |
| 94 | +the platform is 8 bytes, the maximum chunk size that can be handled |
| 95 | +is 4KB (512*8B). If the arena size is too close to this value, you |
| 96 | +may leave a lot of RAM unused. Once an arena can't fullfil the current |
| 97 | +request, it is put away and a new one is allocated. We don't keep track |
| 98 | +of unused space in previous arenas, as it bloats the code and costs |
| 99 | +precious cpu time. By setting the values carefully we can avoid the cost |
| 100 | +and still provide reasonable memory overhead. In the worst scenario we |
| 101 | +loose around 1.5% for the default settings (4K of 256K). |
| 102 | +
|
| 103 | +### Further improvements |
| 104 | +
|
| 105 | +It is worth to check if we can re-use the space of old arenas somehow without |
| 106 | +scarifying to much performance. Additionally we could check free-lists of |
| 107 | +bigger chunks sizes to satisfy an allocation request. But both would need |
| 108 | +to be checked for performance impact and their actual gain. |
0 commit comments