If you read our previous posts of Memory Management Bugs: An Introduction and Post-Mortem Memory Debugging, you have an idea how Backtrace may help debugging various memory errors. In this post, I will illustrate how it works under the hood with the example of TCMalloc, one of the memory allocators supported by Backtrace.
The primary goal of TCMalloc is high performance of memory allocations, especially for memory intensive applications in multi-threaded environment. The motivation is manifested in its implementation of thread cache, size class and lock reduction.
The following figure shows the architecture of TCMalloc. The thread cache on the right side of the figure serves the memory allocation of small sizes, which constitutes the majority of memory requests for almost all applications. The thread cache is a TLS (thread local storage) object which requires no lock at all. On the left side is the central heap component which is the layer interfacing with OS kernel. It requires global locking since it is shared by all threads. Memory is managed in unit of logical pages at this level. A group of contiguous pages is called span. Central heap maintains a map of all spans and is responsible to supply a free span when requested. Large size memory requests are fulfilled by the page allocator of this component. Central free list serves as the middleman between central heap and thread cache which carves spans into small pieces to replete thread cache in anticipation of user requests. It also receives freed memory from thread cache when there are too many and returns it to central heap if the whole span is free.
Examples of Memory Errors
Though all documents state that the consequence of memory errors such as double free, invalid free or memory overrun is undefined, it is interesting and sometimes useful to know the symptom specific to a memory allocator because it may explain the observed behavior and provide plausible theory. The following examples are particular to TCMalloc’s implementation.
The first example shows a double-free error. Memory region pointed by
p is freed twice without observable failure, which results the following two allocations of the same size return the same address to two variables. As a matter of fact, it will return the same address to infinite number of requests of the same size. The reason, TCMalloc uses a singly-linked list to cache freed memory regions with link pointer inside the free region itself. Allocation is simply pop off the head of the list and move the head to the next node on the list. In case of this simplified scenario, the double-freed region has the link pointer points to itself and causes the head pointer stuck to this region. In reality, an application is likely to write data into the allocated memory and overwrite the link pointer which breaks the circular link. However, it is unavoidable that the region is allocated at least twice to two different requests. From application’s point view, it appears that allocator erroneously allocates the same memory region to two different variables which causes unpredictable consequences. The situation may be even more perplexing if the same memory region is freed twice in different threads. It will be linked in the free lists of these two thread’s cache and crosses each other due to sharing the same node unwittingly. As shown in the figure, region A is first freed and linked in thread 1’s cache. However, it is freed again in thread 2 which is put on the thread’s free list. This causes thread 1’s cache shares with thread 2’s while regions B and C are leaked.
void *p, *p1, *p2;
p = malloc(8);
p1 = malloc(8);
p2 = malloc(8);
$1 = (void *) 0xf16008
$2 = (void *) 0xf16008
$3 = (void *) 0xf16008
0xf1600a, is two bytes after the correct address
0xf16008. TCMalloc doesn’t crash or fail in the free function. Instead it pushes the invalid free pointer to the thread cache’s free list and pops it off in the next memory request. Therefore,
p2gets the invalid address which is improperly aligned and is doomed to overrun its boundary by two bytes.
void *p1, *p2;
p1 = malloc(8);
free((char *)p1 + 2);
p2 = malloc(8);
$1 = (void *) 0xf16008
$2 = (void *) 0xf1600a
The last example is an off-by-one error which overruns the allocated memory region by one byte. The program crashes in the next malloc function call. The backtrace suggests heap data corruption. We don’t need to care about the detail of the heap data involved. In short, the memory region after what is allocated to variable
pis free and on span’s free object list. The one byte overwrites the link pointer embedded in that region and causes the crash soon after.
const size_t sz = 8;
char *p, *p2;
p = (char *)malloc(sz);
p[sz] = '\0';
p2 = (char *)malloc(sz);
$bt --minidump --core core.7958 tcmalloc_test
[ 0] libtcmalloc.so.4.3.0 tcmalloc::SLL_SetNext(void*, void*) (src/linked_list.h:49)
[ 1] libtcmalloc.so.4.3.0 tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**) (src/central_freelist.cc:314)
[ 2] libtcmalloc.so.4.3.0 tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) (src/central_freelist.cc:282)
[ 3] libtcmalloc.so.4.3.0 tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) (src/central_freelist.cc:264)
[ 4] libtcmalloc.so.4.3.0 tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) (src/thread_cache.cc:126)
[ 5] libtcmalloc.so.4.3.0 tcmalloc::ThreadCache::Allocate(unsigned long, unsigned long) (src/thread_cache.h:368)
[ 6] libtcmalloc.so.4.3.0 (anonymous namespace)::do_malloc_small(tcmalloc::ThreadCache*, unsigned long) (src/tcmalloc.cc:1193)
[ 7] libtcmalloc.so.4.3.0 (anonymous namespace)::do_malloc(unsigned long) (src/tcmalloc.cc:1200)
[ 8] libtcmalloc.so.4.3.0 (anonymous namespace)::do_malloc_or_cpp_alloc(unsigned long) (src/tcmalloc.cc:1219)
[ 9] libtcmalloc.so.4.3.0 tc_malloc (src/tcmalloc.cc:1603)
[ 10] tcmalloc_test buffer_overrun() (main.cpp:112)
[ 11] tcmalloc_test main (main.cpp:122)
[ 12] libc-2.23.so __libc_start_main
The behaviors shown in the above examples are related to the singly-linked list used in TCMalloc for various free lists. Other allocators may manifest the same error in different symptoms.
Backtrace’s Post-Mortem Analysis of TCMalloc Heap
Backtrace’s heap module supports several popular allocators like JEMalloc, PTMalloc and UMA. TCMalloc is the latest addition to the group. It starts by looking for the global variables of the allocator such as
tcmalloc::Static::central_cache_, etc. Then it parses the heap data according to the type information of debug symbols. While it scans the whole heap metadata, inconsistency due to memory errors is found and reported. For example, a double-free incident will leave the same heap region twice in one or more thread caches or central free list if they are not yet reused. Backtrace parses these free lists and reports a double-free error if duplicates are found. The following picture shows an example in Backtrace’s snapshot viewer hydra. Notice the message in the middle pane “Critical: free was erroneously called on the same address”. The bottom pane lists the heap statistics and summary of important heap metadata.
Normally, it is not required for a developer to understand a memory allocator inside out. In most cases, it is also impractical to manually check heap data. However, this is crucial information needed in order to interpret the observed signals and relate them to application errors. Backtrace fills this gap by reducing the complexity of a memory allocator’s implementation, finding critical inconsistency in heap data, and reporting the errors from the user’s perspective. When combined with Backtrace’s classifiers, this becomes even more useful by identifying patterns that point to the source of the bug.