As an application programmer using a unix-flavored operating system, one’s view of memory management is usually limited to
malloc and its various alternative forms. However,
malloc is part of the C standard library and while it is specified as part of POSIX.1, the specification defers to the ISO C standard. In fact, since
malloc is part of the C standard library it is not a part of the kernel and can’t allocate memory directly. For that, we need to use kernel functions.
POSIX.1-1998 specifies both
mmap for allocating memory. But
sbrk was allegedly removed in POSIX.1-2001, and certainly we can see it no longer exists in the version of POSIX.1-2004 still available online. To understand what they do and why
sbrk was probably removed, we need to understand how a program’s code and data are laid out in memory.
Modern operating systems implement virtual memory, which gives each application the illusion that each has access to all (or most) of the machine’s physical memory. In reality, the kernel maps virtual memory locations to physical ones and, with help from the CPU, translates virtual to physical addresses on the fly. Since we ostensibly have the entire address space in which to arrange our code and data, this gives us a lot of flexibility.
The diagram on the right shows a pretty typical way to arrange the code and data for a program. Starting at the bottom (which has the lowest memory address), we have the text region. This is where a prorgam’s code would be stored. Above that is the BSS (Block Started by Symbol) region and above that is the initialized data region, which are where static variables (uninitialized and initialized, respectively) are stored. Above that, we have the heap, free memory, and the stack.
Any standard C course will explain the difference between the stack and the heap. The stack contains variables like function arguments and variables defined locally within a function. These variables are automatically pushed onto the stack when a function is called, and are popped off the stack when the function returns a value. In order to keep a variable around for longer than the life of a function call, we need to allocate on the heap.
Keep in mind that this memory layout is only the typical case. An OS that implements the POSIX.1 specification may not lay out memory in the same way.
sbrk kernel functions move the boundary between the heap and free memory, effectively increasing the size of the heap and allocating more memory for a program. In particular,
brk takes a memory address and sets that as the new “break” (the byte past the last heap-allocated memory), whereas
sbrk more usefully takes a number of bytes and increases the size of the heap by that many bytes. We can effectively decrease the size of the heap by giving
sbrk a negative value.
We could get away with only using this function if we treated the heap as another stack. In other words, if we always deallocated in the reverse order as we allocated memory, then we could easily just use
However, it’s much more useful as an application programmer to be able to allocate and deallocate from the heap in any order. For this reason, when
malloc is implemented using
sbrk, it usually allocates more space than it needs and uses something like a linked-list to keep track of unused memory locations.
A naive implementation of
malloc might assume that no programmer would both use
malloc and call
sbrk directly, so the specification warns that the behavior of
sbrk is unspecified if an application also uses any other functions (such as
sbrk makes this strong assumption that the heap is a contiguous area of memory with a single boundary which can grow or shrink to allocate more or less memory to a process. This, and that
sbrk doesn’t play well with
malloc, are probably why it was removed from the POSIX.1 standard. That said, most unix-flavored OSes typically still implement it. Thankfully, we have a nice alternative:
ASIDE: For more on implementing
sbrk, there’s a great code review cleaning up the
malloc implementation from K&R which uses a function
morecore which seems to be analagous to
sbrk. There’s also a high-level view of how malloc works, as well as a tutorial on implementing a simple version of malloc based on a tutorial on implmeneting a complex version of malloc.
mmap is specified in the latest version of the POSIX.1 specification. Its brief description is “The mmap() function shall establish a mapping between an address space of a process and a memory object.” Immediately, we can see that this doesn’t make any assumptions about how a program is laid out in memory. Although at first glance, it may seem that this function is more about mapping something like I/O into the address space of a process.
The function prototype for
void *mmap(void *addr, size_t len, int prot, int flags, int fildes, off_t off);
The specification also goes on to state that if
addr is 0, and
FLAG_FIXED is not set, then the implementation is free to find unused memory into which to map. Also, most operating systems have an additional flag
MAP_ANON which effectively just allocates the desired memory in the same way that
malloc would, without making any assumptions about memory layout. To illustrate this, we can implement a very rudimentary version of
malloc using only
munmap to implement
You can run this code online at codepad.