bothner@cygnus.com
The basic idea is that the run-time system must maintain a "memory map". This is a table of "chunk descriptors", where each chunk is a range of memory addresses. Because of the need to monitor out-of-bounds errors in non-malloc'd memory, the chunk descriptors have to be separate from the data structures maintained by malloc.
The other basic idea is that we modify the compiler so each pointer operation is validated against the memory map. To do that we have to find the chunk descriptor for the memory containing the pointer.
The most efficient way to map from a pointer to chunk descriptors is to always carry the address of the chunk descriptor with the pointer value. That is to use a "fat pointer", which is a pair of the actual pointer, and the descriptor for the chunk it points into. The problem is that using a fat pointer instead of a regular pointer violates binary compatibility, which means all the libraries have to be recompiled. This is a major hassle, and impractical for many projects. There are also problems with sizeof and unions if pointer suddenly becomes twice as big.
Therefore, pointer arguments and return values, and pointers in data structures (arrays, structs and unions), have to be normal "thin" pointer. However, we can use fat pointers for pointers stored in local variables, and that can be a major performance boost.
/* A chunk descriptor. */ typedef struct chunk { void *start; /* Start address of chunk. */ void *end; /* End address of chunk. */ .... /* More later */ } chunk_t; /* Create a new chunk descriptor for a chunk of memory. */ chunk_t * add_chunk (void *start, void *end) { chunk_t *chunk = allocate_chunk (); chunk->start = start; chunk->end = end; link chunk into global memory map; return chunk; } void remove_chunk (chunk_t *chunk) { unlink chunk from global memory map; de_allocate_chunk (chunk); } /* Dummy chunk representing memory not managed by any chunk descriptor. */ struct chunk missing_chunk = { (void*) 0, (void *) 0xFFFFFFFF, ... }; chunk_t * find_chunk (void *ptr) { find a chunk containing ptr, if there is none, return &missing_chunk. } /* Dummy chunk used to indicate we need to call find_chunk. */ struct chunk unknown_chunk = { (void*) 1, (void *) 0, ... }; #define UNKNOWN_CHUNK &unknown_chunk chunk_t * find_chunk_if_needed (void *ptr, chunk_t* chunk) { return chunk == UNKNOWN_CHUNK ? find_chunk (ptr) : chunk; }
Because we need to be able to link with libraries and .o files that are not cmpiled with memory-checking enabled, pointers must be regular pointer - we cannot use "fat" pointers (aka descriptors).
We want to catch reading uninitialied memroy. We do this by optionally asociating a bitmap with each chunk:
typedef char* initialized_map_t; typedef struct chunk { .... initialized_map_t initialized_map; } chunk_t; #define IS_INITIALIZED(MAP, INDEX) (MAP[INDEX/8] & (1 << (INDEX % 8))) #define SET_INITIALIZED(MAP, INDEX) ((MAP[INDEX/8]) |= (1 << (INDEX % 8))) /* Check that SIZE bytes starting at PTR are all initialized. Assume the whole region lies within CHUNK. */ void check_initialized (void *ptr, int size, chunk_t* chunk) { int index = (char*) ptr - (char*) chunk->start; /* Of course this loop should be optimized. */ while (--index >= 0) if (! IS_INITIALIZED (chunk->initialized_map, index) SIGNAL_ERROR(): } /* Note that SIZE bytes starting at PTR are all initialized. Assume the whole region lies within CHUNK. */ void set_initialized (void *ptr, int size, chunk_t* chunk) { int index = (char*) ptr - (char*) chunk->start; /* Of course this loop should be optimized. */ while (--index >= 0) SET_INITIALIZED (chunk->initialized_map, index); }
We can only catch reads of uninitialized memory if all writes to
memory we are monitoring has calls to set_initialized
;
otherwise we will get a lot of false errors. We make the rule
that if a chunk's initialized_map
field is NULL
,
then we should not check for initialization on reads from the chunk.
We also provide two global variables that the programmmer can use
to control this test:
/* Control whether an initialized_map is created for new chunks. A map is only created by add_chunk if alloc_with_initialization_map is true. */ int alloc_with_initialization_map = 0; /* set_initialized is only called when do_check_initialization is true. */ int do_check_initialization = 0;
The standard function malloc
is replaced by:
void * malloc (size_t size) { void *ptr = __primitive_malloc (size); chunk_t *chunk = add_chunk (ptr, (char*) ptr + size); /* Some extra one-bit flag in a chunk. */ chunk->allocated_by_malloc = 1; return ptr; }
It is useful if malloc
would also also save the call stack
with the chunk; this is desirable for possible future error messages.
The free
function is obvious:
void free (void *ptr) { chunk_t *chunk = find_chunk (ptr); if (chunk == &missing_chunk || chunk->start != ptr || ! chunk->allocated_by_malloc) SIGNAL_ERROR(); __primitive_free(ptr); remove_chunk (chunk); }
It would be useful to provide alternative entry-points to these functions, but which returned/took a pair of a data pointer and a chunk pointer.
We can optimize calloc
since it does not need an
initialized_map
.
We can catch accesses to deleted memory by not actually deleting the memory or the chunk, but instead setting a "deleted" flag in the chunk. We would want to give the programmer so control over which free'd chunks are kept around, and for how long.
The compiler can easily add code that is run at program start-up
that calls add-chunk
for each global variable.
With some linker support, it does not even have to generate any code.
Instead, it just has to mark the limits of each variable in the
object file, and then have the linker set up the initial set of chunks.
A global variable whose address is never taken (and which does not contain an component whose address is taken) does not need a chunk.
We do not need to create initialized_map
's for global variables,
these are by C/C++ language definition initialized (to zero).
Guarding stack variables is in principle straight-forward.
We just do add_chunk
when the variable's scope
is enteered, and remove_chunk
when the scope
is exited. The tricky part is the later, since we have to
do remove_chunk
also when unwindind the stack
due to exceptions or longjmp
. If that is impractical,
we can do remove_chunk
lazily: Remove any chunks
beyond the stack pointer before we do a stack-related
add_chunk
or signal an error.
We only need chunks for variables whose address is taken, or that we want to check for initialization.
Each local pointer variable P
is augmented
by a second variable P$chunk
, which points to a chunk
pointed to be P
. We maintain this invariant:
P$chunk == UNKNOWN_CHUNK || P$chunk == find_chunk(P)
If P
is a local pointer variable, then `&P
'
is translated into `(P$chunk = &UNKNOWN_CHUNK, &P)
'.
Taking the address of some other pointer-valued lvalue is translated as is.
If P1
and P2
are local pointer variables,
then `P1 = P2' is translated into
`P1$chunk = P2$chunk, P1=P2'.
In general, if P
is a local pointer variable,
and X
is a pointer-valued expression, then `P = X
'
is translated into `P$chunk = UNKNOWN_CHUNK, P = X
'.
However, if we know (or can cheaply compute) the chunk containing
the value of `X
', the we can set P$chunk
to that known chunk instead.
Assigning to other pointer-valued lvalues is translated as is.
If P
is a local pointer variable, then the
operation `P + NUM
' is translated into:
P$chunk = find_chunk_if_needed (P, P$chunk), check_pointer_increment (P, P$chunk, NUM * sizeof(*P))m P + NUM;
where check_pointer_increment
is below.
If X
is a pointer-valued expression (with no side-effects),
then the operation `X + NUM
' is translated into:
(check_pointer_increment (X, find_chunk(X), NUM * sizeof(*X)), X + NUM)
The case where X
has side-effects, as well as the
pre- and post- decrement and increment unary operations,
are handled in the obvious manner.
If P
is a local pointer variable, then the
operation `*P
' (as an rvalue) is translated into:
(P$chunk = find_chunk_if_needed (P, P$chunk), check_pointer_read (P, sizeof(*P), P$chunk), *P)
where check_pointer_read
is below.
If X
is a pointer-valued lvalue (with no side-effects),
then the operation `*X
' (as an rvalue) is translated into:
(check_pointer_read (X, sizeof(*X), find_chunk(X)), *X)
When X
is a pointer-valued expression, but not
a local variable, the expression `X[N]
' can be
translated into:
(check_pointer_read (X+N, sizeof(*X), find_chunk(X)), X[N])
If `*P
' or `*X
' is used as an LHS value,
the translation uses check_pointer_write
instead of check_pointer_read
.
/* Signal an error if adjusting PTR by DELTA will bring it outside CHUNK. */ void check_pointer_increment (void *ptr, chunk_t *chunk, int delta) { if (delta == 0) return; if (ptr == NULL) SIGNAL_ERROR(); if (chunk != &missing_chunk) { if (ptr < chunk->start || ptr > chunk->end) /* Usually redundant */ SIGNAL_ERROR(); if ((char*) ptr + delta < chunk->start || (char*) ptr + delta > chunk->end) SIGNAL_ERROR(); } } void check_pointer_write (void *ptr, int size, chunk_t *chunk) { if (ptr == NULL) SIGNAL_ERROR(); if (chunk != &missing_chunk) { if (ptr < chunk->start || ptr > chunk->end || (char*) ptr + size < chunk->start || (char*) ptr + size > chunk->end) SIGNAL_ERROR(); if (chunk->initialized_map != NULL) set_initialized (ptr, size, chunk); } } void check_pointer_read (void *ptr, int size, chunk_t *chunk) { if (ptr == NULL) SIGNAL_ERROR(); if (chunk != &missing_chunk) { if (ptr < chunk->start || ptr > chunk->end || (char*) ptr + size < chunk->start || (char*) ptr + size > chunk->end) SIGNAL_ERROR(); if (chunk->initialized_map != NULL && do_check_initialization) check_initialized (ptr, size, chunk); } }