Skip to content

Immix Garbage Collection

Ding YuChen edited this page May 2, 2020 · 7 revisions

Immix Garbage Collector is a type of mark-region garbage collection algorithm, that splits the heap space into blocks and lines, which make the finding of holes, and the freeing of unused objects faster than traditional mark-sweep or mark-copy algorithms

Node Layout

General Node Layout

const TAG_SLOT = 0;
const SIZE_SLOT = 1;
const FIRST_CHILD_SLOT = 2;
const LAST_CHILD_SLOT = 3;
const MARK_SLOT = 4;

Nodes in the Immix Garbage Collector differ from regular SVML nodes by including a MARK slot for all objects that live in the heap. This slot is required in order for us to resolve circular or multiple references when traversing live nodes.

Block and Line Layouts

// block nodes layout
//
// 0: tag  = BLOCK_TAG
// 1: size = total size of block
// 2: offset of first child from the tag: 6
// 3: offset of last child from the tag: (NUM_OF_LINES_PER_BLOCK - 1) * LINE_BK_SIZE (3)
// 4: mark (free: 0, occupied: 1)
// 5: block state (occupied: 0, recyclable: 1, free: 2)
// 6: line 0 start address
// 7: line 0 mark bit (marked: 1, unmarked: 0)
// 8: line 0 end of occupied address (for block occupancy profiling purpose)
// 9: line 1 address
// ...
// NUM_OF_LINES_PER_BLOCK * LINE_BK_SIZE + BLOCK_BK_SIZE: start of line 0 usable memory

Blocks contain bookkeeping slots for lines, in order to reserve a contiguous memory space in the body of the block to allow for allocation of objects that are more than the size of a line.

Line address refers to start of the line metadata with reference to the heap.

Design Considerations

  1. Line contains start and end addresses in order to provide more information and allow for flexibility of extension in the future, for example block occupancy profiling.
  2. We keep the start address even though it can be calculated from a given line address in order to reduce the use of registers to calculate this address, and also to provide convenience to the programmer.

Algorithm

The Immix Garbage Collector is essentially a Mark-Region Garbage collector with 2 levels of granularity as well as a compaction mechanism. There are also several other bells and whistles like overflow allocation and opportunistic defragmentation.

Garbage Collection

The garbage collection mechanism goes something like this:

function garbage_collect() {
    map(roots(heap), mark);
    map(blocks(heap), free);
    unmark_all_nodes(heap);
}

function mark(node) {
    node.mark_slot = MARKED;
    for (child in node.children) {
        if (is_forwarding_address(child)) {
            child = heap[forwarding_address];
        }
        if (child.mark_slot === UNMARKED) {
            const b = get_block(child);
            if (need_to_defrag) {
                const forwarding_address = copy(child);
                child = heap[forwarding_address];
            }
            get_block(child).mark_slot = MARKED;
            get_line(child).mark_slot = MARKED;
            mark(child);
        }
    }
}

function free(block) {
    if (block.mark_slot === MARKED) {
        for (line in block.lines) {
            if (line.mark_slot === MARKED) {
                free_line(line);
            } 
        }
    } else {
        free_block(block);
    }
}

Marking

All live objects that are reachable will be traversed via dfs and marked. If the defragmentation is active, the object will be copied to a free block and the garbage collector will leave behind a forwarding address. If this forwarding address is encountered again during the marking phase, the parent node of the object will have it's reference updated accordingly. At the end of the marking phase, we can be sure that there are no more loose pointers and the candidate block can then be freed and turned into the new headroom block.

Debugging

Debugging is a tricky thing to do considering the possible propagation of unwanted behavior before the program fails and the absence of a stack trace. In order to improve the programmer productivity, this writer has decided to include helpful and general assertions that can be used throughout the program in order to ensure correctness and faster debugging.

Assertions

Assertions in Source makes use of the built-in error function. Assertion functions also take in a list of string that acts as a rudimentary stack trace that allows the user to identify where a violation of the assertion takes place. Upon encountering an assertion, the SVM will print out current register values as well as a visualization of the heap.

Note that the assertions listed here are only a part of *-with-mark-region-gc.js programs. (at the point of writing, May 2020)

Clone this wiki locally