B+ Tree in JavaScript: notes

by Graham O'Neill

In my B+ Tree I have adopted the principle that the node key values are the break between "less than" and "greater than or equal to". So the key value will appear at the beginning of the leaf that is to the right of the node value. I have seen examples where they use the break as "less than or equal to" and "greater than", leaving the key value as the last item in the leaf. My version has the advantage that it's faster to access the most significant key value as you don't need to keep checking the length of the array.

To create a tree and make use of the following properties and methods you need to start with:

objName = new tree(order);

where order is a number that is 3 or more.

Properties

The leaves and nodes store the key values and pointers in simple arrays. This simplifies the insertion process as it makes it possible to add a key to the array before finding that the array is now too big for the leaf or node. In more restrictive, fixed structures you would need to identify which value would overflow and store that separately from the main leaf or node values.

I have kept the amount of information stored in the nodes and leaves to a minimum. So apart from the keys and pointers, the only other elements are the forward and backward pointers on the leaves. Since JavaScript is a classless structure I have used prototypes for all the methods, which again keeps memory use to a minimum. It also means I don't even need to store the "isLeaf" flag on each node - I can just have a method for each object to identify it.

For the tree object itself, I store a few properties that are used internally by the methods, and these are described as "private". There are also some properties labelled as "public". These are the properties that provide information back to the calling script and are intended to be read only. Of course JavaScript has no such thing as private/public/read only properties but I think you can see what I mean. By the way, "eof" stands for End of File.

The Wikipedia article about B+ Trees is a bit confusing about the minimum number of keys required in each leaf and node, as it concentrates on children rather than keys. For clarity, these routines deal with the number of keys, for which:

all_maximum_keys = (Order-1)
leaf_minimum_keys = floor(Order/2)
node_minimum_keys = floor(all_maximum_keys/2)

where "floor" means rounding down. For example with Order 7, the maximum number of keys allowed is 6 and for both leaf and nodes the minimum is 3. For Order 8, the maximum is 7 and each leaf must have at least 4 keys and each node at least 3.

Methods

Before describing the main tree methods, it's worth making a couple of observations about the structure of B+ trees as these are major influences on how the methods work:

The only key values that appear in the internal nodes are the first key values from each leaf, with the exception of the key from the very first leaf which isn't included.
Each key value that appears in the internal nodes only appears once.

The implications from these are that you can change anything you like on the leaves without worrying about the nodes, unless you change the very first key value. If you do need to change a node value then you only need to find and change it once.

If you look at the scripts you might notice that I code all of the array loops rather than using the JavaScript splice method. I found that coded loops ran significantly faster than splice. I also tried a binary chop routine for searching the leaf keys but didn't find it much faster than a simple loop. I've included the chop version in the source in case you want to try it yourself.

INSERT:result = objName.insert(key, data);

The Insert method allows you to add new keys to the tree, returning True and increasing the objName.length value by one if the addition was successful, and False if it failed (because the key already exists). The other properties are set as:

Insert	keyval	recnum	eof	found
true	key	data	false	false
false	key	found data	false	true

The first step is to locate the leaf where the key will be inserted. This is a simple search process but since it's possible the internal nodes might need updating it's useful to keep track of which nodes are used. That way we can retrace the path just by popping the nodes off the list rather than making the whole process recursive.

If the leaf found isn't already full then adding the key is trivial: just insert it into the correct place. The only leaf where it is possible to insert at the very first location is the first leaf, and that is the one leaf whose value doesn't appear in the nodes. So no node changes are required in these cases.

The more interesting case is where the leaf is already full. In a B* Tree an attempt would then be made to move a key to a sibling leaf, but for a B+ Tree the next step is to split the leaf into two. So the method adds the key to the leaf's array, leaving it with (Order) number of keys, which is too many. Since the minimum required is (Order/2) we move half of the keys to a new leaf, leaving both the original and the new one as valid. For example, adding key [6] here results in keys [6] and [7] being put into a new leaf:

With a new leaf created we now need to add its very first value, [6], and the pointer to its leaf to the internal nodes. Promoting the key value to the nodes is again trivial if the node above it isn't full:

When it is full, splitting the node is a slightly different process to before. Splitting a leaf involves copying the key value into the nodes, but splitting a node means moving the key. So adding key [6] here would result in two nodes with [3,6] and [12], and the value [8] being moved up to the next level:

Of course the parent receiving [8] might also be full and need splitting, and so the process continues. Normally just the key value [8] and the pointer to the new node [12] is moved up, but if the split node is the root then the pointer to the sibling node [3,6] is also required (as shown above). So the promotion process always carries both pointers up just in case.

REMOVE:result = objName.remove([key]);

This will perform a proper B+ Tree deletion, keeping the leaves and nodes valid and making sure the tree remains balanced (all leaves at the same depth in the tree). If the deletion was successful it returns True and moves the current pointer to the next key. False is returned if the deletion failed because the key was specified but didn't exist, or because no key was specified but the pointer was at End of File.

Remove	keyval	recnum	eof	found
true	next key	next data	(depends)	true
false	(empty)	-1	true	false

The first thing to do is to check that the key exists in the tree. If not it clearly can't be deleted. As with Insert, when performing the search it's useful to store a list of the nodes visited so that the nodes on the path can be updated later if necessary.

Having deleted the key from the leaf there are two cases where the process can be finished early: the tree is so small that the root and the leaf are the same, or the leaf still has the minimum number of keys left. However, remember that the very first key value appears on the internal nodes, so if that key is deleted then the internal node value should be changed. For example deleting [3] here changes the node value:

No pointers change, though, just the node's key value.

It gets more complicated if the leaf has too few keys after the deletion. The first thing to do is to check the siblings of the leaf (ie. adjacent leaves with the same parent node). If one of them has a spare key then it can be moved to the current leaf, remembering to adjust the nodes above if any leaf's first key value changes. In this example, having deleted [7] the leaf with [4] is invalid but we can take a key from either sibling, so these diagrams show the effect of each:

Again, none of the pointers change. If neither sibling is big enough then we need to merge two leaves. This is potentially the most difficult process as now a node value is going to have to be removed. The first step of merging the leaves is easy. Because we couldn't move a key from the sibling we already know it has only (Order/2) keys - the minimum. We know the current leaf has one less than the minimum. So any sibling that exists can be merged without worrying about an overflow. In the following example, deleting [7] means the leaf with [4] needs to be merged. My routine always merges towards the left for two reasons. The first is that it's easier to write just one routine for the merge. The other, more important one, is that it makes updating the nodes much easier. The key value for the left leaf might appear anywhere in the tree above (or maybe not at all if it is the very first leaf), but the key for its sibling to the right must be in the immediate parent. So when deleting the leaf on the right we only need to update the parent node for that leaf:

If the parent node still has enough key values to be valid then we have finished. Or, if the parent is now empty and was the root, then the leaf we just merged becomes the root. Otherwise we start a loop, setting the parent of the leaves as the current node and performing the following steps to try and make it valid.

First we get the current node's parent, which is easy as we kept track of the path at the beginning. Then we check if we can take a key from a sibling node, although with the nodes it is a slightly different process. Since my examples so far have been Order 4, an invalid node would be one with no keys at all so here are examples of taking a key from a sibling:

Note how the node pointer just moves across to either the beginning or end of the existing pointer(s), but the key values rotate through the appropriate key on the parent node. If a sibling key can't be taken then nodes need to be merged:

Again, I merge from the right to the left. This is easier than it looks. First the appropriate key from the parent is moved down to the left node, then all keys and pointers are moved from the right node. Finally the pointer to the right node is removed. Now we have deleted a key from the parent we check if that parent is valid or not. If it has too few keys then we make it the current node and repeat the loop. This will continue until the current node is valid, or until the parent is the root and is now empty, in which case the last merged node becomes the new root.

SEEK:result = objName.seek(key [,near]);

To find a key in the tree use the Seek method. If the key isn't found then it will return False, but quite often in this case you want to know what the next nearest key is. So a second parameter, Near (true or false) allows you to do this.

Seek	keyval	recnum	eof	found
true	(near) key	(near) data	false	true
false	(empty)	-1	true	false

SKIP:result = objName.skip(count);

Having positioned the current pointer to a key using one of the other methods, you can use this command to jump a specified number of keys. If Count is negative then the pointer is moved backwards in the tree.

Skip	keyval	recnum	eof	found
true	key	data	false	true
false	(empty)	-1	true	false

GOTO:result = objName.goto(count);

To position the pointer at a specific key location in the tree. For example Goto(3) will set the third key in the tree as the current key. If Count is negative then the current key is found relative to the end of the tree.

Goto	keyval	recnum	eof	found
true	key	data	false	true
false	(empty)	-1	true	false

KEYNUM:count = objName.keynum();

This returns the sequence number of the current key or -1 if it's currently at end of file. It doesn't affect any of the indicators. It is useful when combined with the Goto() method.

GOTOP:result = objName.goTop();

The very fist key in the tree is set as the current key. False will only be returned if the tree is completely empty.

GoTop	keyval	recnum	eof	found
true	key	data	false	true
false	(empty)	-1	true	false

GOBOTTOM:result = objName.goBottom();

The very last key in the tree is set as the current key. False will only be returned if the tree is completely empty.

GoBottom	keyval	recnum	eof	found
true	key	data	false	true
false	(empty)	-1	true	false

PACK:result = objName.pack();

Inserting and deleting keys into a tree will result in some leaves and nodes having the minimum number of keys allowed. This routine will ensure that each leaf and node has as many keys as possible, resulting in a denser, flatter tree. False is only returned if the tree is completely empty.

Pack	keyval	recnum	eof	found
true	key	data	false	true
false	(empty)	-1	true	false