M way search trees
As defined in Section , an internal node of an M-way search tree contains n subtrees and n-1 keys, where , for some fixed value of . The preceding sections give implementations for the special case in which the fixed value of M=2 is assumed (binary search trees). In this section, we consider the implementation of M-way search trees for arbitrary, larger values of .
Why are we interested in larger values of M? Suppose we have a very large data set--so large that we cannot get it all into the main memory of the computer at the same time. In this situation we implement the search tree in secondary storage, i.e., on disk. The unique characteristics of disk-based storage vis-à-vis memory-based storage make it necessary to use larger values of M in order to implement search trees efficiently.
The typical disk access time is 1-10 ms, whereas the typical main memory access time is 10-100 ns. Thus, main memory accesses are between 10000 and 1000000 times faster than typical disk accesses. Therefore to maximize performance, it is imperative that the total number of disk accesses be minimized.
In addition, disks are block-oriented devices. Data are transfered between main memory and disk in large blocks. The typical block sizes are between 512 bytes and 4096 bytes. Consequently, it makes sense to organize the data structure to take advantage of the ability to transfer entire blocks of data efficiently.
By choosing a suitably large value for M, we can arrange that one node of an M-way search tree occupies an entire disk block. If every internal node in the M-way search tree has exactly M children, we can use Theorem to determine the height of the tree:
where n is the number of internal nodes in the search tree. A node in an M-way search tree that has M children contains exactly M-1 keys. Therefore, altogether there are K=(M-1)n keys and Equation becomes . Ideally the search tree is well balanced and the inequality becomes an equality.
For example, consider a search tree which contains keys. Suppose the size of a disk block is such that we can fit a node of size M=128 in it. Since each node contains at most 127 keys, at least 16513 nodes are required. In the best case, the height of the M-way search tree is only two and at most three disk accesses are required to retrieve any key! This is a significant improvement over a binary tree, the height of which is at least 20.
The algorithm for searching for a value in an M-way search tree is the obvious generalization of the algorithm for searching in a binary search tree. If we are searching for value X are and currently at node consisting of values V1...Vk, there are four possible cases that can arise:
- If X < V1, recursively search for X in V1's left subtree.
- If X > Vk, recursively search for X in Vk's right subtree.
- If X=Vi, for some i, then we are done (X has been found).
- the only remaining possibility is that, for some i, Vi < X < V(i+1). In this case recursively search for X in the subtree that is in between Vi and V(i+1).
For example, suppose we were searching for 68 in the tree above. At the root, case (2) would apply, so we would continue the search in V2's right subtree. At the root of this subtree, case (4) applies, 68 is between V1=55 and V2=70, so we would continue to search in the subtree between them. Now case (3) applies, 68=V2, so we are done. If we had been searching for 69, exactly the same processing would have occurred down to the last node. At that point, case (2) would apply, but the subtree we want to search in is empty. Therefore we conclude that 69 is not in the tree.
Other the algorithms for binary search trees - insertion and deletion - generalize in a similar way. As with binary search trees, inserting values in ascending order will result in a degenerate M-way search tree; i.e. a tree whose height is O(N) instead of O(logN). This is a problem because all the important operations are O(height), and it is our aim to make them O(logN). One solution to this problem is to force the tree to be height-balanced; we sketched this last lecture for binary search trees and now we will examine it in detail for M-way search trees.
A multiway tree is a tree that can have more than two children. A multiway tree of order m (or an m-way tree) is one in which a tree can have m children.
As with the other trees that have been studied, the nodes in an m-way tree will be made up of key fields, in this case m-1 key fields, and pointers to children.
multiway tree of order 5
To make the processing of m-way trees easier some type of order will be imposed on the keys within each node, resulting in a multiway search tree of order m ( or an m-way search tree). By definition an m-way search tree is a m-way tree in which:
- Each node has m children and m-1 key fields
- The keys in each node are in ascending order.
- The keys in the first i children are smaller than the ith key
- The keys in the last m-i children are larger than the ith key
4-way search tree
M-way search trees give the same advantages to m-way trees that binary search trees gave to binary trees - they provide fast information retrieval and update. However, they also have the same problems that binary search trees had - they can become unbalanced, which means that the construction of the tree becomes of vital importance.