Formulae for B-tree and B⁺-tree in DB
The B-tree and B⁺-tree are commonly used in filesystems and databases. These data structures were created under the assumption that the indexes would be so big that only small chunks (a few nodes of the tree) could fit in the main memory.
Both data structures are a self-balancing search tree. The insertion and delete behaviors are quite complex since the trees are self-balancing and some extra conditions need to hold (for example, the nodes have to be at least half full). These extra conditions keep low the height of the tree.
The main difference is that the B-tree stores the data pointers and their keys in every node, while the B⁺-tree store the data pointers only in the leaves. Due to this choice in the B⁺-tree, it doesn’t matter what values are inside the internal nodes as long as they are correct; with this choice, we separated the nodes used as indexes from those used as pointers to entries, causing a simplification of the insert and delete operations.
In most implementations, every leaves in the B⁺-tree has a link to the next leaf to speed up sequential access for range queries, while the B-tree has not.
Note: the following formulae are generic; there are variations depending on the implementation.
The formula for B-tree
Let D be the block disk size, K is the fixed length of the key field, Pn is the size of the pointer to a node, Pd is the size of the pointer to the data and p is the order of the B tree. The nodes have p-1 keys and pointers to the disk and p pointers to the other nodes.
The idea is to find the maximum integer p such that a node fits the block disk size entirely so that the node is readable in one disk operation.
The formula for B⁺-tree
In this case, there are two p’s for the order, one for the internal nodes and one for the leaves.
It’s important to recall that the internal nodes are only used as a roadmap for the search, so they can contain values that don’t exist in the leaves.
Example for B-tree
Assuming the following values:
We want to find the maximum integer p, satisfying this condition:
Once you have p, you can calculate the number of data pointers and the number of entries for a completely filled B-tree of a certain height.
In this particular case, the values are:
So a completely filled B-tree of height 2 can store 185 + 185² = 34,410 entries, while with height 3 it can store 185+185²+185³ = 6,366,035 entries.
This means that if we assume that the root node is in memory, it’s possible to access 34,410 entries in at most 1+1 disk reads, while with at most 2+1 disk reads, it’s possible to access 6,366,035 entries.
Example for B⁺-tree
Assuming the previous values, we find the maximum integer p for internal nodes and leaves.
The number of data pointers and the number of entries for a completely filled B⁺-tree varies with the height. The table for a B⁺-tree of height two is presented below (the keys for the internal nodes are not associated with existing entries):
Note: the keys in the internal levels are not necessarily real.
Given the values previously calculated, the completely filled B⁺-tree of height 2 has these characteristics:
If we assume that the root node is in memory, it’s possible to access 291 * 227 = 66,057 entries in at most 1+1 disk reads.
If, instead, we consider the completely filled B⁺-tree of height 3:
If we assume that the root node is in memory, it’s possible to access 291² * 227 = 19,222,587 entries in at most 2+1 disk reads.
These examples show that a completely filled B⁺-tree can contain much more entries than a B-tree; comparing the previous examples, we have 34,410 vs 66,057 and 6,366,035 vs 19,222,587.
by the author.