This section analyzes the storage cost of a segment tree in a one-dimensional space.
A segment tree T on a set I of n intervals uses O(nlogn) storage.
- Lemma: Any interval [x, x′] of I is stored in the canonical set for at most two nodes at the same depth.
- Proof: Let v1, v2, v3 be the three nodes at the same depth, numbered from left to right; and let w be the parent node of v2. Suppose [x, x′] is stored at v1 and v3. This means that [x, x′] spans the whole interval from the left endpoint of Int(v1) to the right endpoint of Int(v3). Because v2 lies between v1 and v3, Int(w) must be contained in [x, x′]. Hence, [x, x′] will not be stored at v2.
- The set I has at most 4n + 1 elementary intervals. Because T is a binary balanced tree with at most 4n + 1 leaves, its height is O(logn). Since any interval is stored at most twice at a given depth of the tree, that the total amount of storage is O(nlogn)
This section describes the construction of a segment tree in a one-dimensional space.
A segment tree from the set of segments I, can be built as follows. First, the endpoints of the intervals in I are sorted. The elementary intervals are obtained from that. Then, a balanced binary tree is built on the elementary intervals, and for each node v it is determined the interval Int(v) it represents. It remains to compute the canonical subsets for the nodes. To achieve this, the intervals in I are inserted one by one into the segment tree. An interval X = [x, x′] can be inserted in a subtree rooted at T, using the following procedure :
The complete construction operation takes O(nlogn) time, being n the amount of segments in I.
- If Int(T) is contained in X then store X at T, and finish.
- If X intersects the canonical subset of the left child of T, then insert X in that child, recursively.
- If X intersects the canonical subset of the right child of T, then insert X in that child, recursively.
- Sorting the endpoints takes O(nlogn). Building a balanced binary tree from the sorted endpoints, takes linear time on n.
The insertion of an interval X = [x, x′] into the tree, costs O(logn).
- Proof: Visiting every node takes constant time (assuming that canonical subsets are stored in a simple data structure like a linked list). When we visit node v, we either store X at v, or Int(v) contains an endpoint of X. As proved above, an interval is stored at most twice at each level of the tree. There is also at most one node at every level whose corresponding interval contains x, and one node whose interval contains x′. So, at most four nodes per level are visited. Since there are O(logn) levels, the total cost of the insertion is O(logn)
This section describes the query operation of a segment tree in a one-dimensional space.
A query for a segment tree, receives a point qx, and retrieves a list of all the segments stored which contain the point qx.
Formally stated; given a node (subtree) v and a query point qx, the query can be done using the following algorithm:
In a segment tree that contains n intervals, those containing a given query point can be reported in O(logn + k) time, where k is the number of reported intervals.
Generalization for higher dimensions
The segment tree can be generalized to higher dimension spaces, in the form of multi-level segment trees. In higher dimension versions, the segment tree stores a collection of axis-parallel (hyper-)rectangles, and can retrieve the rectangles that contain a given query point. The structure uses O(nlogd-1n) storage, and answers queries in O(logdn).
The use of fractional cascading lowers the query time bound by a logarithmic factor. The use of the interval tree on the deepest level of associated structures lowers the storage bound with a logarithmic factor
- Report all the intervals in I(v).
If v is not a leaf:
If qx is in Int(left child of v) then
- Perform a query in the left child of v.
- Perform a query in the right child of v.
- If qx is in Int(left child of v) then
- Proof: The query algorithm visits one node per level of the tree, so O(logn) nodes in total. In the other hand, at a node v, the segments in I are reported in O(1 + kv) time, where kv is the number of intervals at node v, reported. The sum of all the kv for all nodes v visited, is k, the number of reported segments.