Binary Search Trees

A symbol table is an abstract data type that associates a value with a key.

The primary operations of a symbol table include the insertion of a value, and searching the symbol table for a given value.

Some examples of applications where one must search through a symbol table, along with the key sought and the resulting value found include:

application purpose of search key value
dictionary find definition word definition
book index find relevant pages term list of page numbers
file share find song to download name of song computer ID
account management process transactions account number transaction details
web search find relevant web pages keyword list of page names
compiler find type and value variable name type and value

One could imagine implementing an elementary symbol table using an unordered linked list, where each node stored three things: a key, a value, and a reference to the next node. However, due to its unordered nature, one would need to traverse the entire list to search for a given key (a process that is $O(n)$. Similarly, to insert a value attached to a particular key (replacing the old value if the key already exists in the symbol table), requires a process that is $O(n)$.

If we switch to storing the necessary values using an ordered array, we can dramatically decrease the cost of searching by employing a binary search which is $O(\ln n)$. Unfortunately, the cost of insertion remains the same, $O(n)$.

There is, however, another and better way...

A binary search tree (BST) provides a way to implement a symbol table that combines the flexibility of insertion in linked lists with the efficiency of searching in an ordered array. BSTs are a dynamic data structure that is fast to both search and insert.

Recall how linked lists are built from nodes that each contain a reference to some other node. A binary search tree is similarly constructed -- except that the nodes in a BST contains two references per node, as suggested by the arrows drawn below -- in addition to a key value and some associated data.

Describing Trees

The vocabulary used to describe trees in computer science employs a strange mixture of imagery...

If a node A references nodes B and C, then B and C are called A's children. In a binary tree, there are two children (possibly null) for any one node. One of these is referred to as the left child, while the other is the right child. Naturally, if nodes B and C are children of node A, then node A is referred to as the parent of nodes B and C. Note, every node in a tree, with the exception of the top-most node, has one and only one parent.

Switching metaphors, nodes without children (i.e., both references they contain are null) are called leaves, while the top-most node of the tree is called the root. Note, for any given tree there is only one root. Curiously, binary trees are always drawn upside from their wooden counter-parts in nature -- with their leaves at the "bottom" and the root at the "top".

Now re-envisioning the tree as a network of roads, we refer to checking the value of a node as visiting that node, and describe the sequence of nodes that must be visited as one travels from the root to a given node, as the path to that node. Note, each path in a tree is unique.

The level of a particular node is the length of the aforementioned unique path to that node from the root. This of course makes the root at level 0. The maximum level in a tree is called its height. For a reasonably balanced tree of $n$ nodes, the height is $O(\ln n)$.

Lastly, as shown above, we describe a node taken together with all of its descendants (i.e., its children, its children's children, etc.) a subtree. Recalling that every node contains a key and value -- If every node's key is both larger than all keys in its left subtree and smaller than all keys in its right subtree, we say the binary tree is in symmetric order, and consequently is called a binary search tree.