A *symbol table* is an abstract data type that associates a value with a key.

The primary operations of a symbol table include the insertion of a value, and searching the symbol table for a given value.

Some examples of applications where one must search through a symbol table, along with the key sought and the resulting value found include:

application | purpose of search | key | value |
---|---|---|---|

dictionary | find definition | word | definition |

book index | find relevant pages | term | list of page numbers |

file share | find song to download | name of song | computer ID |

account management | process transactions | account number | transaction details |

web search | find relevant web pages | keyword | list of page names |

compiler | find type and value | variable name | type and value |

One could imagine implementing an elementary symbol table using an unordered linked list, where each node stored three things: a key, a value, and a reference to the next node. However, due to its unordered nature, one would need to traverse the entire list to search for a given key (a process that is $O(n)$. Similarly, to insert a value attached to a particular key (replacing the old value if the key already exists in the symbol table), requires a process that is $O(n)$.

If we switch to storing the necessary values using an ordered array, we can dramatically decrease the cost of searching by employing a binary search which is $O(\ln n)$. Unfortunately, the cost of insertion remains the same, $O(n)$.

There is, however, another and better way...

A *binary search tree* (BST) provides a way to implement a symbol table that combines the flexibility of insertion in linked lists with the efficiency of searching in an ordered array. BSTs are a dynamic data structure that is fast to both search and insert.

Recall how linked lists are built from nodes that each contain a reference to some other node. A binary search tree is similarly constructed -- except that the nodes in a BST contains *two* references per node, as suggested by the arrows drawn below -- in addition to a key value and some associated data.

The vocabulary used to describe trees in computer science employs a strange mixture of imagery...

If a node A references nodes B and C, then B and C are called A's *children*. In a binary tree, there are two children (possibly null) for any one node. One of these is referred to as the *left child*, while the other is the *right child*. Naturally, if nodes B and C are children of node A, then node A is referred to as the *parent* of nodes B and C. Note, every node in a tree, with the exception of the top-most node, has one and only one parent.

Switching metaphors, nodes without children (i.e., both references they contain are `null`

) are called *leaves*, while the top-most node of the tree is called the *root*. Note, for any given tree there is only one root. Curiously, binary trees are always drawn upside from their wooden counter-parts in nature -- with their leaves at the "bottom" and the root at the "top".

Now re-envisioning the tree as a network of roads, we refer to checking the value of a node as *visiting* that node, and describe the sequence of nodes that must be visited as one travels from the root to a given node, as the *path* to that node. Note, each path in a tree is unique.

The *level* of a particular node is the length of the aforementioned unique path to that node from the root. This of course makes the root at level 0. The maximum level in a tree is called its *height*. For a reasonably balanced tree of $n$ nodes, the height is $O(\ln n)$.

Lastly, as shown above, we describe a node taken together with all of its descendants (i.e., its children, its children's children, etc.) a *subtree*. Recalling that every node contains a key and value -- If every node's key is both larger than all keys in its left subtree and smaller than all keys in its right subtree, we say the binary tree is in *symmetric order*, and consequently is called a *binary search tree*.