Data Structures and Algorithms:
Annotated Reference with Examples
First Edition
Copyright Granville Barnett, and Luca Del Tongo 2008.
Available at: dotnetslackers.com
It's been a while since I've posted about data structures and I figured it might be worth going over another basic, but still important, one - Binary Search Trees. Similarly to my post on Linked Lists, I'll be working in C++ in an appropriately named header file, "BinarySearchTree.h".
Starting out with a struct to represent a node, we'll use this:
struct TreeNode {
int data = NULL;
TreeNode *left = NULL;
TreeNode *right = NULL;
};
Before I get ahead of myself, let me explain what a Binary Search Tree (BST) is. A BST is a collections of nodes that may each point to a child node, where the left child's value is less than the parent node and the right child's value is greater than or equal to the parent node. In this example, I decided to set the value type to be an int; however, this can be whatever you choose. From here, let's outline the class structure:
class BinarySearchTree{
private:
TreeNode *root;
// private functions will go here...
public:
BinarySearchTree() { //constructor
root = NULL;
};
// public functions will go here...
};
The root is the very first node of the tree; when it is NULL, we do not have a tree. In this post, I plan to just cover searching, insertion and deletion; however, I plan to do a second post on this topic where I'll talk about the different tree traversals.
Begininning with searching, we'll create a public function, which the user will call and will return a boolean value, and a private function that actually finds the node. Here is the code for the public function:
bool Contains(int value) {
if (findNode(root, value) == NULL) {
return false;
} else {
return true;
}
}
The private function, to find the node, can be used in multiple places so I figured it was appropriate to separate it from the contains function.
TreeNode *findNode(TreeNode* parentNode, int value) {
TreeNode *result = (TreeNode*) malloc(sizeof(TreeNode)); //Must do this in order to return pointer.
if (parentNode == NULL) { // if we reach an empty leaf node, value is not in tree.
return NULL;
} else if (parentNode->data == value) { //if we are at the node with value, return it.
result = parentNode;
return result;
} else if (parentNode->data > value) { // if the value is less than the current node, the value is on the left side of the tree
return findNode(parentNode->left, value);
} else { // else we check the right side
return findNode(parentNode->right, value);
}
}
The search function is pretty straight forward, we start at a given node (in the contains function we call it with the root node) and we search from there for the given value. This search is done by comparing the value to current node, if they are equal we return it, otherwise we must continue down the path depending on whether or not the value is greater or less than the current node. This strategy continues until we either find the node or we end up pointing to a empty node and return NULL. This search can be done in logarithmic time which I explained in this post. A quick recap: because this tree is sorted and the nodes point to other nodes, we can traverse the tree by checking if each nodes value is less than, equal to, or greater than the value we're searching for and move from there. This allows us to only do $\log_2 n$ comparisons, where $n$ is the number of nodes in the tree. This is the great thing about BST's - not only searching, but insertion and deletion can be done in logarithmic time as well.
For insertion, we can use a similar tactic - let's begin with the public function.
void Insert(int value) {
TreeNode *n = new TreeNode;
n->data = value;
if (root == NULL) {
//This is the root of the tree
this->root = n;
} else {
InsertNode(root, n);
}
}
With the public function, we just want to check if there are other nodes in the tree, if there aren't we insert this value as the root node; otherwise, we call the private function with root as the parent node and a pointer to a node that has the value we'd like to insert.
void InsertNode(TreeNode* parentNode, TreeNode* node) {
TreeNode *n = node;
if (n->data < parentNode->data) {
if (parentNode->left == NULL) {
parentNode->left = n;
} else {
InsertNode(parentNode->left, n);
}
} else {
if (parentNode->right == NULL) {
parentNode->right = n;
} else {
Ins,tNode(parentNode->right, n);
}
}
}
Again, very straight forward - if the value is less than the parent node, we go left, else we go right. Once we find a parent node that doesn't have a child on the left, or right depending on the value, we insert it there. This same concept applies to deletion except, we have to handle an extra case where we need to "rebuild" the tree. Because of this, removal only has a public function and reuses some private functions.
bool Remove (int value) {
TreeNode *node = findNode(root, value);
if (node == NULL) {
return false;
} else {
if (node->left == NULL && node->right == NULL) { //Node to remove has no children
TreeNode *parent = findParentNode(root, value);
if (parent == root && parent->data == value) {
this->root = NULL;
} else if (parent->left == node) {
parent->left = NULL;
} else {
parent->right = NULL;
}
} else if (node->left != NULL && node->right == NULL) { //Node to remove has only a left child
*node = *node->left;
} else if (node->right != NULL && node->left == NULL) { //Node to remove has only a left child
*node = *node->right;
} else { //Node to remove has two children.
TreeNode *tmp = node->left; // Assign left child to a temporary node.
*node = *node->right; // Replace parent node with right child
InsertNode(node, tmp); // Insert left node (temporary node) as child of right node
}
return true;
}
}
This function could be void but, for testing purposes, let's make it bool. Before continuing on, we use a private function called 'findParentNode' that I would like to outline here:
TreeNode *findParentNode(TreeNode* parentNode, int value) {
TreeNode *result = (TreeNode*) malloc(sizeof(TreeNode)); //Must do this in order to return pointer.
result = parentNode;
if (parentNode->left->data == value || parentNode->right->data == value || parentNode->data == value) {
return result;
} else if (parentNode->data > value) {
return findParentNode(parentNode->left, value);
} else {
return findParentNode(parentNode->right, value);
}
}
Moving on, we start by finding the node (using the findNode function above) that we will be removing. If we cannot find this node, there is nothing to remove, return false. Otherwise, there are 4 cases we must evaluate. The first case is when the node has no children; in this case we find the parent (using the private function above) of this node and set the corresponding child node to NULL. In the second and third cases, we have only one child; in this case we simply assign the child to the parent and therefore removing the parent. In the final case, our node for removal has two children; at this point we must find a way to "rebuild" the tree without the node for removal. Since we know the right child is greater than the parent node and therefore its sibling (the left child), we can safely replace the node for removal with it's right child and and insert the left node again (and all of its children) starting from the right child. This allows us to remove this node and still maintain the properties of a BST.
In the next post, I will talk about traversals and the functions used to accomplish those. Feel free to use this as a starting point for building your own BST. Any questions, comments and/or suggestions are appreciated.
No comments:
Post a Comment