Saturday, July 27, 2013

Binary Search Tree Traversal

For this example, the text used was:
Data Structures and Algorithms:
Annotated Reference with Examples
First Edition
Copyright Granville Barnett, and Luca Del Tongo 2008.
Available at: dotnetslackers.com

In my last post, I left off without covering the tree traversals. There are four different tree traversals I plan to cover: Inorder, Preorder, Postorder and Breadth First. The first three (Inorder, Preorder, Postorder) are considered depth first traversals, while the last one is a breadth first (as the name implies). These algorithms are not too complex and writing out the code for them isn't too bad either.

Starting with the Inorder traversal, let's go over how it works. Inorder just gives you the tree in order of smallest to largest (dependent on what your value system is). This is accomplished by getting the value from the left node, then it's parent, then the right node. This is also done recursively so, we start at the left most node of the tree and work our way to the right most node. Let's take this tree for example:


The Inorder traversal for this tree would be as follows: 1, 3, 4, 6, 7, 8, 10, 13, 14. This might make more sense once you see the code:
void GetInOrder(TreeNode* parentNode) {
    if (parentNode != NULL) {
        GetInOrder(parentNode->left); // We continue to check the left node of whatever the parent node is
        std::cout << parentNode->data << std::endl; // At the left most node we print out (get/compare/whatever needs to be done) the value
        GetInOrder(parentNode->right); // Then we check the right most node.
    }
}
Once you understand how that function is working, Preorder and Postorder are quite similar (writing the functions for them) and easy to grasp as well.

Preorder gives you the tree starting at the root and continuing left to the the left most node, then the right nodes follow. That is, we check the root, head to the left subtree, check the root, head to the left subtree and continue this process until we have reached the left most node. Once at the left most subtree, we begin to check the right subtrees.

Postorder gives you the tree starting at the left most subtree, then right child, left child and root. From there it works it's way up, back at the root to the right child and its left most subtree. Using our example tree above, this will look like:
Preorder: 8, 3, 1, 6, 4, 7, 10, 14, 13
Postorder: 1, 4, 7, 6, 3, 13, 14, 10, 8

Here is the code for these two functions:
void GetPreOrder(TreeNode* parentNode) {
    if (parentNode != NULL) {
        std::cout << parentNode->data << std::endl;
        GetPreOrder(parentNode->left);
        GetPreOrder(parentNode->right);
    }
}
and
void GetPostOrder(TreeNode* parentNode) {
    if (parentNode != NULL) {
        GetPostOrder(parentNode->left);
        GetPostOrder(parentNode->right);
        std::cout << parentNode->data << std::endl;
    }
}
Lastly, let's go over the Breadth First traversal. Breadth first means we traverse the tree by level, where we visit every node on a level before going to a lower level from left to right. I suppose we could also go from right to left but, it seems to be more common to do left to right so that's how I wrote my implementation. Using the example tree above we get:
Breadth First: 8, 3, 10, 1, 6, 14, 4, 7, 13
The code for this function is a little more difficult than the others because it requires that we keep track of which node we are currently visiting.
void GetBreadthFirstOrder(TreeNode* parentNode) {
    std::queue q; //Create queue to keep track of tree.
    while (parentNode != NULL) {
        std::cout << parentNode->data << std::endl; //print which node we're on
        if (parentNode->left != NULL) {
            q.push(parentNode->left); // store the left node
        }
        if (parentNode->right != NULL) {
            q.push(parentNode->right); // store the right node
        }
        if (q.empty()) {  // if we have gone queued all the nodes, we're done
            parentNode = NULL;
        } else {
            parentNode = q.front(); // take node out of queue for printing (First In, First Out)
            q.pop(); // remove the node
        }
    }
}
These traversals generally run in linear time ($O(n)$) so, while they aren't as quick as the searching, inserting and deleting algorithms, they are not terrible. I left out a few algorithms the book covers and possibly others on purpose. Some, I think, are simple enough and some I may not understand well enough. For now that's all I intend to cover on binary search trees.

Any questions, comments and/or suggestions are appreciated.

Thursday, July 11, 2013

Binary Search Trees

For this example, the text used was:
Data Structures and Algorithms:
Annotated Reference with Examples
First Edition
Copyright Granville Barnett, and Luca Del Tongo 2008.
Available at: dotnetslackers.com

It's been a while since I've posted about data structures and I figured it might be worth going over another basic, but still important, one - Binary Search Trees. Similarly to my post on Linked Lists, I'll be working in C++ in an appropriately named header file, "BinarySearchTree.h".

Starting out with a struct to represent a node, we'll use this:
struct TreeNode {
    int data = NULL;
    TreeNode *left = NULL;
    TreeNode *right = NULL;
};

Before I get ahead of myself, let me explain what a Binary Search Tree (BST) is. A BST is a collections of nodes that may each point to a child node, where the left child's value is less than the parent node and the right child's value is greater than or equal to the parent node. In this example, I decided to set the value type to be an int; however, this can be whatever you choose. From here, let's outline the class structure:
class BinarySearchTree{ 
private:
    TreeNode *root;
// private functions will go here...
public:
    BinarySearchTree() { //constructor
        root = NULL;
    };
// public functions will go here...
};

The root is the very first node of the tree; when it is NULL, we do not have a tree. In this post, I plan to just cover searching, insertion and deletion; however, I plan to do a second post on this topic where I'll talk about the different tree traversals.

Begininning with searching, we'll create a public function, which the user will call and will return a boolean value, and a private function that actually finds the node. Here is the code for the public function:
bool Contains(int value) {
    if (findNode(root, value) == NULL) {
        return false;
    } else {
        return true;
    }
}

The private function, to find the node, can be used in multiple places so I figured it was appropriate to separate it from the contains function.
TreeNode *findNode(TreeNode* parentNode, int value) {
    TreeNode *result = (TreeNode*) malloc(sizeof(TreeNode)); //Must do this in order to return pointer.
    if (parentNode == NULL) { // if we reach an empty leaf node, value is not in tree.
        return NULL;
    } else if (parentNode->data == value) { //if we are at the node with value, return it.
        result = parentNode;
        return result;
    } else if (parentNode->data > value) { // if the value is less than the current node, the value is on the left side of the tree
        return findNode(parentNode->left, value);
    } else { // else we check the right side
        return findNode(parentNode->right, value);
    }
}

The search function is pretty straight forward, we start at a given node (in the contains function we call it with the root node) and we search from there for the given value. This search is done by comparing the value to current node, if they are equal we return it, otherwise we must continue down the path depending on whether or not the value is greater or less than the current node. This strategy continues until we either find the node or we end up pointing to a empty node and return NULL. This search can be done in logarithmic time which I explained in this post. A quick recap: because this tree is sorted and the nodes point to other nodes, we can traverse the tree by checking if each nodes value is less than, equal to, or greater than the value we're searching for and move from there. This allows us to only do $\log_2 n$ comparisons, where $n$ is the number of nodes in the tree. This is the great thing about BST's - not only searching, but insertion and deletion can be done in logarithmic time as well.

For insertion, we can use a similar tactic - let's begin with the public function.

void Insert(int value) {
    TreeNode *n = new TreeNode;
    n->data = value;
    if (root == NULL) {
    //This is the root of the tree
        this->root = n;
    } else {
        InsertNode(root, n);
    }
}

With the public function, we just want to check if there are other nodes in the tree, if there aren't we insert this value as the root node; otherwise, we call the private function with root as the parent node and a pointer to a node that has the value we'd like to insert.

void InsertNode(TreeNode* parentNode, TreeNode* node) { 
    TreeNode *n = node;
    if (n->data < parentNode->data) {
        if (parentNode->left == NULL) {
            parentNode->left = n;
        } else {
            InsertNode(parentNode->left, n);
        }
    } else {
        if (parentNode->right == NULL) {
            parentNode->right = n;
        } else {
            Ins,tNode(parentNode->right, n);
        }
    }
}

Again, very straight forward - if the value is less than the parent node, we go left, else we go right. Once we find a parent node that doesn't have a child on the left, or right depending on the value, we insert it there. This same concept applies to deletion except, we have to handle an extra case where we need to "rebuild" the tree. Because of this, removal only has a public function and reuses some private functions.

bool Remove (int value) { 
    TreeNode *node = findNode(root, value);
    if (node == NULL) {
        return false;
    } else {
        if (node->left == NULL && node->right == NULL) { //Node to remove has no children
            TreeNode *parent = findParentNode(root, value);
            if (parent == root && parent->data == value) {
                this->root = NULL;
            } else if (parent->left == node) {
                parent->left = NULL;
            } else {
                parent->right = NULL;
            }
        } else if (node->left != NULL && node->right == NULL) { //Node to remove has only a left child
            *node = *node->left;
        } else if (node->right != NULL && node->left == NULL) { //Node to remove has only a left child
            *node = *node->right;
        } else { //Node to remove has two children.
            TreeNode *tmp = node->left; // Assign left child to a temporary node.
            *node = *node->right; // Replace parent node with right child
            InsertNode(node, tmp); // Insert left node (temporary node) as child of right node
        }
        return true;
    }
}

This function could be void but, for testing purposes, let's make it bool. Before continuing on, we use a private function called 'findParentNode' that I would like to outline here:

TreeNode *findParentNode(TreeNode* parentNode, int value) {
    TreeNode *result = (TreeNode*) malloc(sizeof(TreeNode)); //Must do this in order to return pointer.
    result = parentNode;
    if (parentNode->left->data == value || parentNode->right->data == value || parentNode->data == value) {
        return result;
    } else if (parentNode->data > value) {
        return findParentNode(parentNode->left, value);
    } else {
        return findParentNode(parentNode->right, value);
    }
}

Moving on, we start by finding the node (using the findNode function above) that we will be removing. If we cannot find this node, there is nothing to remove, return false. Otherwise, there are 4 cases we must evaluate. The first case is when the node has no children; in this case we find the parent (using the private function above) of this node and set the corresponding child node to NULL. In the second and third cases, we have only one child; in this case we simply assign the child to the parent and therefore removing the parent. In the final case, our node for removal has two children; at this point we must find a way to "rebuild" the tree without the node for removal. Since we know the right child is greater than the parent node and therefore its sibling (the left child), we can safely replace the node for removal with it's right child and and insert the left node again (and all of its children) starting from the right child. This allows us to remove this node and still maintain the properties of a BST.

In the next post, I will talk about traversals and the functions used to accomplish those. Feel free to use this as a starting point for building your own BST. Any questions, comments and/or suggestions are appreciated.