Tuesday, August 6, 2013

Pythagorean Triplets

A Pythagorean triplet is a set of three natural numbers, $a, b, c$, for which $a^2 + b^2 = c^2$. Even more interesting, a Pythagorean triplet is considered primitive if, and only if, $a, b,$ and $c$ share no common divisor. That being said, a non-primitive Pythagorean triple would then satisfy the rule $(ka)^2 + (kb)^2 = (kc)^2$, where $k > 1$ and $k$ is a whole number. So, given a primitive triple, we can use it to generate the non-primitive ones.
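For example, scaling the primitive triple $(3, 4, 5)$ by $k = 2$ gives $6^2 + 8^2 = 36 + 64 = 100 = 10^2$ - the non-primitive triple $(6, 8, 10)$.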

I thought it would be a fun project to create a function to generate the primitive Pythagorean triples. And, after recently talking about the Binary Search Tree, I thought using a ternary tree to show this might be a good expansion on the topic.

There were three different methods I used to create this tree. We'll start with the matrices discovered by B. Berggren. These three matrices are as follows:
$$A = \left| \begin{array}{ccc} 1 & -2 & 2 \\ 2 & -1 & 2 \\ 2 & -2 & 3 \end{array} \right| B = \left| \begin{array}{ccc} 1 & 2 & 2 \\ 2 & 1 & 2 \\ 2 & 2 & 3 \end{array} \right| C = \left| \begin{array}{ccc} -1 & 2 & 2 \\ -2 & 1 & 2 \\ -2 & 2 & 3 \end{array} \right|$$ If we multiply these three matrices by a column vector whose components form a Pythagorean triple, more specifically a primitive Pythagorean triple, then the result is another column vector whose components are a different primitive Pythagorean triple. With these matrices we can create three new primitive Pythagorean triple "children" from any primitive triple. Starting with a root of $(3, 4, 5)$ and continuing this process with each resulting set, eventually, all primitive Pythagorean triples are generated with no repetitions. This is represented well in a ternary tree, where we just repeat the transformation in each subtree. (Wiki)
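As a quick sanity check, applying matrix $A$ to the root $(3, 4, 5)$ gives $$\left| \begin{array}{ccc} 1 & -2 & 2 \\ 2 & -1 & 2 \\ 2 & -2 & 3 \end{array} \right| \left| \begin{array}{c} 3 \\ 4 \\ 5 \end{array} \right| = \left| \begin{array}{c} 3 - 8 + 10 \\ 6 - 4 + 10 \\ 6 - 8 + 15 \end{array} \right| = \left| \begin{array}{c} 5 \\ 12 \\ 13 \end{array} \right|$$ and $(5, 12, 13)$ is indeed another primitive Pythagorean triple ($25 + 144 = 169$).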

For this particular project, I decided to work in C#. Beginning with the node class, here is the code used to represent the tree:
public class node {
    private int a;
    private int b;
    private int c;
    private node left;
    private node middle;
    private node right;

    public node(int x, int y, int z) {
        a = x;
        b = y;
        c = z;
    }

    public int getValue(string identifier) {
        int value = 0;
        if (identifier == "a")
        { value = a; }
        else if (identifier == "b")
        { value = b; }
        else if (identifier == "c")
        { value = c; }
        return value;
    }

    public void setLeft(node newNode) {
        left = newNode;
    }

    public void setMiddle(node newNode)
    {
        middle = newNode;
    }

    public void setRight(node newNode)
    {
        right = newNode;
    }
}
And, here is the function to generate the tree itself. I decided to make the tree stop generating when all three of the values were greater than 500, but it could go on indefinitely.
public static node createPrimitiveTreeB(int x, int y, int z) {
    node root = new node(x, y, z);
    if (root.getValue("a") > 500 && root.getValue("b") > 500 && root.getValue("c") > 500) {  // Stop the tree eventually
        return root;
    } else {
        //Berggren's matrices
        int a1, b1, c1, a2, b2, c2, a3, b3, c3;

        //Left Child
        a1 = (-1 * root.getValue("a")) + (2 * root.getValue("b")) + (2 * root.getValue("c"));
        b1 = (-2 * root.getValue("a")) + root.getValue("b") + (2 * root.getValue("c"));
        c1 = (-2 * root.getValue("a")) + (2 * root.getValue("b")) + (3 * root.getValue("c"));

        root.setLeft(createPrimitiveTreeB(a1, b1, c1));

        //Middle Child
        a2 = root.getValue("a") + (2 * root.getValue("b")) + (2 * root.getValue("c"));
        b2 = (2 * root.getValue("a")) + root.getValue("b") + (2 * root.getValue("c"));
        c2 = (2 * root.getValue("a")) + (2 * root.getValue("b")) + (3 * root.getValue("c"));

        root.setMiddle(createPrimitiveTreeB(a2, b2, c2));

        //Right Child
        a3 = (root.getValue("a")) + (-2 * root.getValue("b")) + (2 * root.getValue("c"));
        b3 = (2 * root.getValue("a")) + (-1 * root.getValue("b")) + (2 * root.getValue("c"));
        c3 = (2 * root.getValue("a")) + (-2 * root.getValue("b")) + (3 * root.getValue("c"));

        root.setRight(createPrimitiveTreeB(a3, b3, c3));

        return root;
    }
}
Similar to Berggren, other mathematicians (Price, in this case) have come up with sets of matrices for generating the triples. Price's matrices are as follows:
$$A = \left| \begin{array}{ccc} 2 & 1 & -1 \\ -2 & 2 & 2 \\ -2 & 1 & 3 \end{array} \right| B = \left| \begin{array}{ccc} 2 & 1 & 1 \\ 2 & -2 & 2 \\ 2 & -1 & 3 \end{array} \right| C = \left| \begin{array}{ccc} 2 & -1 & 1 \\ 2 & 2 & 2 \\ 2 & 1 & 3 \end{array} \right|$$ Operating under the same concept, I didn't need to make many changes to the function in order to generate the tree with these matrices. These transformations produce a different tree than Berggren's, but it still contains every primitive Pythagorean triple exactly once. For the code below, I just show the modifications to the "createPrimitiveTreeB" function above (I renamed this one to "createPrimitiveTreeP").
//Price's matrices
int a1, b1, c1, a2, b2, c2, a3, b3, c3;

// Left Child
a1 = (2 * root.getValue("a")) + (root.getValue("b")) + (-1 * root.getValue("c"));
b1 = (-2 * root.getValue("a")) + (2 * root.getValue("b")) + (2 * root.getValue("c"));
c1 = (-2 * root.getValue("a")) + root.getValue("b") + (3 * root.getValue("c"));

root.setLeft(createPrimitiveTreeP(a1, b1, c1));

// Middle Child
a2 = (2 * root.getValue("a")) + root.getValue("b") + root.getValue("c");
b2 = (2 * root.getValue("a")) + (-2 * root.getValue("b")) + (2 * root.getValue("c"));
c2 = (2 * root.getValue("a")) + (-1 * root.getValue("b")) + (3 * root.getValue("c"));

root.setMiddle(createPrimitiveTreeP(a2, b2, c2));

// Right Child
a3 = (2 * root.getValue("a")) + (-1 * root.getValue("b")) + root.getValue("c");
b3 = (2 * root.getValue("a")) + (2 * root.getValue("b")) + (2 * root.getValue("c"));
c3 = (2 * root.getValue("a")) + root.getValue("b") + (3 * root.getValue("c"));

root.setRight(createPrimitiveTreeP(a3, b3, c3));
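To see that this really is a different tree, work the multiplications out by hand on the root $(3, 4, 5)$: Price's matrices give the children $(5, 12, 13)$, $(15, 8, 17)$ and $(7, 24, 25)$, whereas Berggren's give $(5, 12, 13)$, $(21, 20, 29)$ and $(15, 8, 17)$. The two trees branch differently, but both eventually enumerate every primitive triple with no repetitions.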
For the final method of generating the tree, and in my opinion the most fun, we will be using the Fibonacci Sequence. If you don't know what the Fibonacci sequence is, I talk about it in this post.

To do this, we follow a simple method of generating the Fibonacci numbers and retrieving the triples from these numbers. This is done by creating what we'll call a "Fibonacci Box". This matrix consists of four positive integers $q$, $q'$, $p$, $p'$, where $q$ and $q'$ are any two coprime numbers. In order to get $p$ and $p'$, we follow the Fibonacci rule where $q' + q = p$ and $q + p = p'$, giving us a generalized Fibonacci sequence.
$$\left| \begin{array}{cc} q & q' \\ p & p' \end{array} \right|$$ From this initial matrix, we can generate three "children" by using this method:
$$\left| \begin{array}{cc} q & q' \\ p & p' \end{array} \right| \Rightarrow \left| \begin{array}{cc} (p' - q') & q' \\ p' & (2p' - q') \end{array} \right|, \left| \begin{array}{cc} p' & q' \\ (q' + p') & (2p' + q') \end{array} \right|, \left| \begin{array}{cc} q' & p' \\ (p' + q') & (2q' + p') \end{array} \right|$$ Now, from these matrices, we need to create the Pythagorean triplets themselves. Given the formula $a = q'p', b = 2qp, c = pp' - qq'$, we can generate a set of triplets per Fibonacci Box. Just like Berggren's and Price's matrices, we will be generating all the triples with no repetitions when using this method. For this tree, I thought it would be appropriate to create a new type of node, which gives us a new class:
public class boxNode {
    private int[,] fibBox = new int[2,2];
    private int a;
    private int b;
    private int c;
    private boxNode left;
    private boxNode middle;
    private boxNode right;

    public boxNode(int[,] box)
    {
        fibBox = box;
        a = fibBox[0, 1] * fibBox[1, 1];
        b = 2 * fibBox[0, 0] * fibBox[1, 0];
        c = (fibBox[1, 0] * fibBox[1, 1]) - (fibBox[0, 0] * fibBox[0, 1]);
    }

    public int getValue(string identifier)
    {
        // a, b and c were already computed from the box in the constructor
        int value = 0;
        if (identifier == "a")
        { value = a; }
        else if (identifier == "b")
        { value = b; }
        else if (identifier == "c")
        { value = c; }
        return value;
    }

    public int[,] getBoxValue() {
        return fibBox;
    }

    public void setLeft(boxNode newNode) {
        left = newNode;
    }

    public void setMiddle(boxNode newNode) {
        middle = newNode;
    }

    public void setRight(boxNode newNode) {
        right = newNode;
    }
}
And here is a function using this method to generate the entire tree.
public static boxNode createPrimitiveTreeF(int[,] fibBox) {
    boxNode root = new boxNode(fibBox);
    if (root.getValue("a") > 500 && root.getValue("b") > 500 && root.getValue("c") > 500) {
        return root;
    } else {
        //Using Fibonacci Box
        int[,] parentBox = root.getBoxValue();
        int[,] fibBox1 = new int[2, 2];
        int[,] fibBox2 = new int[2, 2];
        int[,] fibBox3 = new int[2, 2];

        // Box Properties: [0,0] = [1,0] - [0, 1]; [1, 1] = [0, 0] + [1, 0]
        fibBox1[0, 0] = parentBox[1, 1] - parentBox[0, 1];
        fibBox1[0, 1] = parentBox[0, 1];
        fibBox1[1, 0] = parentBox[1, 1];
        fibBox1[1, 1] = (parentBox[1, 1] - parentBox[0, 1]) + parentBox[1, 1];

        root.setLeft(createPrimitiveTreeF(fibBox1));

        // Box Properties: [1, 0] = [0, 1] + [0, 0]; [1, 1] = [0, 0] + [1, 0]
        fibBox2[0, 0] = parentBox[1, 1];
        fibBox2[0, 1] = parentBox[0, 1];
        fibBox2[1, 0] = parentBox[0, 1] + parentBox[1, 1];
        fibBox2[1, 1] = parentBox[1, 1] + (parentBox[0, 1] + parentBox[1, 1]);

        root.setMiddle(createPrimitiveTreeF(fibBox2));

        // Box Properties: [1, 0] = [0, 1] + [0, 0]; [1, 1] = [0, 0] + [1, 0]
        fibBox3[0, 0] = parentBox[0, 1];
        fibBox3[0, 1] = parentBox[1, 1];
        fibBox3[1, 0] = parentBox[1, 1] + parentBox[0, 1];
        fibBox3[1, 1] = parentBox[0, 1] + (parentBox[1, 1] + parentBox[0, 1]);

        root.setRight(createPrimitiveTreeF(fibBox3));

        return root;
    }
}
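As a quick worked example of the box-to-triple formula, the initial box used in the test code below is $$\left| \begin{array}{cc} 1 & 1 \\ 2 & 3 \end{array} \right|$$ which gives $a = 1 \cdot 3 = 3$, $b = 2 \cdot 1 \cdot 2 = 4$ and $c = 2 \cdot 3 - 1 \cdot 1 = 5$ - the root triple $(3, 4, 5)$. Its middle child box works out to $$\left| \begin{array}{cc} 3 & 1 \\ 4 & 7 \end{array} \right|$$ which gives the primitive triple $(7, 24, 25)$.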
We can test the functions with the code below, putting a breakpoint at "ReadLine" and examining the local variables.
static void Main(string[] args) {
    node berggrenTree = createPrimitiveTreeB(3, 4, 5); // root of tree must be initialized with first primitive triple
    node priceTree = createPrimitiveTreeP(3, 4, 5); // root of tree must be initialized with first primitive triple
    int[,] initFibBox = new int[2,2]; 
    initFibBox[0, 0] = 1;
    initFibBox[0, 1] = 1;
    initFibBox[1, 0] = 2;
    initFibBox[1, 1] = 3;
    boxNode fibTree = createPrimitiveTreeF(initFibBox); // root of tree must be initialized with beginning of Fibonacci sequence
    Console.ReadLine();
}
After running this code, you will be able to see, in your local variables, each ternary tree and the primitive Pythagorean triples it generates. I thought this might be an interesting topic as it gives more insight into using trees, as well as a look at implementing matrices and the Fibonacci sequence in a useful way (provided that generating the primitive Pythagorean triplets is useful to you). I think it's especially interesting to see how the Fibonacci sequence can be used to create these triples, and it might be worth playing around with your initial Fibonacci Box to see the different trees you can create. You can use any numbers to create the initial box as long as they satisfy these conditions: $q'$ and $p'$ are both odd, either $q$ or $p$ is even, and $q < p$, $q' < p'$. Of course, being a "Fibonacci" box, the values must maintain the properties of the sequence as stated above.

Any questions, comments and/or suggestions are appreciated.

Saturday, July 27, 2013

Binary Search Tree Traversal

For this example, the text used was:
Data Structures and Algorithms:
Annotated Reference with Examples
First Edition
Copyright Granville Barnett, and Luca Del Tongo 2008.
Available at: dotnetslackers.com

In my last post, I left off without covering the tree traversals. There are four different tree traversals I plan to cover: Inorder, Preorder, Postorder and Breadth First. The first three (Inorder, Preorder, Postorder) are considered depth first traversals, while the last one is a breadth first traversal (as the name implies). These algorithms are not too complex, and writing out the code for them isn't too bad either.

Starting with the Inorder traversal, let's go over how it works. Inorder just gives you the tree in order of smallest to largest (depending on what your value system is). This is accomplished by getting the value from the left subtree, then its parent, then the right subtree. This is done recursively, so we start at the leftmost node of the tree and work our way to the rightmost node. Let's take this tree for example:
        8
      /   \
     3     10
    / \      \
   1   6      14
      / \     /
     4   7   13

The Inorder traversal for this tree would be as follows: 1, 3, 4, 6, 7, 8, 10, 13, 14. This might make more sense once you see the code:
void GetInOrder(TreeNode* parentNode) {
    if (parentNode != NULL) {
        GetInOrder(parentNode->left); // Recurse into the left subtree first
        std::cout << parentNode->data << std::endl; // Then visit (print/compare/whatever needs to be done) the current node
        GetInOrder(parentNode->right); // Finally, recurse into the right subtree
    }
}
Once you understand how that function is working, Preorder and Postorder are quite similar (writing the functions for them) and easy to grasp as well.

Preorder gives you the tree starting at the root and continuing down to the leftmost node before the right nodes follow. That is, we visit the root, head into the left subtree, visit its root, head into its left subtree, and continue this process until we have reached the leftmost node. Once there, we begin to visit the right subtrees.

Postorder gives you the tree starting at the leftmost subtree: left child, then right child, then the root. From there it works its way up and, back at the root, does the same for the right subtree. Using our example tree above, this will look like:
Preorder: 8, 3, 1, 6, 4, 7, 10, 14, 13
Postorder: 1, 4, 7, 6, 3, 13, 14, 10, 8

Here is the code for these two functions:
void GetPreOrder(TreeNode* parentNode) {
    if (parentNode != NULL) {
        std::cout << parentNode->data << std::endl;
        GetPreOrder(parentNode->left);
        GetPreOrder(parentNode->right);
    }
}
and
void GetPostOrder(TreeNode* parentNode) {
    if (parentNode != NULL) {
        GetPostOrder(parentNode->left);
        GetPostOrder(parentNode->right);
        std::cout << parentNode->data << std::endl;
    }
}
Lastly, let's go over the Breadth First traversal. Breadth first means we traverse the tree by level, visiting every node on a level, left to right, before moving down to the next level. I suppose we could also go from right to left, but it seems to be more common to do left to right, so that's how I wrote my implementation. Using the example tree above we get:
Breadth First: 8, 3, 10, 1, 6, 14, 4, 7, 13
The code for this function is a little different from the others because it requires a queue to keep track of the nodes we still need to visit.
void GetBreadthFirstOrder(TreeNode* parentNode) {
    std::queue<TreeNode*> q; // Create a queue (from <queue>) to keep track of nodes we still need to visit.
    while (parentNode != NULL) {
        std::cout << parentNode->data << std::endl; //print which node we're on
        if (parentNode->left != NULL) {
            q.push(parentNode->left); // store the left node
        }
        if (parentNode->right != NULL) {
            q.push(parentNode->right); // store the right node
        }
        if (q.empty()) {  // if the queue is empty, we've visited every node and we're done
            parentNode = NULL;
        } else {
            parentNode = q.front(); // take node out of queue for printing (First In, First Out)
            q.pop(); // remove the node
        }
    }
}
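If you want to run these yourself, here is a minimal sketch of a main that builds the example tree by hand and calls each traversal. The makeNode helper and the hard-coded values are my own; the TreeNode struct is the one from my Binary Search Trees post, and the four traversal functions above are assumed to be pasted in between the struct and main:
#include <iostream>
#include <queue> // needed by GetBreadthFirstOrder

// TreeNode as defined in the Binary Search Trees post
struct TreeNode {
    int data = 0;
    TreeNode *left = NULL;
    TreeNode *right = NULL;
};

// (paste GetInOrder, GetPreOrder, GetPostOrder and GetBreadthFirstOrder here)

TreeNode* makeNode(int value) { // small helper, just for building the example tree
    TreeNode *n = new TreeNode;
    n->data = value;
    return n;
}

int main() {
    // Build the example tree: 8 at the root, 3 and 10 below it, and so on
    TreeNode *root = makeNode(8);
    root->left = makeNode(3);
    root->right = makeNode(10);
    root->left->left = makeNode(1);
    root->left->right = makeNode(6);
    root->left->right->left = makeNode(4);
    root->left->right->right = makeNode(7);
    root->right->right = makeNode(14);
    root->right->right->left = makeNode(13);

    GetInOrder(root);           // prints 1 3 4 6 7 8 10 13 14 (one value per line)
    GetPreOrder(root);          // prints 8 3 1 6 4 7 10 14 13
    GetPostOrder(root);         // prints 1 4 7 6 3 13 14 10 8
    GetBreadthFirstOrder(root); // prints 8 3 10 1 6 14 4 7 13
    return 0;
}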
These traversals generally run in linear time ($O(n)$) so, while they aren't as quick as the searching, inserting and deleting algorithms, they are not terrible. I left out a few algorithms the book covers and possibly others on purpose. Some, I think, are simple enough and some I may not understand well enough. For now that's all I intend to cover on binary search trees.

Any questions, comments and/or suggestions are appreciated.

Thursday, July 11, 2013

Binary Search Trees

For this example, the text used was:
Data Structures and Algorithms:
Annotated Reference with Examples
First Edition
Copyright Granville Barnett, and Luca Del Tongo 2008.
Available at: dotnetslackers.com

It's been a while since I've posted about data structures and I figured it might be worth going over another basic, but still important, one - Binary Search Trees. Similarly to my post on Linked Lists, I'll be working in C++ in an appropriately named header file, "BinarySearchTree.h".

Starting out with a struct to represent a node, we'll use this:
struct TreeNode {
    int data = 0; // NULL isn't meaningful for an int, so default to 0
    TreeNode *left = NULL;
    TreeNode *right = NULL;
};

Before I get ahead of myself, let me explain what a Binary Search Tree (BST) is. A BST is a collection of nodes that may each point to child nodes, where the left child's value is less than the parent node's and the right child's value is greater than or equal to the parent node's. In this example, I decided to set the value type to be an int; however, this can be whatever you choose. From here, let's outline the class structure:
class BinarySearchTree{ 
private:
    TreeNode *root;
// private functions will go here...
public:
    BinarySearchTree() { //constructor
        root = NULL;
    };
// public functions will go here...
};

The root is the very first node of the tree; when it is NULL, we do not have a tree. In this post, I plan to just cover searching, insertion and deletion; however, I plan to do a second post on this topic where I'll talk about the different tree traversals.

Beginning with searching, we'll create a public function, which the user will call and which returns a boolean value, and a private function that actually finds the node. Here is the code for the public function:
bool Contains(int value) {
    if (findNode(root, value) == NULL) {
        return false;
    } else {
        return true;
    }
}

The private function, to find the node, can be used in multiple places so I figured it was appropriate to separate it from the contains function.
TreeNode *findNode(TreeNode* parentNode, int value) {
    if (parentNode == NULL) { // if we reach an empty leaf node, the value is not in the tree.
        return NULL;
    } else if (parentNode->data == value) { // if we are at the node with the value, return it.
        return parentNode;
    } else if (parentNode->data > value) { // if the value is less than the current node, it lives in the left subtree
        return findNode(parentNode->left, value);
    } else { // else we check the right subtree
        return findNode(parentNode->right, value);
    }
}

The search function is pretty straightforward: we start at a given node (in the Contains function we call it with the root node) and we search from there for the given value. This search is done by comparing the value to the current node; if they are equal we return it, otherwise we continue down a path depending on whether the value is greater or less than the current node. This continues until we either find the node or we end up pointing to an empty node and return NULL. This search can be done in logarithmic time, which I explained in this post. A quick recap: because this tree is sorted and the nodes point to other nodes, we can traverse the tree by checking whether each node's value is less than, equal to, or greater than the value we're searching for and move from there. This allows us to do only about $\log_2 n$ comparisons, where $n$ is the number of nodes in the tree (assuming the tree is reasonably balanced - a completely lopsided tree degrades toward linear time). This is the great thing about BSTs - not only searching, but insertion and deletion can be done in logarithmic time as well.

For insertion, we can use a similar tactic - let's begin with the public function.

void Insert(int value) {
    TreeNode *n = new TreeNode;
    n->data = value;
    if (root == NULL) {
    //This is the root of the tree
        this->root = n;
    } else {
        InsertNode(root, n);
    }
}

With the public function, we just want to check if there are other nodes in the tree; if there aren't, we insert this value as the root node. Otherwise, we call the private function with root as the parent node and a pointer to a node that has the value we'd like to insert.

void InsertNode(TreeNode* parentNode, TreeNode* node) { 
    TreeNode *n = node;
    if (n->data < parentNode->data) {
        if (parentNode->left == NULL) {
            parentNode->left = n;
        } else {
            InsertNode(parentNode->left, n);
        }
    } else {
        if (parentNode->right == NULL) {
            parentNode->right = n;
        } else {
            InsertNode(parentNode->right, n);
        }
    }
}

Again, very straightforward - if the value is less than the parent node, we go left; otherwise we go right. Once we find a parent node that doesn't have a child on the left (or right, depending on the value), we insert it there. This same concept applies to deletion, except we have to handle an extra case where we need to "rebuild" the tree. Because of this, removal only has a public function and reuses some private functions.

bool Remove (int value) { 
    TreeNode *node = findNode(root, value);
    if (node == NULL) {
        return false;
    } else {
        if (node->left == NULL && node->right == NULL) { //Node to remove has no children
            TreeNode *parent = findParentNode(root, value);
            if (parent == root && parent->data == value) {
                this->root = NULL;
            } else if (parent->left == node) {
                parent->left = NULL;
            } else {
                parent->right = NULL;
            }
        } else if (node->left != NULL && node->right == NULL) { //Node to remove has only a left child
            *node = *node->left;
        } else if (node->right != NULL && node->left == NULL) { //Node to remove has only a right child
            *node = *node->right;
        } else { //Node to remove has two children.
            TreeNode *tmp = node->left; // Assign left child to a temporary node.
            *node = *node->right; // Replace parent node with right child
            InsertNode(node, tmp); // Insert left node (temporary node) as child of right node
        }
        return true;
    }
}

This function could be void but, for testing purposes, let's make it bool. Before continuing on, we use a private function called 'findParentNode' that I would like to outline here:

TreeNode *findParentNode(TreeNode* parentNode, int value) {
    if (parentNode == NULL) { // shouldn't happen if the value exists, but guard against it anyway
        return NULL;
    } else if (parentNode->data == value ||
              (parentNode->left != NULL && parentNode->left->data == value) ||
              (parentNode->right != NULL && parentNode->right->data == value)) {
        return parentNode; // this node is the parent we want (or the root holding the value itself)
    } else if (parentNode->data > value) {
        return findParentNode(parentNode->left, value);
    } else {
        return findParentNode(parentNode->right, value);
    }
}

Moving on, we start by finding the node (using the findNode function above) that we will be removing. If we cannot find this node, there is nothing to remove, so we return false. Otherwise, there are four cases we must evaluate. The first case is when the node has no children; here we find the parent (using the private function above) of this node and set the corresponding child pointer to NULL. In the second and third cases, the node has only one child; here we simply copy the child into the node, thereby removing it. In the final case, our node for removal has two children; at this point we must find a way to "rebuild" the tree without the node for removal. Since we know the right child is greater than or equal to the node, and therefore greater than its sibling (the left child), we can safely replace the node for removal with its right child and insert the left node (and all of its children) again, starting from the right child. This allows us to remove the node and still maintain the properties of a BST.
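To tie it all together, here is a minimal sketch of how the class might be exercised, assuming the functions above are dropped into the class body where the "functions will go here" comments sit and the header is saved as "BinarySearchTree.h" as mentioned at the start. The values are just ones I picked:
#include <iostream>
#include "BinarySearchTree.h"

int main() {
    BinarySearchTree tree;
    int values[] = { 8, 3, 10, 1, 6, 14, 4, 7, 13 };
    for (int i = 0; i < 9; i++) {
        tree.Insert(values[i]);                  // builds the tree one value at a time
    }
    std::cout << tree.Contains(6) << std::endl;  // 1 - the value is in the tree
    std::cout << tree.Remove(6) << std::endl;    // 1 - removed, and the subtree is "rebuilt" around it
    std::cout << tree.Contains(6) << std::endl;  // 0 - no longer found
    std::cout << tree.Contains(42) << std::endl; // 0 - never inserted
    return 0;
}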

In the next post, I will talk about traversals and the functions used to accomplish those. Feel free to use this as a starting point for building your own BST. Any questions, comments and/or suggestions are appreciated.

Saturday, June 29, 2013

Logic and Emotions

More often than not, I find that people tend to separate logic from emotions. Those people might say things like:
“The heart is a strange beast and not ruled by logic.”
- Maria V. Snyder
I've seen many quotes like this that seem to imply there is some sort of disconnect. I strongly disagree with this sentiment and hopefully the following explanation is convincing. I suppose, this opening to the post, in itself, will support its content but, we shall see. 
The idea that logic fuels emotions seems obvious - I would speculate that most people are already aware of it; or maybe the idea is completely wrong - if that's the case, I do hope someone corrects me. What seems more likely is that many people know this but do not truly understand it. I may have to do a future post on knowledge vs. understanding but, I digress. Let's explore this seemingly rudimentary idea, beginning with a quote.
“The degree of one's emotions varies inversely with one's knowledge of the facts.”
- Bertrand Russell
That pretty much sums it all up but, let's continue on. This concept can be used to derive most (possibly all) behaviors humans exhibit but, in order to properly explain it, it's important to know what logic and emotion are, thus, initially defining the main terms only seems natural.
  • Positive - That which causes progression, inclusion or gain
  • Negative - That which causes regression, exclusion or loss
  • Feelings - Positive or negative stimulations
  • Wants/Desires - Longing for objects or ideas, that create feelings in the subject
  • Emotions - Positive or negative mental reactions to acquisition or deprivation of Wants/Desires
    • Emotions can cause feelings but, they are not the feelings themselves
  • Logic - Interrelation or sequence of facts or events when seen as inevitable or predictable
With these terms defined, let's go through the process that seems to connect logic to emotions and our actions in general.
From the moment a person is born, they begin to experience all sorts of things, and this continues until the person dies. Experiences lead to feelings; whether positive or negative is entirely dependent on the individual. Some of these feelings may cause an increase in the production of dopamine, the brain's "pleasure chemical", some may cause a disconnect or a loss of stimulation in regions of the brain that need/should be stimulated, and others may cause different positive or negative effects. Regardless of what feeling an experience gives us, it will cause us to begin to want it more or to try to avoid it altogether. However, not all wants or desires are self induced - some may be given to us and some may be necessary for survival. As a want becomes established, we begin to formulate strategies by which we may obtain it. Whether or not our strategies follow sound logic is imperative to our subsequent emotion - that is, to the acquisition of this desire. Faulty or poor logic may lead to failure to achieve or acquire our present desire. These failures, or successes, determine our emotional state and, with humans possessing so many wants and desires, it's no wonder our emotions sometimes seem so chaotic.
Continuing to expand on this proposition, I suspect that when we arrive at a negative emotion, it is because we are acting on flawed logical premises or conceptions. The opposite would apply when obtaining a positive emotion. It follows that those who are often angry or sad - negative emotions - are often formulating strategies with either a fallacious premise or an inconceivable plan. Again, the opposite would apply to those who are often joyful or happy. These failures may in fact be caused by a lack of knowledge, intelligence, or ability but, an individual's most frequent emotional state (with the exception of mental/psychological disorders) is certainly the fault of the individual.
This does not mean negative emotions can be avoided altogether since the universe, at the moment, is rather unpredictable. What this does mean is that people have a great deal of control over their state of being. A healthy amount of introspection, might be necessary in order to conceptualize a path to achieve the desired outcome.
I could probably continue on with examples and ideas that are supported by this concept but, I think any more expansion would warrant another post. Consider that with this idea as a base, you can in fact explain a lot of behaviors you might observe, or exhibit yourself. Once the knowledge is there, an opportunity for refinement and advancement becomes available, leaving the individual to take the necessary actions. On that note I'll take my leave, any questions, comments and/or suggestions are appreciated.

Wednesday, June 5, 2013

Big O Notation and Run Times

Recently, I have been learning about SQL indexing and the way databases store information. During this process, the run time for sorts, searches, etc. has come up. Because of that, I think it's worth trying to explain some of the run times (Big O Notation) for a few different searches or rather, what it means when we say $O(n)$.

Credit to this post on Reddit for giving me an idea of how to explain it but, I think I'll try to take a little different route. I didn't read the whole post actually, but I do plan on borrowing a few ideas.

First off, what is Big O Notation? Dealing with algorithms in computer science, I would define it as a function to represent the amount of time (worst case scenario) it will take a process (algorithm) to run on a given data structure. Why is this important? When building algorithms to handle data, it takes some amount of time to execute; depending on how it's built it may take longer than another algorithm that yields the same result. Also, it may be important for how well the algorithm scales depending on how much data you're working with.

To work with these ideas, we generally use Big O Notation (calculating the worst case run time - sometimes the average case). One example of how this can be useful: suppose an algorithm returns its result immediately every time in practice. If we assume that this is representative of its run time, then when we run our process on different or larger amounts of data we may find that it actually executes in quadratic time - a drastic decline in performance. Knowing instead that we have an algorithm that runs, worst case, in $O(\log n)$, we know what to expect as the data changes.

In this post, I plan to go over a few of the common time complexities I run into often. These are constant, logarithmic, linear, linearithmic, and quadratic run times.

For these explanations, I'd like to set up a scenario:
You have a bunch of boxes and, depending on the algorithm, you've misplaced an item in these boxes. Other than the missing item, these boxes may have other items or properties; the other information about the boxes will be used to further explain my example.

Constant:
Constant run time is the easiest to understand (I'd think) because it's just that - constant. It's written as $O(1)$. Based on my scenario (though the box scenario isn't really needed for this run time), we can picture a room of 10 boxes. In a search, we could say every box has the item I'm looking for, so every time we look, we find the item right away. It always takes the same amount of time for this search; hence, constant run time. In an insertion or deletion, it could be looked at as always removing a box from the same position, or always adding a new box to the room (even at random) the same way; the process always takes a constant amount of time.

Logarithmic:
Logarithmic run time is a little more tricky but, not too bad. It's written as $O(\log n)$. Let's say that we have 10 boxes sorted by weight and we're looking for a box of a specific weight. Given that we don't know each box's weight ahead of time, only that the boxes are ordered from least weight to most weight, left to right - a logarithmic search will be sufficient. Since there are only 10 boxes, starting at the left most and weighing each one might not be so bad; however, what if there were 1000 boxes? If it took 15 seconds to weigh each box and the box we were looking for was the right most box, it would take 15000 seconds (or over 4 hours), worst case, to find the box we want. Maybe instead, we could start in the middle and work our way out in some direction from there? That would give us $n/2$ or, with 1000 boxes, 500 checks worst case (assuming we were lucky enough to check the correct half); a little over 2 hours but, still not good. Going back to 10 boxes, what if we applied the second method, starting at the middle box, over and over again? Let's say the box we want is the third box. Starting at box five, if it weighed too much, we would then look at the 4 boxes to the left and try the middle again - box two. If that was too light, we can check the 2 boxes to the right - boxes three and four. No matter which one we pick, we'll have found our box in 3 or 4 checks, or about $\log_2 10$. This is how a logarithmic run time (or search in this case) works. Scaling up to 1000 boxes, instead of taking 15000 seconds, we take about $15 \cdot \log_2 1000$ seconds, roughly 150 seconds, or just over 2 minutes. This is a significant improvement and shows why a logarithmic run time may be considered ideal in many cases.

NOTE: In this case, because we split the groups in half, we dealt with log base 2; this may not always be the case depending on what data structure we're working with - it's logarithmic time, not logarithmic base 2 time after all.
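To make the box-weighing strategy concrete, here is a small sketch of it as code - my own toy example, with made-up names and weights, searching a sorted array of box weights:
#include <iostream>

// Returns the index of target in the sorted array weights, or -1 if it isn't there.
// Each loop iteration throws away half of the remaining boxes, so we do about log2(n) "weighings".
int findBoxByWeight(const int weights[], int n, int target) {
    int low = 0;
    int high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2; // weigh the middle box
        if (weights[mid] == target) {
            return mid;        // found it
        } else if (weights[mid] < target) {
            low = mid + 1;     // too light, look in the right half
        } else {
            high = mid - 1;    // too heavy, look in the left half
        }
    }
    return -1; // not found
}

int main() {
    int weights[] = { 2, 5, 7, 11, 14, 18, 21, 25, 30, 33 }; // 10 boxes, already sorted by weight
    std::cout << findBoxByWeight(weights, 10, 18) << std::endl; // prints 5
    return 0;
}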

Linear:
Linear, like constant, run time is pretty straight forward (haha...). It's written as $O(n)$. Let's take the same scenario as we did with logarithmic time with one exception; the boxes are not sorted. Since they are not sorted, we can't apply our method of dividing them in half, as we'd be in the same situation as we were with one group - still needing to check each one. That means in the case of 1000 boxes, we may have to search all 1000 depending on where we start. After looking at logarithmic time, this seems like a terrible way to search for things but, these next two scenarios can be much worse (as well as many other run times that I won't be talking about in this post).
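As a quick sketch (same made-up setup as above), the unsorted case is just a straight scan - up to $n$ checks before we find the box or give up:
// With unsorted boxes we just check them one at a time - up to n checks in the worst case.
int findBoxLinear(const int weights[], int n, int target) {
    for (int i = 0; i < n; i++) {
        if (weights[i] == target) {
            return i; // found the box we want
        }
    }
    return -1; // checked every box, it isn't here
}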

Linearithmic:
Linearithmic, I think, is kinda difficult to give an example for; hopefully what I came up with makes sense. It's written as $O(n \log n)$. It makes a little more sense to me to describe a linearithmic run time as a way of sorting objects rather than searching them - maybe because I can't think of a good searching example. Let's take our linear scenario and set it up so that we can search for our item in logarithmic time; that is, we have 10 unsorted boxes and we want to order them by weight. We can start by grouping the boxes into twos and sorting each pair (so the lighter box is on the left). From here, we can group the boxes again, giving us three groups of sorted boxes - two of 4 and one of 2. These were sorted by comparing the left most box of one group (of the groups being combined) to the left most box of the second group and putting the lighter of the two into the new group, until the groups become one new sorted group. We can do this again and we get a group of 6, giving us two groups now - one of 4 and one of 6. And finally, one more combination gives a single sorted group. This is also known as a merge sort, which I will attempt to demonstrate below. This is linearithmic ($O(n \log n)$) because, worst case, we may have had to make about $n \log n$ comparisons in total. That is, it could have taken up to 34 comparisons ($10 * \log_2 10$ - again, base 2 because we're grouping 2 at a time) to sort the boxes. This is relatively efficient, considering other options may have taken much longer (quadratic time, explained below).

Demonstration (the numbers are the weights of each box and do not have to be unique):
Starting Point: 3 1 9 4 2 5 10 8 7 6
First Grouping: 1,3 4,9 2,5 8,10 6,7 - 4 box movements, 5 checks (comparisons)
Second Grouping: 1,3,4,9 2,5,8,10 6,7 - 0 box movements, 5 checks
Third Grouping: 1,3,4,9 2,5,6,7,8,10 - 2 box movements, 4 checks
Fourth and Final Grouping: 1,2,3,4,5,6,7,8,9,10 - 5 box movements 9 checks
NOTE: The boxes are now sorted and while in this example it may look like we took 40 comparisons, we actually only took 23. Remember, $n \log n$ is the worst case scenario.
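Here is a small sketch of that merge sort in code. I wrote it as the usual recursive version rather than the bottom-up grouping walked through above, and the names are my own, but the $O(n \log n)$ behavior is the same idea:
#include <cstddef>
#include <iostream>
#include <vector>

// Merge two already-sorted halves into one sorted vector.
std::vector<int> mergeBoxes(const std::vector<int>& left, const std::vector<int>& right) {
    std::vector<int> result;
    std::size_t i = 0, j = 0;
    while (i < left.size() && j < right.size()) {
        if (left[i] <= right[j]) {
            result.push_back(left[i++]); // take the lighter box from the left group
        } else {
            result.push_back(right[j++]); // or from the right group
        }
    }
    while (i < left.size())  { result.push_back(left[i++]); }  // copy whatever is left over
    while (j < right.size()) { result.push_back(right[j++]); }
    return result;
}

// Split in half, sort each half, then merge - roughly n comparisons per level, log2(n) levels.
std::vector<int> mergeSort(const std::vector<int>& boxes) {
    if (boxes.size() <= 1) {
        return boxes;
    }
    std::vector<int> left(boxes.begin(), boxes.begin() + boxes.size() / 2);
    std::vector<int> right(boxes.begin() + boxes.size() / 2, boxes.end());
    return mergeBoxes(mergeSort(left), mergeSort(right));
}

int main() {
    std::vector<int> boxes = { 3, 1, 9, 4, 2, 5, 10, 8, 7, 6 }; // the starting point above
    std::vector<int> sorted = mergeSort(boxes);
    for (int w : sorted) {
        std::cout << w << " "; // 1 2 3 4 5 6 7 8 9 10
    }
    std::cout << std::endl;
    return 0;
}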

Quadratic:
Quadratic run time isn't too difficult to understand but, it may not always be that easy to recognize. It's written as $O(n^2)$. Let's say we have 10 unsorted boxes, and in each box there are 10 more unsorted boxes, one of which has the item we seek. Since these boxes are unsorted, we can't divide them up, so we must look in each box and in each of its inner boxes until we find our item. In other words, if the item is in the very last inner box we check of the last outer box we check, we will have checked 10 inner boxes for each of the 10 outer boxes, or $10^2$ (100) checks. This is horribly inefficient and it would actually be faster to sort the boxes first and then search but, there are some scenarios where you are stuck with quadratic run times.
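As code, the nested boxes become nested loops - the telltale shape of quadratic time. This is just a sketch with made-up names, assuming $n$ outer boxes that each hold $n$ inner boxes:
// n outer boxes, each holding n inner boxes: in the worst case we open all n * n of them.
bool findItem(int** boxes, int n, int item) {
    for (int outer = 0; outer < n; outer++) {
        for (int inner = 0; inner < n; inner++) {
            if (boxes[outer][inner] == item) {
                return true; // found the misplaced item
            }
        }
    }
    return false; // opened every inner box of every outer box - n * n checks
}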

Those are the most common run times I've run into and a bit of an explanation of what they actually mean. I should mention that there are cases where you might end up with an algorithm that has a run time of $n + \log n$. We should look at this the way we'd look at a limit going to infinity: since both terms grow without bound, we don't need to keep both - we keep only the dominant one. In this case that's $n$, since the addition of $\log n$ only slightly affects the growth, and so this would be a big O of $O(n)$. There are many other run times that show up in algorithms and it's important to understand what they mean, especially if you're designing algorithms to scale. This is all I'll cover for now on this topic. Any questions, comments and/or suggestions are appreciated.

Thursday, April 18, 2013

Truth and Trust.

For a moment, I'm going to move away from the usual posts about programming or math.

Recently, I came across the thought of how important trust is. Most people I've encountered tend to place it at the top of their list of people requirements. That is, generally, above all else, people want their friend, co-worker, family member, or significant other to be a "trustworthy person". 

For the most part, I agree with this sentiment. Like many people, I might try to avoid those who are dishonest more often than not; however, I feel like this isn't the most important feature of the person being evaluated. Rather, I think there is a category of trust that is often overlooked - reliability. It may be essential to quickly describe what truth is before we can say someone is truthful and therefore, trustworthy.

The first step is to break truth into categories. By categorizing truth, what we are really doing is defining it. This, while sounding obvious, is where I think most people tend to make a mistake in their assessment. We must avoid mixing the subsets; simply, we cannot make the mistake of assuming that one subset of truth has an intersection with another subset.

We can start by splitting truth up into two categories:
1. Subjective truth - truth that pertains to an individual, group, or species.
2. Objective truth - truth that revolves around existence; a verified or indisputable fact.
From here, I want to split subjective truth into two categories:
1a. Established truth - truth that is based on set events particular to the subject
1b. Dependent truth - truth that is dependent on perceptions or future events particular to the subject

Let's continue with some examples. Using the labels above, I will give an example of each truth using "Person A" and "Person B":
1a - Person A asks Person B what time it is. Person B replies with the correct time. (relevant to location - timezone)
1b - Person A asks for an explanation of Plato's Theory of Forms. Person B explains the theory to Person A (dependent on Person B's understanding)
2 - Person A asks for the formula to calculate force. Person B replies with correct formula (F = ma). (the answer should always be the same)

Continuing with my opening statements - sure, we might be better off avoiding someone who will lie to you about the type 2 truths. But, in reality, you may not find too many people who lie about these truths to begin with. In most cases, you don't necessarily even need to be friends with someone to extract this type of information. When evaluating, I think what people tend to look at are the type 1 truths, specifically type 1a. While both of these truths are important, I would argue that type 1b truths are the most important. These are the truths that cause you to rely on a person, to sincerely consider them trustworthy.

When dealing with type 1b truths, you are depending on an individual to give you information whose validity is entirely in the hands of said individual. The risk here is that the information may be considered truth initially yet fall short on its dependencies over time. In these cases, it's sometimes difficult to fault the person who failed, as the falsehood may not have been intentional. In my experience, many people don't worry about this type of truth when making friends; often, one might simply label someone as unreliable or indecisive and let it go. I find it strange that this is a highly sought-after quality in business, yet among friends it's negligible. I think this idea is definitely worth some heavy contemplation as you analyze your relationships.

From here, I think I could manage two more blog posts on type 1b alone; manipulation of the truth, recognizing it, labeling a person trustworthy, etc. However, this might be a good stopping point to avoid endless tangents. The other types have heavy implications as well and each could manage a post or two also. I think it's worth noting that it's often the case that if you have trouble trusting people in this way (type 1b), it might be because you yourself are not trustworthy in this way. That being said, it would be a mistake to fault yourself because you are disappointed often. Understanding this concept, a rather straightforward one at that, can make a significant difference in your decisions and success so, it might be worth a little reflection. I'll end this post on that note and, as usual, any questions, comments and/or suggestions are appreciated.

Wednesday, April 3, 2013

Exception Handling and Future Posts

I'm not sure how, but I managed to forget that I had started this blog. It's possible that this is because LaTeX has not been working correctly on my preview - it's quite annoying to not be able to double check your work. Anyways, I thought, for the sake of keeping this blog alive, I'd post where I plan to go from here and a bit about a mishap at work.

*NOTE* Actually, as I was typing this, I realized that none of my JavaScript is working in preview. Not sure if this is my fault or blogger is just not working correctly...

For now, those posts on calculus are being put on hold indefinitely. I did review some of the material; however, it's a lot to type up and I'm not sure if that's the direction I want to continue going with this blog. I'll try to post anywhere from one to two times a month and, for now, the topics will be random (not in series like the calculus posts attempted to be).

That being said, I'll be sure to try to be more active from now on.

Moving on to my SQL troubles.

Now, I know better than what I did here but, I guess we all make mistakes...I guess.

At work, I primarily work in Oracle. When I'm not working in Java or some scripting language, it's stored procedures and triggers till I can no longer type. Recently, I had to make a simple update - to catch a certain value as it passed through our system and update some logs and data based on the value. For those who are not familiar, there is an Oracle function called "NVL". Usage is as follows: "nvl([valueToCheck], [valueToReplaceWith])". When this function is called, if 'valueToCheck' is null, it returns 'valueToReplaceWith'; otherwise it returns 'valueToCheck'. For example:
set serveroutput on size unlimited;
declare
  some_string varchar2(64);
  test varchar2(64);
begin
  some_string := null; -- null value for illustrative purposes
  select nvl(some_string, 'better value') into test from dual;
  dbms_output.put_line(test);
end;
/
show err

This little PL/SQL block should output "better value" because 'some_string' is null.

Moving on, I built a new procedure and came up with the flawed reasoning that if the table doesn't have a row for a value, a select query will just return a null value. With this reasoning, I figured, "exceptions will just clutter the code, I'll just use NVL and check the value one time," which led me to code like this:
-- ...other parts of the procedure
-- bad idea follows:
  select nvl(value_i_need, '?') into value_to_check 
  from some_table 
  where condition = 'condition_to_be_met';
  if value_to_check != '?' then
     -- do stuff
  else
     -- do other stuff
  end if;
-- ...more procedure logic...but logic that worked... 
end;
/
show err

Looking back, I realize that there are other flaws with this but, I'm just gonna call it a bad day. I proceeded to push out this code to our working set and, only hours later, everything was broken. It was at this point I realized that my "select into" was raising a NO_DATA_FOUND exception because no rows existed for some of the conditions. Again, looking back, there were plenty of better ways to handle this but, more or less, the moral of this story is: even if you don't think you need it, you always need exception handling. Just doing this alone would have prevented the errors that followed:
-- ...other parts of the procedure
-- begin end for exceptions...
  begin
    select nvl(value_i_need, '?') into value_to_check 
    from some_table 
    where condition = 'condition_to_be_met';
    if value_to_check != '?' then
       -- do stuff
    else
       -- do other stuff
    end if;
  exception when no_data_found then
    value_to_check := '?';
    -- problems solved...
  end;
-- ...more procedure logic... 
end;
/
show err

I thought this was worth writing about, if for nothing else than the fact that it will help me remember not to make that mistake again. Oddly enough, I never forget exceptions in Java, possibly because I'm more comfortable coding in it. Anyways, maybe sometime I'll look into the time complexities of SQL fetch queries depending on joins and things like that; with all the time I spend in Oracle, I've been curious lately.

Any questions, comments and/or suggestions are appreciated.