How to

When solving these problems, learning by doing is much better than learning by reading. I encourage to you read only as far in the solution as you need, then trying to solve the problem. If you get stuck, try reading a little further. And of course, let me know if you find a better solution!

Monday, March 12, 2012

An array, a binary search tree and a guy walk into a bar...

Problem:  Write a function that determines if an element is in a binary search tree and an unordered array. 

bool inBoth(int elem, int a[], struct node * root) {

Solution:  This seems really simple at first.  We can write a lookup for the array and a lookup for the binary search tree and call both.  Let's do this to show that we can get a functional answer.  

As algorithms, we can very simply iterate through the array, and follow the binary search tree to get the answer.

/* Implement standard lookup in an unordered array */
bool arrayLookup(int elem, int a[]){

    int i;
    for (i=0; i < a.length; i++) {
        if (elem == a[i]) return (true);
    return (false);

/* Implement standard lookup in a binary search tree */
bool bstLookup(int elem, struct node * root) {
    /* Recursive base case of reaching null being false */
    if (root == null) return (false);

    /* True if our current node has the correct element */
    if (root->value == elem) return (true);

    /* Otherwise, follow the correct branch of the current node*/
    if (root->value < elem) 
        return (bstLookup(elem, root->left));
        return (bstLookup(elem, root->right));

/* Use the two functions to implement inBoth() */

bool inBoth(int elem, int a[], struct node * root) {

    return (arrayLookup(elem, a) && bstLookup(elem, root));

What is the algorithmic run time of this function?  The Array lookup checks every element, so it is O(n) and the bst lookup checks only one branch, which is O(log n).  So, overall, this function is O(n).

Is this good?  

It could be, but are there any very quick optimizations to make?  

Look at the return line in inBoth.

What happens first and and what happens second?  Well, arrayLookup is called, and then bstLookup is called.  Since we know that arrayLookup is faster, one very simple optimiation is to change the order of the functions in that line so we do the bstLookup first.  This could be a huge optimization if we rarely hit in either the bst or the array, so we do not have to frequently iterate through the entire array.

Also, if this function were rarely called and the binary search tree and the array were changing rapidly, it could make sense to implement the function like this.

But, let's now assume that the array does not change very much.  How could the function become more efficient?

One likely solution is to sort the array.  This could be done in O(n log n) and then any lookup could be in O(log n) time.  So, by taking the overhead of one sort operation, we can then make lookup an O(log n) instead of O(n) function.

But, this comes with tradeoffs of: 1) sorting the array, and 2) needing to maintain the sorted array.  

Another option is to use a different data structure.  Lookup can be very fast with a hash table.  So, one option is to create a hashtable, O(n) operation, which then looks up as O(1).  This has the overhead of the memory and the maintenance of the hashtable, but this can reduce the worst case run-time of the function from O(n) to O(log n) -- with the constraint being the binary search tree.

It is also an option now to but the bst in a hash table, but this could be done easier -- since we already have a hash table.  We can go through the bst and indicate on each element whether it already exists (and return true) or whether it does not already exist (and false).

This is also an O(n) operation.  Once we create this lookup+boolean hash table, we can then in constant time do a lookup, and it only takes O(n) to build this structure, though it does take memory and overhead to maintain.  Depending on likely uses, this could be a good choice.

This is an example of where in an interview it is important to first implement a clean solution, then look at easy implementations (like changing the order of the function calls) and then talk through even more optimizations and cases where they would make sense/not make sense.

/* Please let me know if you find any bugs or alternate solutions.  Thank you, Noah */

No comments:

Post a Comment