How to

When solving these problems, learning by doing is much better than learning by reading. I encourage to you read only as far in the solution as you need, then trying to solve the problem. If you get stuck, try reading a little further. And of course, let me know if you find a better solution!

Monday, April 23, 2012

Biggest and There (part II)

Problem:  Create a data structure and functions that can insert into the structure and then output the largest element and whether an element is a member

bool insert (int elem, struct node * dataStruct) {...}

int largest (struct node * dataStruct) {...}

bool isMember(int elem, struct node * dataStruct) {...}

Solution:  At its core, this is a fairly straight-forward problem.  We need to do three common operations in data structure:  insert, determine membership and determine the largest value.  

First, we need to figure out what data structure to use.  Let's go through a few common ones and talk about the trade-offs, efficiency and maintenance required for each data structure.

One common data structure we could use is an array.  This has the advantage of being fairly simple.  We could just add elements to the end of the array.  This would be an O(1) step.  Then, we could lookup any element by iterating through the array O(n) and could determine the max value in O(1) time.  Also, there is virtually no data structure overhead in this case.

Well, this is a possible solution.  If we will do lots of inserts and very few lookups and largest calls -- and we just want something simple; maybe this could work.

But, there are a few possible optimizations.  First, we could keep the list ordered.  This has the advantage of making lookup an O(log n) operation, which is much faster, and making the largest function an O(1) time operation as we simply return the last element.  The downside here is that insert takes much longer.  We may frequently have to shift values in the array down to make room, so insert could be as long as an O(n) function, which seems pretty long to build up the data structure.

So, let's look another structure.  How about a binary tree?  Well, a tree has insertion in O(log n), but would occasionally have to be rebalanced, usually an O(n) operation.  Lookup is O(log n), as is finding the max element (follow the right branch to the bottom).  Also, there is the overhead of the pointers to maintain the tree structure.  So, this could be a good solution if we were doing a good amount of lookups, insertions and finding the max value repeatedly.

Let's think about the ideal characteristics of this data structure.  It would have constant time insert and constant time lookup and constant time max value.

Well, a hashtable could do the first two.  We can create a hashtable that would allow us to insert and lookup in constant time, but getting the max value would normally be O(n).

But, does it have to be?  What if simply modified the data structure to be a hashtable as well as a value representing the max value. During the insertion step, we could do a comparison to always track the max value. This would then make returning the max value an O(1) step (in theory -- we could to this addition to any data structure, but a hashtable is still fastest for lookup and insertion).  There is some hashtable overhead, but this seems manageable given the efficiency of this structure.

So, the ideal data structure is a modified hashtable which tracks its max value during the insertion step.  This makes all of the functions we need to implement constant time functions.

Now to implement.  Let's start with insert:

bool insert (int elem, struct node * dataStruct) {
    /* Check to see if it's the biggest or the first*/
    if (dataStruct->max < elem || dataStruct->Max == UNINITIALIZED) 
        dataStruct->max = elem;

    /* Insert into the hashtable */
    return( hashInsert(dataStruct->hashtable, elem));

int largest (struct node * dataStruct) {
    /* check to make sure there is a value */
    if (dataStruct->max != UNINITIALIZED) 
        return dataStruct->max;
        return ERROR;

bool isMember(int elem, struct node * dataStruct) {
    return (hashMember(dataStruct->hashtable, elem));

This is a pretty clean implementation, and assuming that you have hashtable functions, quite easy to implement.  Now, let's push the solution, as interviewers like.  What are some downsides of this?

Well, first, you need the hashtable functions!  Also, we didn't implement a delete function, and we should at least consider how to implement one.  It's a fairly straight forward hashtable delete, except when we're deleting the max element, which then requires a linear search to determine the max element.  So, if there were lots of deleting of the max element, this may not be a good choice.

/* Please let me know if there are bugs or alternate solutions */