Here is BucketSort again:

def bucket_sort(L):
    max_int = max(L)
    
    counts = [0]*(1+max_int)
    #counts[i] will be the number of times that
    #i appears in L
    
    for e in L:
        counts[e] += 1
        
    
    #Now, build up the sorted version of the list. Note that if we have
    #counts[i] of the number i, we need to extend the sorted list
    #by [i]*counts[i], which is just [i, i, i, ..., i]
    #                                  counts[i] times
    sorted_L = []
    for i in range(0, len(counts)):
        sorted_L.extend([i]*counts[i])
    
    #Modify the contents of L to be the contents of sorted_L
    L[:] = sorted_L
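
As a quick illustration, here is a minimal usage sketch. The function is repeated in condensed form so that the cell runs on its own, and the list `scores` is made up for the example. Note that the function as written assumes a non-empty list of non-negative integers, and that it sorts `L` in place rather than returning a new list.

```python
def bucket_sort(L):
    """Same algorithm as above, condensed so this cell is self-contained."""
    counts = [0] * (1 + max(L))        # counts[i] = number of times i appears in L
    for e in L:
        counts[e] += 1
    sorted_L = []
    for i in range(len(counts)):
        sorted_L.extend([i] * counts[i])
    L[:] = sorted_L                    # overwrite the contents of L in place


scores = [3, 0, 2, 3, 1, 0]            # hypothetical example data
result = bucket_sort(scores)
print(scores)   # [0, 0, 1, 2, 3, 3]
print(result)   # None, since the list is modified in place
```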

Let's analyze how much time each line takes:

def bucket_sort(L):
    
    #max(L) needs to go through every element of L. This requires a*len(L) time for some constant a
    max_int = max(L)
    
    #Creating a list of length max_int+1 and storing it in memory takes time (approximately) proportional to max_int.
    #This line takes b*max(L) time
    counts = [0]*(1+max_int)
    
    
    #We are going through the list L, repeating a block that takes the same amount of time at every iteration.
    #This takes c*len(L) time.
    for e in L:
        counts[e] += 1
        
    #This is tricky: different iterations take different amounts of time here. We'll use a trick: we'll
    #count the total number of elements that are appended to sorted_L overall.
    #
    #In total, we are appending len(L) elements to sorted_L. That takes d*len(L) time.
    #
    #The for-loop itself performs len(counts) iterations, which takes approximately e*max(L) time
    #(since len(counts) = max(L)+1, which is close to max(L)).
    sorted_L = []
    for i in range(0, len(counts)):
        sorted_L.extend([i]*counts[i])
    
    #This line takes f*len(L) time
    L[:] = sorted_L

In total, summing up all the contributions from the different parts of the function, bucket_sort(L) runs in

$(a+c+d+f)len(L) + (b+e)max(L)$ time.

Now, since all the constants are positive, $(a+c+d+f)len(L) + (b+e)max(L) \leq (a+b+c+d+e+f)len(L) + (a+b+c+d+e+f)max(L) = (a+b+c+d+e+f)(len(L)+max(L))$.

This implies that bucket_sort() runs in $\mathcal{O}(len(L)+max(L))$ time.
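
As a rough sanity check on this count, the sketch below tallies the work done by each part of the algorithm. It is only an illustration: the helper name `bucket_sort_steps` is hypothetical, and it charges exactly one unit per elementary step, which amounts to assuming that all six constants equal 1. Real constants will differ, but the shape of the total stays the same.

```python
def bucket_sort_steps(L):
    """Return (sorted copy of L, step count), charging 1 unit per elementary step."""
    n = len(L)
    m = max(L)

    steps = n                      # max(L) looks at every element of L
    counts = [0] * (1 + m)
    steps += m + 1                 # creating the list of counts

    for e in L:
        counts[e] += 1
        steps += 1                 # one unit per element of L

    sorted_L = []
    for i in range(len(counts)):
        sorted_L.extend([i] * counts[i])
        steps += 1 + counts[i]     # one unit for the iteration, plus one per appended element

    steps += n                     # copying the result back (L[:] = sorted_L in the original)
    return sorted_L, steps


L = [5, 1, 3]
result, steps = bucket_sort_steps(L)
print(result)                                  # [1, 3, 5]
print(steps, 4 * len(L) + 2 * (max(L) + 1))    # 24 24
```

Under these assumptions the total comes out to 4*len(L) + 2*(max(L)+1), which has the same form as the expression derived above.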

When is Bucket Sort appropriate?

This algorithm relies crucially on being able to set up one "bucket" per possible value, so that every element of the list can be placed in its bucket: here, counts[i] is the bucket for the integer i. This works much worse with, for example, strings. We could try to sort a list of names this way, but that would require setting up a huge number of buckets, one for every possible name in the world.

For sorting strings, we can use something similar to what we have above, but with buckets for the first initials rather than for the entire strings. That means we would have to process the contents of each bucket further than we did here. (We didn't go further than that in lecture.)
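
Since we didn't cover the details, the following is only one possible sketch of the first-initial idea, not the method from lecture. The function name `sort_by_initial` is hypothetical, the names are assumed to be non-empty and to start with an ASCII letter, and Python's built-in `sorted()` stands in for the further processing of each bucket.

```python
def sort_by_initial(names):
    """Sketch: distribute names into 26 buckets by first letter, then sort each bucket."""
    buckets = [[] for _ in range(26)]          # one bucket per letter A..Z

    for name in names:
        idx = ord(name[0].upper()) - ord("A")  # 0 for A, 1 for B, ..., 25 for Z
        buckets[idx].append(name)

    result = []
    for bucket in buckets:
        # Stand-in for the "further processing": sort within the bucket
        result.extend(sorted(bucket))
    return result


names = ["Bob", "Alice", "Carol", "Al", "Bea"]
print(sort_by_initial(names))   # ['Al', 'Alice', 'Bea', 'Bob', 'Carol']
```

Only 26 buckets are needed here, no matter how many different names appear in the list.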

Complexity of Bucket Sort

There are several scenarios under which $max(L)$ is bounded by a fixed constant:

The elements of L are known to be in some fixed range, for example 0..100.
The elements of L are of a type that can only store numbers up to a certain magnitude. (Python floats are an example of such a type, although we couldn't sort floats using this algorithm, since floats cannot be used as list indices.)

If, for example, max(L) = 100, then $\mathcal{O}(max(L)+len(L)) = \mathcal{O}(100+len(L)) = \mathcal{O}(len(L))$, so the running time is linear in the length of the list.