sum_list2 again

Here is sum_list2 again:

In [1]:
def sum_list2(L):
    '''Return the sum of the list of ints L'''
    if len(L) == 0:
        return 0
    
    if len(L) == 1:
        return L[0]  # the sum of the list is L[0] if L[0] is the only element
    
    mid = len(L)//2  # the index of the approximate midpoint of the list
    return sum_list2(L[:mid]) + sum_list2(L[mid:])
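
For example, we might sanity-check it on a small list (the list here is just an arbitrary example):

In [ ]:
sum_list2([1, 2, 3, 4, 5, 6, 7, 8])  # 36, the same as sum([1, 2, 3, 4, 5, 6, 7, 8])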

In order to figure out the complexity of sum_list2, we will draw a call tree. We'll use the notation [k] to denote a list of length k. For convenience, we'll assume that len(L) is a power of 2. That's a common assumption, and it's usually harmless (in the sense that it doesn't change the complexity estimate) -- we will not pursue the detailed analysis that proves this here: it's slightly tedious but not very difficult.

           [1].......[1]...[1]....................[1]              
            .                                   .
             .                                 .
              .                               .
               .                             .
                 .                          . 
                 [n/4]    [n/4]  [n/4]  [n/4]
                   \      /       \     /
                    \    /         \   /
                     [n/2]        [n/2]
                       \          /
                        \        /
                         \      /
                          \    /
                            [n]


Each call produces two more calls (for example, for a list of length n/4, we need to compute the sums of two lists, both of length n/8). Eventually, we reach calls to sum_list2() for lists of length 1, at which point we return without recursing further.

IF WE ASSUME THAT SLICING COSTS NOTHING (which is not true), each call takes the same amount of time, so we just need to count the calls. We'll do that by counting the number of calls on each level:

                                                                      Number of calls
           [1].......[1]...[1]....................[1]                      2^k
            .                                   .
             .                                 .
              .                               .
               .                             .
                 .                          . 
                 [n/4]    [n/4]  [n/4]  [n/4]                              2^2 
                   \      /       \     /
                    \    /         \   /
                     [n/2]        [n/2]                                    2^1
                       \          /
                        \        /
                         \      /
                          \    /
                            [n]                                            2^0


Each level has twice as many calls as the previous level. We start with $1=2^0$ call (the single call on the whole list), and end with $2^k$ calls (the calls on lists of length 1), where $k$ is the number of levels minus 1 (since the level count starts at $2^0$).

The sum of the geometric progression is: $$\sum_{i=0}^{k}\alpha^i = \frac{1-\alpha^{k+1}}{1-\alpha}$$

In total, therefore, we'll have $$\sum_{i=0}^k 2^i = \frac{1-2^{k+1}}{1-2} = 2^{k+1}-1$$

calls.
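
We can sanity-check this formula numerically (a quick sketch; the range of k values checked is arbitrary):

In [ ]:
# check that 2^0 + 2^1 + ... + 2^k == 2^(k+1) - 1 for small k
all(sum(2**i for i in range(k + 1)) == 2**(k + 1) - 1 for k in range(20))  # True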

What's k? It's the number of times we need to divide n by 2 in order to get to 1. In other words, by definition, it is $\log_2 n$.

(Another way to arrive at the same conclusion is to observe that we'll make n calls to sum_list2() with lists of size 1 -- one per element of the original list L. Since we said that the last level contains $2^k$ calls, it follows that $n = 2^k$, so that, again, $k = \log_2 n$, which we get by taking the $\log_2$ of both sides.)

The total number of calls is therefore $$2^{k+1}-1 = 2^k \times 2^1 -1 = 2\times 2^k - 1 = 2\times 2^{\log_2 n} - 1 = 2n-1$$

(Note that $2^{\log_2 n} = n$, since $\log_2 n$ is by definition the power to which $2$ is raised to get $n$.)
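
We can verify the $2n-1$ count empirically by instrumenting sum_list2 with a call counter (a sketch for illustration; the counter and the renamed function are ours, not part of the original code):

In [ ]:
num_calls = 0

def sum_list2_counted(L):
    '''Like sum_list2, but counts how many times it is called'''
    global num_calls
    num_calls += 1
    if len(L) == 0:
        return 0
    if len(L) == 1:
        return L[0]
    mid = len(L)//2
    return sum_list2_counted(L[:mid]) + sum_list2_counted(L[mid:])

sum_list2_counted(list(range(16)))
num_calls  # 31, i.e. 2*16 - 1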

So the complexity of sum_list2 is $\mathcal{O}(n)$, assuming slicing is costless.

Slicing isn't costless

What's the true story? Slicing actually creates a new list, which costs time proportional to the length of the new list.

The analysis of MergeSort shows that, with this slicing cost, the runtime here is actually $\mathcal{O}(n\log n)$.
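
Briefly, here is where the $\mathcal{O}(n\log n)$ comes from (a sketch, under the same power-of-2 assumption as before). If $T(n)$ is the running time of sum_list2 on a list of length $n$, the two slices L[:mid] and L[mid:] together cost time proportional to $n$, so $$T(n) = 2T(n/2) + cn, \qquad T(1) = c$$ Unrolling the recurrence, each of the roughly $\log_2 n$ levels of the call tree does about $cn$ work in total, which gives $\mathcal{O}(n\log n)$ overall.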

In this case, assuming that slicing is costless is justifiable, because we can avoid slicing altogether:

In [2]:
def sum_list2_fast(L, start, end):
    '''Return the sum of the ints in L[start:end]'''
    if end == start:      # empty range
        return 0
    if end == start + 1:  # one-element range
        return L[start]
    
    mid = (start+end)//2  # the approximate midpoint of the range
    
    return sum_list2_fast(L, start, mid) + sum_list2_fast(L, mid, end)
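
To sum all of L, we call sum_list2_fast with the full range of indices (the list below is just an example):

In [ ]:
L = [1, 2, 3, 4, 5, 6, 7, 8]
sum_list2_fast(L, 0, len(L))  # 36, computed without creating any new lists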

Here, the complexity really is $\mathcal{O}(n)$.