Insertion Sort can be viewed as a small modification of Merge Sort. Instead of breaking the list into two halves, we'll break it into its first element and the remaining n-1 elements:

def insertion_sort(L):
    '''Return a sorted version of L
    
    '''
    #base case
    if len(L) <= 1:  
        return L[:] #return a copy of L
                      #need to return a copy in case the user wants to modify
                      #the original L and also modify the copy; so they 
                      #need to be kept separate
    
    #Split off the first element, sort the rest, and then merge the two parts!
    mid = 1
    return merge(insertion_sort(L[:mid]), insertion_sort(L[mid:]))

Let's look at an example of a run of insertion_sort:

               [3]   -> [3]
                |
            [20, 3]  ->merge([20], [3]) -> [3, 20]
              |
         [4, 20, 3]  ->merge([4], [3, 20]) -> [3, 4, 20]
             |
      [2, 4, 20, 3]  ->merge([2], [3, 4, 20]) -> [2, 3, 4, 20]
           |
  [10, 2, 4, 20, 3]  ->merge([10], [2, 3, 4, 20]) 
                           ->[2, 3, 4, 10, 20]

The reason this sort is called insertion_sort is that the merge inserts L[0] into the sorted version of L[1:], so that the resulting list is sorted.
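The definition above relies on a merge() helper that is not shown in this section. Here is a minimal, self-contained sketch that can be run as is. It assumes that merge(a, b) takes two sorted lists and returns a new sorted list; the helper below is an illustrative stand-in and may differ from the version used for Merge Sort.

```python
def merge(a, b):
    '''Return a new sorted list with the elements of the sorted lists a and b
    (assumed helper: a stand-in for the merge() used above)
    '''
    result = []
    i, j = 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            result.append(a[i])
            i += 1
        else:
            result.append(b[j])
            j += 1
    #one of the lists is exhausted; append what remains of the other
    result.extend(a[i:])
    result.extend(b[j:])
    return result


def insertion_sort(L):
    '''Return a sorted copy of L'''
    if len(L) <= 1:
        return L[:]
    #L[:1] holds a single element, so merge inserts L[0] into the sorted rest
    return merge(insertion_sort(L[:1]), insertion_sort(L[1:]))


original = [10, 2, 4, 20, 3]
result = insertion_sort(original)
print(result)    #[2, 3, 4, 10, 20]
print(original)  #[10, 2, 4, 20, 3], the input is left unchanged
```

Because the left argument of merge() always has exactly one element, each merge amounts to walking through the sorted rest until the right position for L[0] is found.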

Complexity of Insertion Sort

Let's look at the call tree for Insertion Sort:

                              no. of ops merge takes
          [(1)]              0 (base case, no merge)
            |
          [(2)]              2k
            |            
          .....
          .....
            |
         [(n-2)]            k(n-2)
            | 
        [ (n-1) ]           k(n-1)
            |
       [   (n)    ]          kn

(We are using the usual notation of [ (m) ] as a list of length m.)

As in Merge Sort, not every call takes the same number of operations: merge() takes $\mathcal{O}(\text{len(L)})$ operations, as before. Again, let's assume that merge() takes $k\cdot\text{len(L)}$ operations.

In total, we'll have $kn+k(n-1)+k(n-2)+\dots+2k = k(n+(n-1)+\dots+2) = k\left(\left(\sum_{i=1}^n i\right)-1\right) = k\left(\frac{n(n+1)}{2}-1\right)$, which is $\mathcal{O}(n^2)$. The call tree now has about $n$ levels instead of about $\log(n)$, so the small change we made caused the sorting algorithm to be $\mathcal{O}(n^2)$ rather than $\mathcal{O}(n\log(n))$.
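As a quick sanity check on the sum, the sketch below adds up the merge costs directly and compares the total with the closed form $k(n(n+1)/2-1)$. It assumes, as above, that a merge producing a list of length m costs k*m operations and that the base case costs nothing. The function names and the choice k=1 are illustrative only.

```python
def total_merge_cost(n, k=1):
    '''Sum the assumed merge cost k*m over the list lengths m = 2, ..., n'''
    return sum(k * m for m in range(2, n + 1))


def closed_form(n, k=1):
    '''Return k*(n(n+1)/2 - 1), the closed form of the same sum'''
    return k * (n * (n + 1) // 2 - 1)


for n in [1, 2, 10, 100, 1000]:
    #the direct sum and the closed form should agree for every n
    assert total_merge_cost(n) == closed_form(n)
    print(n, total_merge_cost(n))
```

Under these assumptions, multiplying n by 10 multiplies the count by roughly 100, which is the behaviour expected of quadratic growth.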