Let's come back to the example from before:

In :
def find_i(L, e):
    for i in range(len(L)):
        if L[i] == e:
            return i

    return None


Ideally, we'd want to know how fast this function runs for any list. But that requires knowing exactly what computer it runs on.

Let's instead count the number of operations this function needs to perform in order to run. If every operation takes, for example, 0.01ms, then we can compute the total time by multiplying the number of operations needed by 0.01.

Let's try to count the operations that each line performs:

In :
def find_i(L, e):
    for i in range(len(L)):  # 1 operation: load a new number into i
        if L[i] == e:        # 2 operations: access the i-th element of L, and compare it to e
            return i         # 1 operation

    return None              # 1 operation


Those counts are necessarily approximate. For example, it takes more than one operation behind the scenes to load a new number into i and also decide whether that was the last number. But that doesn't matter much, as we'll soon see.
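We can make this accounting concrete with an instrumented variant of the function (a sketch of mine, not part of the original notes) that tallies operations using the same per-line counts as above:

```python
def find_i_counted(L, e):
    """Like find_i, but also tallies operations per the accounting above."""
    ops = 0
    for i in range(len(L)):
        ops += 1          # load a new number into i
        ops += 2          # access L[i] and compare it to e
        if L[i] == e:
            ops += 1      # return i
            return i, ops
    ops += 1              # return None
    return None, ops
```

For example, when e is not in L at all, this reports 3\*len(L) + 1 operations.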

So what's the number of operations required? We have two cases:

#### Case 1: L[k] == e for some k

In this case, the loop runs k times. We repeat the following two lines k times:

    for i in range(len(L)):  # 1 operation: load a new number into i
        if L[i] == e:        # 2 operations: access the i-th element of L, and compare it to e

That takes $3k$ operations. Then we also return (once), taking, in our accounting, $1$ more operation.

In total, we would perform $3k+1$ operations.

#### Case 2: e is not in L

In this case, the loop runs len(L) times, and the function returns once. Setting n = len(L), we would perform $3n+1$ operations.

### Worst-case runtime complexity for input of size n

A lot of the time, we want a more concise answer. We'd like to know, for input of size n, the longest amount of time the function could possibly take. (We have to specify the size of the input -- otherwise the function could take arbitrarily long amounts of time.)

In the worst case, e is not in L, so the function will run for $3n+1$ operations.

### Intro to Big-Oh analysis

What we're really interested in is how much time the function will take. Let's say that on our computer, each elementary operation takes 0.01 ms. (Actually, it's on the order of $5\times 10^{-5}$ ms on ECF -- we'll see that tomorrow.)

That means that, in the worst case (i.e., when e is not in L), the function should take $0.01\times (3n+1)$ ms to run, where n == len(L).
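Plugging in a few values of n makes the estimate concrete (a quick sketch, using the made-up per-operation figure from the text):

```python
# Estimated worst-case runtime, assuming 0.01 ms per operation
# (the made-up figure from the text).
per_op_ms = 0.01

for n in [100, 10000, 1000000]:
    print(n, per_op_ms * (3 * n + 1), "ms")
```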

But the $0.01$ coefficient is just made up! What we really want to know is what the runtime of the algorithm will be proportional to. Essentially, we want to say that the runtime, in this case, will be roughly proportional to $n$.

We write this as: $0.01\times (3n+1)$ is $\mathcal{O}(n)$.

Informally, a function $f(n)$ is $\mathcal{O}(g(n))$ if $f(n)$ grows no faster than $g(n)$, for large n. In other words, the ratio $f(n)/g(n)$ doesn't tend to infinity.

For example, if $f(n) = 3n+1$ and $g(n) = n$, then $f(n)$ is $\mathcal{O}(g(n))$. We also write it as:

$3n+1$ is $\mathcal{O}(n)$.

It's also the case, for example, that:

$10000n+20$ is $\mathcal{O}(n)$.

$10000n^2+20000000000000n-50$ is $\mathcal{O}(n^2)$.

$10000n^2+20000000000000n-50$ is $\mathcal{O}(n^3)$. (Because $n^2$ grows slower than $n^3$.)

$10000n^2+20000000000000n-50$ is $\mathcal{O}(n^2 \log n)$.

But:

$10000n^2+20000000000000n-50$ is not $\mathcal{O}(n^2/\log n)$.

$10000n^2+20000000000000n-50$ is not $\mathcal{O}(\sqrt{n})$.
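A quick numerical sanity check (my own sketch, not part of the notes): if $f(n)$ is $\mathcal{O}(g(n))$, the ratio $f(n)/g(n)$ stays bounded as n grows; if not, the ratio keeps growing.

```python
import math

def f(n):
    return 3 * n + 1

# f(n)/n settles near 3 (so 3n+1 is O(n)),
# while f(n)/sqrt(n) keeps growing (so 3n+1 is not O(sqrt(n))).
for n in [10**2, 10**4, 10**6]:
    print(n, f(n) / n, f(n) / math.sqrt(n))
```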

### The technical definition of Big-Oh (optional)

As we said, basically, $f(n)$ is $\mathcal{O}(g(n))$ if the ratio $f(n)/g(n)$ doesn't tend to infinity. We can write this technically as follows:


$\limsup_{n\to\infty} f(n)/g(n) < \infty$

You can think of $\limsup$ almost as a $\lim$. The reason $\limsup$ is used is that it's defined in more situations than $\lim$ is. Read more e.g. here: https://en.wikipedia.org/wiki/Limit_superior_and_limit_inferior

We don't really care about the technical definition here -- the important thing to understand is that when we say $f(n)$ is $\mathcal{O}(g(n))$, we mean that for large enough $n$, $f(n)$ is either almost proportional to $g(n)$, or $f(n)$ is much smaller than $g(n)$.

### Counting iterations

In most cases, we can identify a loop such that the number of iterations of that loop is proportional to the total number of operations. This is the case here. In the worst case, the loop in find_i() runs $n$ times, and we can say that the worst-case runtime complexity of find_i is $\mathcal{O}(n)$, where n = len(L).
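Counting iterations directly is often easier than counting operations. Here is a small sketch of mine that tracks only how many times the loop body in find_i runs:

```python
def find_i_iterations(L, e):
    """Count how many times the loop body of find_i runs."""
    iterations = 0
    for i in range(len(L)):
        iterations += 1
        if L[i] == e:
            break  # found e; the loop stops here
    return iterations

# In the worst case (e not in L), the loop runs len(L) times.
```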

### Tight upper bound on the worst-case runtime complexity

It is also technically true to say that the runtime complexity of find_i() is $\mathcal{O}(n^2)$. However, that is not a tight upper bound on the worst-case runtime complexity. By a tight bound, we mean the slowest-growing function $g$ such that the runtime is $\mathcal{O}(g)$.
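We can see numerically why $\mathcal{O}(n)$ is the tight bound for $3n+1$ while $\mathcal{O}(n^2)$ is not (a sketch of mine): the ratio $(3n+1)/n$ approaches a nonzero constant, while $(3n+1)/n^2$ shrinks toward 0, meaning $n^2$ vastly overestimates the growth.

```python
# (3n+1)/n approaches 3 -- a tight bound.
# (3n+1)/n^2 approaches 0 -- a valid but loose bound.
for n in [10, 1000, 100000]:
    print(n, (3 * n + 1) / n, (3 * n + 1) / n**2)
```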