Let's come back to the example from before:

In :
def find_i(L, e):
    for i in range(len(L)):
        if L[i] == e:
            return i

    return None


Ideally, we'd want to know how fast this function runs for any list. But that requires knowing exactly what computer it runs on.

Let's instead count the number of operations this function needs to perform in order to run. If every operation takes, for example, 0.01ms, then we can compute the total time by multiplying the number of operations needed by 0.01.

Let's try to count the operations that each line performs:

In :
def find_i(L, e):
    for i in range(len(L)):  # 1 operation: load a new number into i
        if L[i] == e:        # 2 operations: access the i-th element of L, and compare it to e
            return i         # 1 operation

    return None              # 1 operation


Those counts are necessarily approximate. For example, it takes more than one operation behind the scenes to load a new number into i and also decide whether that was the last number. But that doesn't matter much, as we'll soon see.
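We can make this accounting concrete with an instrumented variant of the function (a sketch of mine, not part of the original notes) that tallies operations using the same per-line counts as above:

```python
def find_i_counted(L, e):
    """Like find_i, but also tallies operations per the accounting above."""
    ops = 0
    for i in range(len(L)):
        ops += 1          # load a new number into i
        ops += 2          # access L[i] and compare it to e
        if L[i] == e:
            ops += 1      # return i
            return i, ops
    ops += 1              # return None
    return None, ops
```

For example, when e is not in L at all, this reports 3\*len(L) + 1 operations.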

So what's the number of operations required? We have two cases:

#### Case 1: L[k] == e for some k

In this case, the loop runs k times. We repeat the following two lines k times:

    for i in range(len(L)):  # 1 operation: load a new number into i
        if L[i] == e:        # 2 operations: access the i-th element of L, and compare it to e

That takes $3k$ operations. Then we also return (once), taking, in our accounting, $1$ more operation.

In total, we would perform $3k+1$ operations.

#### Case 2: e is not in L

In this case, the loop runs len(L) times, and the function returns once. Setting n = len(L), we would perform $3n+1$ operations.

### Worst-case runtime complexity for input of size n

A lot of the time, we want a more concise answer. We'd like to know, for input of size n, the longest amount of time the function could possibly take. (We have to specify the size of the input -- otherwise the function could take arbitrarily long amounts of time.)

In the worst case, e is not in L, so the function will run for $3n+1$ operations.

### Intro to Big-Oh analysis

What we're really interested in is how much time the function will take. Let's say that on our computer, each elementary operation takes 0.01 ms. (Actually, it's on the order of $5\times 10^{-5}$ ms on ECF -- we'll see that tomorrow.)

That means that, in the worst case (i.e., when e is not in L), the function should take $0.01\times (3n+1)$ ms to run, where n == len(L).
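Plugging in a few values of n makes the estimate concrete (a quick sketch, using the made-up per-operation figure from the text):

```python
# Estimated worst-case runtime, assuming 0.01 ms per operation
# (the made-up figure from the text).
per_op_ms = 0.01

for n in [100, 10000, 1000000]:
    print(n, per_op_ms * (3 * n + 1), "ms")
```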

But the $0.01$ coefficient is just made up! What we really want to know is what the runtime of the algorithm will be proportional to. Essentially, we want to say that the runtime, in this case, will be roughly proportional to $n$.

We write this as: $0.01\times (3n+1)$ is $\mathcal{O}(n)$.

Informally, a function $f(n)$ is $\mathcal{O}(g(n))$ if $f(n)$ grows no faster than $g(n)$, for large n. In other words, the ratio $f(n)/g(n)$ doesn't tend to infinity.

For example, if $f(n) = 3n+1$ and $g(n) = n$, then $f(n)$ is $\mathcal{O}(g(n))$. We also write it as:

$3n+1$ is $\mathcal{O}(n)$.

It's also the case, for example, that:

$10000n+20$ is $\mathcal{O}(n)$.

$10000n^2+20000000000000n-50$ is $\mathcal{O}(n^2)$.

$10000n^2+20000000000000n-50$ is $\mathcal{O}(n^3)$. (Because $n^2$ grows slower than $n^3$.)

$10000n^2+20000000000000n-50$ is $\mathcal{O}(n^2 \log n)$.

But:

$10000n^2+20000000000000n-50$ is not $\mathcal{O}(n^2/\log n)$.

$10000n^2+20000000000000n-50$ is not $\mathcal{O}(\sqrt{n})$.
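A quick numerical sanity check (my own sketch, not part of the notes): if $f(n)$ is $\mathcal{O}(g(n))$, the ratio $f(n)/g(n)$ stays bounded as n grows; if not, the ratio keeps growing.

```python
import math

def f(n):
    return 3 * n + 1

# f(n)/n settles near 3 (so 3n+1 is O(n)),
# while f(n)/sqrt(n) keeps growing (so 3n+1 is not O(sqrt(n))).
for n in [10**2, 10**4, 10**6]:
    print(n, f(n) / n, f(n) / math.sqrt(n))
```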

### The technical definition of Big-Oh (optional)

As we said, basically, $f(n)$ is $\mathcal{O}(g(n))$ if the ratio $f(n)/g(n)$ doesn't tend to infinity. We can write this technically as follows:


$\limsup_{n\to\infty} f(n)/g(n) < \infty$

You can think of $\limsup$ almost as a $\lim$. The reason $\limsup$ is used is that it's defined in more situations than $\lim$ is. Read more e.g. here: https://en.wikipedia.org/wiki/Limit_superior_and_limit_inferior

We don't really care about the technical definition here -- the important thing to understand is that when we say $f(n)$ is $\mathcal{O}(g(n))$, we mean that for large enough $n$, $f(n)$ is either almost proportional to $g(n)$, or $f(n)$ is much smaller than $g(n)$.

### Counting iterations

In most cases, we can identify a loop such that the number of iterations of that loop is proportional to the total number of operations. This is the case here. In the worst case, the loop in find_i() runs $n$ times, and we can say that the worst-case runtime complexity of find_i is $\mathcal{O}(n)$, where n = len(L).
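Counting iterations directly is often easier than counting operations. Here is a small sketch of mine that tracks only how many times the loop body in find_i runs:

```python
def find_i_iterations(L, e):
    """Count how many times the loop body of find_i runs."""
    iterations = 0
    for i in range(len(L)):
        iterations += 1
        if L[i] == e:
            break  # found e; the loop stops here
    return iterations

# In the worst case (e not in L), the loop runs len(L) times.
```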

### Tight upper bound on the worst-case runtime complexity

It is also technically true to say that the runtime complexity of find_i() is $\mathcal{O}(n^2)$. However, that is not a tight upper bound on the worst-case runtime complexity. By a tight bound, we mean the slowest-growing function $g$ such that the runtime is $\mathcal{O}(g)$.
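We can see numerically why $\mathcal{O}(n)$ is the tight bound for $3n+1$ while $\mathcal{O}(n^2)$ is not (a sketch of mine): the ratio $(3n+1)/n$ approaches a nonzero constant, while $(3n+1)/n^2$ shrinks toward 0, meaning $n^2$ vastly overestimates the growth.

```python
# (3n+1)/n approaches 3 -- a tight bound.
# (3n+1)/n^2 approaches 0 -- a valid but loose bound.
for n in [10, 1000, 100000]:
    print(n, (3 * n + 1) / n, (3 * n + 1) / n**2)
```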