Big O Notation
With big O notation we express the runtime in terms of—brace yourself—how quickly it
grows relative to the input, as the input gets arbitrarily large.
1. how quickly the runtime grows—Some external factors affect the time it takes
for a function to run: the speed of the processor, what else the computer is
running, etc. So it's hard to make strong statements about the exact runtime of an
algorithm. Instead we use big O notation to express how quickly its runtime
grows.
2. relative to the input—Since we're not looking at an exact number, we need to
phrase our runtime growth in terms of something. We use the size of the input. So
we can say things like the runtime grows "on the order of the size of the input"
(O(n)) or "on the order of the square of the size of the input" (O(n^2)).
3. as the input gets arbitrarily large—Our algorithm may have steps that seem
expensive when n is small but are eclipsed eventually by other steps as n gets
huge. For big O analysis, we care most about the stuff that grows fastest as the
input grows, because everything else is quickly eclipsed as n gets very large. If
you know what an asymptote is, you might see why "big O analysis" is sometimes
called "asymptotic analysis."
Big O notation is like math except it's an awesome, not-boring kind of math where you
get to wave your hands through the details and just focus on what's basically happening.
If this seems abstract so far, that's because it is. Let's look at some examples.
Some examples
def print_first_item(list_of_items):
    print(list_of_items[0])
This function runs in O(1) time (or "constant time") relative to its input. The
input list could be 1 item or 1,000 items, but this function would still just require one
"step."
def print_all_items(list_of_items):
    for item in list_of_items:
        print(item)
This function runs in O(n) time (or "linear time"), where n is the number of
items in the list. If the list has 10 items, we have to print 10 times. If it has 1,000 items,
we have to print 1,000 times.
def print_all_possible_ordered_pairs(list_of_items):
    for first_item in list_of_items:
        for second_item in list_of_items:
            print(first_item, second_item)
Here we're nesting two loops. If our list has n items, our outer loop runs n times and
our inner loop runs n times for each iteration of the outer loop, giving us n^2 total
prints. Thus this function runs in O(n^2) time (or "quadratic time"). If the list has
10 items, we have to print 100 times. If it has 1,000 items, we have to print 1,000,000
times.
Both of these functions have O(n) runtime, even though one takes an actual number as its input and the other takes a list:
def say_hi_n_times(n):
    for time in range(n):
        print("hi")

def print_all_items_in_list(the_list):
    for item in the_list:
        print(item)
When you're calculating the big O complexity of something, you just throw out the constants. Take this function, which runs through the list twice:
def print_all_items_twice(the_list):
    for item in the_list:
        print(item)

    # once more
    for item in the_list:
        print(item)
This is O(2n), which we just call O(n).
The same goes for this function, which prints the first item, then the first half of the list, then says "hi" 100 times:
def print_first_item_then_first_half_then_say_hi_100_times(the_list):
    print(the_list[0])

    middle_index = len(the_list) // 2
    index = 0
    while index < middle_index:
        print(the_list[index])
        index += 1

    for time in range(100):
        print("hi")
This is O(1 + n/2 + 100), which we just call O(n).
Why can we get away with this? Remember, for big O notation we're looking at what
happens as n gets arbitrarily large. As n gets really big, adding 100 or dividing by 2
has a decreasingly significant effect.
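For a rough, back-of-the-envelope illustration (this snippet isn't from the original write-up), we can count the "steps" the function above takes as n grows and watch the constants stop mattering:
# Count steps for print_first_item_then_first_half_then_say_hi_100_times:
# 1 (first item) + n/2 (first half) + 100 ("hi"s). The ratio to n settles
# near a constant (about 0.5), so the growth is still linear.
for n in [100, 10000, 1000000]:
    steps = 1 + n // 2 + 100
    print(n, steps, steps / n)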
The same reasoning lets us drop less significant terms:
def print_all_numbers_then_all_pair_sums(list_of_numbers):
    # print each number on its own
    for number in list_of_numbers:
        print(number)

    # then print the sum of every ordered pair of numbers
    for first_number in list_of_numbers:
        for second_number in list_of_numbers:
            print(first_number + second_number)
Here our runtime is O(n + n^2), which we just call O(n^2). Even if it
were O(n^2/2 + 100n), it would still be O(n^2).
Similarly, O(n^3 + 50n^2 + 10000) is just O(n^3), and O((n + 30)(n + 5)) is just O(n^2).
Again, we can get away with this because the less significant terms quickly become, well,
less significant as n gets big. For example, when n is 1,000, the n^2 term is 1,000,000 while the n term is only 1,000.
Sometimes the worst case runtime is significantly worse than the best case runtime:
def contains(haystack, needle):
    # does the haystack contain the needle?
    for item in haystack:
        if item == needle:
            return True
    return False
Here we might have 100 items in our haystack, but the first item might be the needle, in
which case we would return in just 1 iteration of our loop.
In general we'd say this is O(n) runtime and the "worst case" part would be implied.
But to be more specific we could say this is worst case O(n) and best
case O(1) runtime. For some algorithms we can also make rigorous statements about
the "average case" runtime.
Space complexity: the final frontier
Sometimes we want to optimize for using less memory instead of (or in addition to)
using less time. Talking about memory cost (or "space complexity") is very similar to
talking about time cost. We simply look at the total size (relative to the size of the input)
of any new variables we're allocating.
This function takes O(1) space (we aren't allocating any new variables):
def say_hi_n_times(n):
    for time in range(n):
        print("hi")
This function takes O(n) space (the size of hi_list scales with the size of the input):
def list_of_hi_n_times(n):
    hi_list = []
    for time in range(n):
        hi_list.append("hi")
    return hi_list
Usually when we talk about space complexity, we're talking about additional
space, so we don't include space taken up by the inputs. For example, this function takes
constant space even though the input has n items:
def get_largest_item(list_of_items):
    largest = float('-inf')
    for item in list_of_items:
        if item > largest:
            largest = item
    return largest
Sometimes there's a tradeoff between saving time and saving space, so you have to
decide which one you're optimizing for.
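To make the tradeoff concrete, here's a small sketch (the function names are illustrative, not from the original write-up) showing two ways to check a list for duplicates, one favoring space and one favoring time:
def has_duplicates_small_space(the_list):
    # O(n^2) time, O(1) additional space: compare every pair of items
    for i in range(len(the_list)):
        for j in range(i + 1, len(the_list)):
            if the_list[i] == the_list[j]:
                return True
    return False

def has_duplicates_fast(the_list):
    # O(n) time, O(n) additional space: remember every item we've seen
    items_seen = set()
    for item in the_list:
        if item in items_seen:
            return True
        items_seen.add(item)
    return False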
Big O ignores constants, but sometimes the constants matter. If we have a script that
takes 5 hours to run, an optimization that divides the runtime by 5 might not affect big
O, but it still saves you 4 hours of waiting.
Beware of premature optimization: sometimes optimizing for time or space hurts readability or coding time, and for a young startup it might be more important to write code that's easy to ship quickly or easy to understand later. But that doesn't mean startups don't care about big O analysis. A great engineer (startup
or otherwise) knows how to strike the right balance between runtime, space,
implementation time, maintainability, and readability.
You should develop the skill to see time and space optimizations, as well as
the wisdom to judge if those optimizations are worthwhile.