A Simple Explanation of Big O notation.

Sharjeel Siddique
6 min readJan 20, 2020


You might not know this but computer science gives us a way to understand our world and what’s going on behind the scenes in much of our cases. You might have come across the frequent question when dealing with writing code. What is it’s time complexity? How can you represent it in terms of Big O notation?

I will attempt to try to explain like your five the purpose and significance of Big O notation.

When we try to understand Big O notation we came to realize it’s not restricted to the study of computer science but also something we do with people.

“To lower costs per unit of output, people usually increase the size of their operations,” wrote J.C Hosken in 1955, in the first scientific article published on sorting. This is what economies of scale look like to anyone coming from a business background. But, with sorting, size is a recipe for disaster: perversely, as a sort grows larger, “The unit cost of sorting, instead of falling, rises.” Sorting indulges into steep diseconomies of scale, violating our normal intuitions about the virtues of doing things in bulk. Cooking for two is typically no harder than cooking for one, and it’s certainly easier for one person twice. But, as a shelf of a hundred books will take you longer than sorting two bookshelves of fifty apiece: you have twice as many things to organize, and there are twice as many places each of them could go. The more you take on, the worse it gets.

This is the first and most fundamental insight of sorting theory. Scale hurts!

Computers, though, must routinely sort millions of items in a single go.

We are surrounded by people who only care about best-case performance. All records in sport reflect the single best performance. In Boxing, all it matters who won the match. Computer Science, however, almost never cares about the best case. Instead, computer scientists might want to know the average sort time. Moreover, a computer scientist would want to know the worst sort time. Worst-case analysis lets us make hard guarantees: that a critical process will finish in time, that deadlines won’t be blown.

CS has developed a shorthand specifically for measuring algorithmic worst-case scenarios: it’s called “Big-O” notation. Rather than expressing an algorithm by minutes and seconds. Big-O notation provides a way to talk about the kind of relationship that holds between the size of the problem and the programs running time.


Imagine you’re hosting a dinner party with n guests. The time required to clean the house for their arrival doesn’t depend on the number of guests at all. This is called Big-O of one problem O(1) It’s called constant time. Doesn’t matter if 100 showed up or 50. The time to clean the house before their arrival will remain the same. Totally invariant of the guest list.


So, what’s O(n)? Well, first n is the number of guests. These guests are all sitting on the dining table. Now, the time required to pass the dish around the table will be “Big-O of n”, written O(n), also known as “linear time” — with twice the guests, you’ll wait twice as long for the dish to come around. The time is dependent on the n value.

Simply put, as n increases so do the time. If you want to have the feast early just make sure not too many guests show up.


What if, as the guest arrived, each one hugged the others in greetings? Your first guest hugs you; your second guest has two hugs to give; your third guest, three. How many hugs will there be in total? This turns out to be “Big-O of n squared”, written O(n²) and also known as the “quadratic time.” One interesting algorithm that has become a punching bag for computer science students is Bubble sort. It’s simple, it’s intuitive, and it’s extremely inefficient. And it lands us in quadratic time.

It’s natural to wonder what can be a faster sorting possible.

The question sounds like it’s about productivity. But talk to a computer scientist and it turns out to be closer to metaphysics — akin to thinking about the speed of light, time travel, superconductors, or thermodynamic entropy. What are the universe’s fundamental rules and limits? What is possible? What is allowed? In this way, computer scientists are glimpsing God’s blueprint every bit as much as the particle physicists and cosmologists. What is the minimum effort required to make an order?

Could we find a constant-time sort O(1)? Well, even just confirming n books can’t be done in constant time since it requires checking all n of them.

O(log n)

In 1936, IBM began producing a line of machines called “collators” that could merge two separately ordered stacks of cards into one. As long as the two stacks were themselves sorted, the procedure of merging them into a single sorted stack was incredibly straightforward and took linear time. Simply compare the two top cards to each other, move the smaller of them to the new stack you’re creating, and repeat until finished.

Sorting two cards is simple: just put the smaller one on top. And given a pair of two-card stacks, both of them sorted, you can easily collate them into an ordered stack of four. Repeating this trick a few times, you’d build bigger and bigger stacks, each one of them already sorted. Soon enough, you could collate yourself a perfectly sorted full deck — with a final climactic merge, producing the desired result.

This approach is known as Merge sort. Each pass through the card doubles the size of the sorted stacks, so to completely sort n cards you’ll need to make as many passes as it takes to for the number 2, multiplied by itself, to equal n: the base-two logarithm, in other words. You can sort up to four cards in two collation passes, up to eight cards with a third pass, and up to sixteen cards with a fourth.

O(log n) is better for large n value

If you’re still strategizing about that bookshelf, Mergesort’s solution would be to order a pizza and invite over a few friends. Divide the books evenly, and have each person sort their stacks. Then pair people up and have them merge their stacks. Repeat the process until there are just two stacks left, and merge them one last time onto the shelf.

If you have a giant stack of census documents, and the documents are sorted by age of the person, and you want to find any 38 years old, you can look at the middle document. If the person is 38, you are done. If they are 39 or older, you discard all documents after this. If they are 37 or younger, you discard all documents before this.

You then take the remaining stack and do this whole thing again. This is more efficient, and it is called a log(n) solution, in this case, because the number of forms you need to look at relates to the logarithm (base 2 in this case) of the number of forms.

I hope this was helpful to learn about Big-O Notation.

Sorting something that you will never search is a complete waste, searching something you never sorted is merely inefficient.