Java Notes
Algorithms: Big-Oh Notation
How time and space grow as the amount of data increases
It's useful to estimate the CPU time or memory an algorithm requires. This "complexity analysis" attempts to characterize the relationship between the number of data elements and resource usage (time or space) with a simple formula approximation. Many programmers have had ugly surprises when they moved from small test data to large data sets. This analysis will make you aware of such potential problems.
Dominant Term
Big-Oh (the "O" stands for "order of") notation is concerned with what happens for very large values of N, therefore only the largest term in a polynomial is needed. All smaller terms are dropped.
For example, the number of operations in some sorts is N² - N. For large values of N, the single N term is insignificant compared to N², therefore one of these sorts would be described as an O(N²) algorithm.
Similarly, constant multipliers are ignored. So an O(4*N) algorithm is equivalent to O(N), which is how it should be written. Ultimately you want to pay attention to these multipliers in determining real-world performance, but for the first round of analysis using Big-Oh, you simply ignore constant factors.
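A minimal sketch of this idea: the nested loop below (the loop structure used by selection-style sorts) performs exactly N*(N-1)/2 = N²/2 - N/2 comparisons, and it is the N² part that dominates. The class and method names are just illustrative.

```java
public class OperationCount {
    // Counts the comparisons a simple nested loop performs.
    // The exact count is N*(N-1)/2 = N²/2 - N/2; Big-Oh drops the -N/2 term
    // and the 1/2 factor, leaving O(N²).
    public static long countComparisons(int n) {
        long comparisons = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                comparisons++;          // one "basic operation" per inner iteration
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.println(n + " elements -> " + countComparisons(n)
                    + " comparisons (N*(N-1)/2 = " + (long) n * (n - 1) / 2 + ")");
        }
    }
}
```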
Why Size Matters
Here is a table of typical cases, showing how many "operations" would be performed for various values of N. Logarithms to base 2 (as used here) are proportional to logarithms in any other base, so the choice of base doesn't affect the big-oh formula.
n | O(1) constant | O(log N) logarithmic | O(N) linear | O(N log N) | O(N²) quadratic | O(N³) cubic |
---|---|---|---|---|---|---|
1 | 1 | 1 | 1 | 1 | 1 | 1 |
2 | 1 | 1 | 2 | 2 | 4 | 8 |
4 | 1 | 2 | 4 | 8 | 16 | 64 |
8 | 1 | 3 | 8 | 24 | 64 | 512 |
16 | 1 | 4 | 16 | 64 | 256 | 4,096 |
1,024 | 1 | 10 | 1,024 | 10,240 | 1,048,576 | 1,073,741,824 |
1,048,576 | 1 | 20 | 1,048,576 | 20,971,520 | ≈10¹² | ≈10¹⁸ |
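If you want to reproduce these numbers, a small sketch like the following prints the same growth pattern (the class name is just illustrative; it treats log₂(1) as 1 so the first row matches the table, and rounds the logarithm to a whole number of operations).

```java
public class GrowthTable {
    public static void main(String[] args) {
        long[] sizes = {1, 2, 4, 8, 16, 1_024, 1_048_576};
        System.out.printf("%9s %5s %8s %9s %12s %16s %22s%n",
                "n", "O(1)", "O(log N)", "O(N)", "O(N log N)", "O(N^2)", "O(N^3)");
        for (long n : sizes) {
            // log base 2, rounded; treat log(1) as 1 so the first row matches the table
            long log = Math.max(1, Math.round(Math.log(n) / Math.log(2)));
            System.out.printf("%9d %5d %8d %9d %12d %16d %22d%n",
                    n, 1, log, n, n * log, n * n, n * n * n);
        }
    }
}
```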
Does anyone really have that much data?
It's quite common. For example, it's hard to find a digital camera that has fewer than a million pixels (1 megapixel). These images are processed and displayed on the screen. The algorithms that do this had better not be O(N²)! If it took one microsecond (one millionth of a second) to process each pixel, an O(N²) algorithm would take more than a week to finish processing a 1-megapixel image, and more than three months to process a 3-megapixel image (note that the rate of increase is definitely not linear).
Another example is sound. CD audio samples are 16 bits, taken 44,100 times per second for each of two channels. A typical 3-minute song therefore consists of about 8 million sample points per channel, roughly 16 million in all. You had better choose the right algorithm to process this data.
A dictionary I've used for text analysis has about 125,000 entries. There's a big difference between a linear O(N) search, a binary O(log N) search, and a hash O(1) lookup.
Best, worst, and average cases
You should be clear about which case big-oh notation describes. By default it usually refers to the average case, using random data. However, the characteristics for the best, worst, and average cases can be very different, and the use of non-random (often more realistic) data can have a big effect on some algorithms.
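A linear search makes the difference between cases concrete. The sketch below (names and data are illustrative) counts comparisons for the best case (target is the first element), the worst case (target is absent), and a middling case.

```java
import java.util.Arrays;

public class LinearSearchCases {
    // Returns the number of comparisons needed to find (or give up on) target.
    static int comparisons(int[] a, int target) {
        int count = 0;
        for (int value : a) {
            count++;
            if (value == target) break;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        Arrays.setAll(data, i -> i);                          // 0, 1, 2, ..., 999
        System.out.println("best case:   " + comparisons(data, 0));    // 1 comparison
        System.out.println("worst case:  " + comparisons(data, -1));   // 1000 comparisons
        System.out.println("middle case: " + comparisons(data, 499));  // 500 comparisons
    }
}
```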
Why big-oh notation isn't always useful
Complexity analysis can be very useful, but there are problems with it too.
- Too hard to analyze. Many algorithms are simply too hard to analyze mathematically.
- Average case unknown. There may not be sufficient information to know what the most important "average" case really is, therefore analysis is impossible.
- Unknown constant. Both walking and traveling at the speed of light have a time-as-a-function-of-distance big-oh complexity of O(N). Although they have the same big-oh characteristics, one is rather faster than the other. Big-oh analysis only tells you how the cost grows with the size of the problem, not how efficient a given solution is.
- Small data sets. If there are no large amounts of data, algorithm efficiency may not be important.
Benchmarks are better
Big-oh notation can give very good ideas about performance for large amounts of data, but the only real way to know for sure is to actually try it with large data sets. There may be performance issues that are not taken into account by big-oh notation, e.g., the effect on paging as virtual memory usage grows. Although benchmarks are better, they aren't feasible during the design process, so Big-Oh complexity analysis is the practical choice at that stage.
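When you do benchmark, the basic technique is simply to time the operation at several data sizes and watch how the time grows. The sketch below is deliberately rough: a real benchmark needs JVM warm-up and repeated trials (a framework such as JMH handles this), and the sizes and seed here are arbitrary.

```java
import java.util.Arrays;
import java.util.Random;

public class SortBenchmark {
    public static void main(String[] args) {
        Random rand = new Random(42);                    // fixed seed for repeatability
        for (int n = 100_000; n <= 1_600_000; n *= 2) {
            int[] data = rand.ints(n).toArray();         // n random ints
            long start = System.nanoTime();
            Arrays.sort(data);                           // the operation being measured
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(n + " elements sorted in " + ms + " ms");
        }
    }
}
```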
Typical big-oh values for common algorithms
Searching
Here is a table of typical cases.
Type of Search | Big-Oh | Comments |
---|---|---|
Linear search array/ArrayList/LinkedList | O(N) | |
Binary search sorted array/ArrayList | O(log N) | Requires sorted data. |
Search balanced tree | O(log N) | |
Search hash table | O(1) | Average case; assumes a good hash function. |
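The sketch below shows three of the lookup styles from the table using standard library calls: a hand-written O(N) linear scan, O(log N) Arrays.binarySearch on a sorted array, and an average-case O(1) HashSet lookup. The class name and data values are just for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SearchExamples {
    // O(N): may have to check every element.
    static boolean linearSearch(int[] a, int target) {
        for (int value : a) {
            if (value == target) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        int[] sorted = {2, 5, 9, 14, 20, 31, 47};
        Set<Integer> hashed = new HashSet<>();
        for (int value : sorted) hashed.add(value);

        System.out.println(linearSearch(sorted, 14));              // true, O(N)
        System.out.println(Arrays.binarySearch(sorted, 14) >= 0);  // true, O(log N), requires sorted data
        System.out.println(hashed.contains(14));                   // true, O(1) on average
    }
}
```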
Other Typical Operations
Operation | array / ArrayList | LinkedList |
---|---|---|
access front | O(1) | O(1) |
access back | O(1) | O(1) |
access middle | O(1) | O(N) |
insert at front | O(N) | O(1) |
insert at back | O(1) | O(1) |
insert in middle | O(N) | O(1) |
Note: the O(1) LinkedList insertions assume an iterator is already positioned at the insertion point; walking to the middle is itself O(N). Insertion at the back of an ArrayList is O(1) amortized, since the backing array must occasionally be grown and copied.
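The sketch below shows what that iterator assumption looks like with java.util.LinkedList: the add itself just relinks nodes, but getting the iterator to the middle is a walk over the list. The data is just for illustration.

```java
import java.util.LinkedList;
import java.util.ListIterator;

public class MiddleInsert {
    public static void main(String[] args) {
        LinkedList<String> list = new LinkedList<>();
        list.add("a");
        list.add("b");
        list.add("d");

        ListIterator<String> it = list.listIterator();
        it.next();           // at "a" -- each step is O(1), but reaching
        it.next();           // at "b"    the middle takes O(N) steps in total
        it.add("c");         // the insertion itself is O(1): just relink the nodes

        System.out.println(list);   // [a, b, c, d]
    }
}
```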
Sorting arrays/ArrayLists
Some sorting algorithms show variability in their Big-Oh performance. It is therefore interesting to look at their best, worst, and average performance. For this description "average" is applied to uniformly distributed values. The distribution of real values for any given application may be important in selecting a particular algorithm.
Type of Sort | Best | Worst | Average | Comments |
---|---|---|---|---|
BubbleSort | O(N) | O(N²) | O(N²) | Not a good sort, except with nearly sorted data. |
Selection sort | O(N²) | O(N²) | O(N²) | Perhaps the best of the O(N²) sorts. |
QuickSort | O(N log N) | O(N²) | O(N log N) | Good, but its worst case is O(N²). |
HeapSort | O(N log N) | O(N log N) | O(N log N) | Typically slower than QuickSort, but its worst case is much better. |
Example - choosing a non-optimal algorithm
I had to sort a large array of numbers. The values were almost always already in order, and even when they weren't there was typically only one number out of order. Only rarely were the values completely disorganized. I used a bubble sort because it was O(N) for my "average" (nearly sorted) data. This was many years ago, when CPUs were 1000 times slower. Today I would simply use the library sort for the amount of data I had, because the difference in execution time would probably go unnoticed. However, there are always data sets which are so large that the choice of algorithm really matters.
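For reference, here is a sketch of a bubble sort with the early-exit check that gives the O(N) best case shown in the table above; on nearly sorted data like that described here, it finishes in one or two passes. The class name and sample data are illustrative.

```java
import java.util.Arrays;

public class BubbleSort {
    public static void sort(int[] a) {
        boolean swapped = true;
        for (int pass = 0; swapped && pass < a.length - 1; pass++) {
            swapped = false;                      // early exit: a pass with no swaps means sorted
            for (int i = 0; i < a.length - 1 - pass; i++) {
                if (a[i] > a[i + 1]) {
                    int tmp = a[i];               // swap the adjacent out-of-order pair
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                    swapped = true;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] nearlySorted = {1, 2, 3, 5, 4, 6, 7};          // one pair out of order
        sort(nearlySorted);
        System.out.println(Arrays.toString(nearlySorted));   // [1, 2, 3, 4, 5, 6, 7]
    }
}
```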
Example - O(N³) surprise
I once wrote a text-processing program to solve some particular customer problem. After seeing how well it processed the test data, the customer produced real data, which I confidently ran the program on. The program froze -- the problem was that I had inadvertently used an O(N³) algorithm and there was no way it was going to finish in my lifetime. Fortunately, my reputation was restored when I was able to rewrite the offending algorithm within an hour and process the real data in under a minute. Still, it was a sobering experience, illustrating dangers in ignoring complexity analysis, using unrealistic test data, and giving customer demos.
Same Big-Oh, but big differences
Even when two algorithms have the same big-oh characteristics, they may differ by a factor of three (or more) in practical implementations. Remember that big-oh notation ignores constant overhead and constant factors. These can be substantial, and they can't be ignored in practical implementations.
Time-space tradeoffs
Sometimes it's possible to reduce execution time by using more space, or reduce space requirements by using a more time-intensive algorithm.
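A common example of trading space for time is caching (memoizing) results in a HashMap: repeated requests become O(1) lookups at the cost of the memory the cache occupies. This is only a sketch; slowCompute is a hypothetical stand-in for genuinely expensive work.

```java
import java.util.HashMap;
import java.util.Map;

public class MemoExample {
    private static final Map<Integer, Long> cache = new HashMap<>();

    // Pretend this is expensive: O(N) work per call.
    static long slowCompute(int n) {
        long sum = 0;
        for (int i = 0; i <= n; i++) sum += (long) i * i;
        return sum;
    }

    // Extra space (the cache) buys O(1) time for repeated arguments.
    static long cachedCompute(int n) {
        return cache.computeIfAbsent(n, MemoExample::slowCompute);
    }

    public static void main(String[] args) {
        System.out.println(cachedCompute(1_000_000));   // computed the slow way
        System.out.println(cachedCompute(1_000_000));   // served from the cache
    }
}
```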