On Computational Complexity

2018-03-14 @Computer Science

I want to highlight the branch of Theoretical Computer Science concerned with classifying algorithmic complexity. As with the previous post on computability, this post serves as a more philosophical complement to this more formal publication on the subject.

Environment-specific vs asymptotic runtime

Any problem expressible algorithmically incurs a computational cost. In practical applications, the cost depends on multiple factors including the CPU clock speed, the number of CPU cores, the instruction set architecture, the number of actively scheduled jobs, the capacity of the application to exploit parallelism across multiple threads/cores/nodes, the multi-layer memory architecture, whether the application is CPU/memory/disk/network bound, how effectively it balances resource usage, etc.

Most of the aforementioned factors are environment specific. As such, in theoretical analysis, algorithmic complexity is measured asymptotically, in terms of the growth of the runtime as a function of input size, irrespective of any architecture-specific constant factor. For example, given an algorithm that receives n numbers as input and performs some computation, does the algorithm runtime grow linearly (O(n)), logarithmically (O(log n)), polynomially (O(n^k) for some constant k), or exponentially (2^O(n))? These constitute the basic growth rates, although the runtime often exhibits a more complex asymptotic growth such as O(n log n + nm + m^k), for variables n, m and some constant k.

The key point to consider is constant-factor invariance. The execution of problem A with runtime O(n^2) on CPU C1 may incur a k-factor increase in runtime over CPU C2, and yet asymptotically, k·O(n^2) = O(k·n^2) = O(n^2). What matters is the growth rate with respect to the input size.
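
To make the constant-factor invariance concrete, here is a minimal Python sketch (my own illustration; quadratic_work is a hypothetical toy routine) that counts the work of an O(n^2) loop under two different per-operation costs. The absolute numbers differ by the constant factor, yet doubling n quadruples the work in both cases.

    # Illustrative sketch: a constant factor k (a "slower CPU") scales every
    # measurement, but doubling n still quadruples the work on both machines.

    def quadratic_work(n, k=1):
        ops = 0
        for i in range(n):
            for j in range(n):
                ops += k          # each inner step costs k units
        return ops

    for k in (1, 5):              # k = 5 stands in for a machine 5x slower
        counts = [quadratic_work(n, k) for n in (500, 1000, 2000)]
        ratios = [counts[i + 1] / counts[i] for i in range(2)]
        print(k, counts, ratios)  # ratios stay ~4.0 regardless of k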

Constant runtime relaxation

On occasion, the constant factor is high enough that even theoretical analysis takes note of it. At an extreme, one could argue that a particular algorithm executed on a practical machine will never exceed a certain input size. Taking that as an axiom, an O(n^2) runtime with n ≤ 100000 can arguably be treated as a worst-case constant runtime of O(100000^2) = O(1).

In my experience, many software industry practitioners unacquainted with the theory of computability or algorithms effectively disregard algorithm runtime, both the absolute architecture-specific and the asymptotic measures, as long as the algorithm executes reasonably fast for the business expectations. This complexity-oblivious approach works for some without consequence, but I find it naive if you aspire to a real understanding of implementation-specific benefits and pitfalls, and to the ability to objectively analyze the different components and levels of abstraction that comprise your system. Personally, in cases where the system behaves unexpectedly or inefficiently, I am grateful for having invested the necessary analytical effort.

Acceptable complexity

Returning to asymptotic analysis, different types of problems yield varying ranges of acceptable computational complexity. Sorting algorithms, in general, are expected to execute in O(n log n) time for input of length n, assuming a comparison-sort paradigm. The empirical support for this claim is backed by a series of proven and tested O(n log n) algorithms that dominate the sorting problem. The analytical support entails modeling the sorting problem as a decision tree: distinguishing all n! possible orderings requires a tree of depth at least log2(n!), which is Ω(n log n), indicating that no comparison sort can do asymptotically better. (The factor may prove irrelevant in the scope of a more complex system whose other components asymptotically dominate the runtime.)
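
As a quick sanity check on the decision-tree argument, the short sketch below (my own illustration) compares the lower bound log2(n!) against n·log2(n); the two track each other closely.

    # A comparison sort must distinguish n! orderings, so its decision tree
    # needs depth at least log2(n!), which grows on the order of n log n.
    import math

    for n in (10, 100, 1000):
        lower_bound = sum(math.log2(i) for i in range(2, n + 1))  # log2(n!)
        print(n, round(lower_bound), round(n * math.log2(n)))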

Notwithstanding, an algorithm that severely violates these asymptotic expectations should alert the system maintainer. For an entertaining read on how inefficient a sort algorithm can be, see this publication. I suppose one could randomly permute the input array until encountering a sorted sequence, each attempt also requiring a linear cost to verify, but I would not consider this an efficient strategy in the presence of n! different permutations, a fatal factorial blow-up even worse than exponential.
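
For illustration, a minimal sketch of that permute-until-verified strategy (often nicknamed bogosort); it runs in expected O(n · n!) time and is only safe on tiny inputs.

    import random

    def bogosort(items):
        """Randomly permute until sorted: expected O(n * n!) time. Toy only."""
        items = list(items)
        while any(items[i] > items[i + 1] for i in range(len(items) - 1)):
            random.shuffle(items)  # each shuffle is O(n), as is the check
        return items

    print(bogosort([3, 1, 2]))  # fine for 3 elements; hopeless for 20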

Search problems over a sorted sequence of n items are expected to execute in O(log n) time by following the traditional divide-and-conquer strategy of repeatedly halving the search space. This bound proves especially crucial in complex data structures where searches take place repeatedly.
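
A minimal binary-search sketch of that strategy, halving the search space each iteration:

    def binary_search(sorted_items, target):
        """Return an index of target in sorted_items, or -1 if absent. O(log n)."""
        lo, hi = 0, len(sorted_items) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if sorted_items[mid] == target:
                return mid
            if sorted_items[mid] < target:
                lo = mid + 1   # discard the lower half
            else:
                hi = mid - 1   # discard the upper half
        return -1

    print(binary_search([2, 3, 5, 7, 11, 13], 7))  # -> 3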

Matrix multiplication is expected to execute in O(n^3) time for two n-by-n matrices, given the classic multiplication algorithm. Asymptotically superior algorithms exist, such as Strassen's O(n^log2 7) ≈ O(n^2.81) or even better, although the classic algorithm may outperform them in practice under certain constraints.
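
For reference, the classic cubic algorithm as a plain Python sketch (in practice a tuned library would be used, but the triple loop makes the O(n^3) cost explicit):

    def matmul(a, b):
        """Classic O(n^3) multiplication of two n-by-n matrices (lists of lists)."""
        n = len(a)
        c = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                for k in range(n):      # n^3 scalar multiplications in total
                    c[i][j] += a[i][k] * b[k][j]
        return c

    print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]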

Randomized algorithms form a branch grounded in probability theory, aiming to attain a runtime superior to the best known deterministic algorithm in exchange for a small probability of error, usually acceptable for the problem at hand.
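
One classic instance, not mentioned above but a convenient illustration, is Freivalds' algorithm for verifying a matrix product: each trial runs in O(n^2), and when A·B ≠ C it errs with probability at most 1/2, so repeated trials shrink the error exponentially. A minimal sketch:

    import random

    def freivalds(a, b, c, trials=20):
        """Probabilistically verify a*b == c for n-by-n matrices, O(n^2) per trial.
        A false 'True' occurs with probability at most 2**-trials."""
        n = len(a)
        for _ in range(trials):
            r = [random.randint(0, 1) for _ in range(n)]
            br = [sum(b[i][j] * r[j] for j in range(n)) for i in range(n)]   # B*r
            abr = [sum(a[i][j] * br[j] for j in range(n)) for i in range(n)] # A*(B*r)
            cr = [sum(c[i][j] * r[j] for j in range(n)) for i in range(n)]   # C*r
            if abr != cr:
                return False  # definitely not equal
        return True  # equal with high probability

    print(freivalds([[1, 2], [3, 4]], [[5, 6], [7, 8]], [[19, 22], [43, 50]]))  # True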

P vs NP

In general, an algorithm is considered efficient if it yields a polynomial runtime, and inefficient for anything exponential or beyond. The corresponding problems are often informally classified as easy and hard, and computational complexity provides two classes for this purpose, P and NP. P covers the problems with a known polynomial-time solution, while NP covers those whose candidate solutions can be verified in polynomial time (the hardest of which have no known polynomial-time solution). No proof exists regarding the equality of the two classes, although P ≠ NP is widely suspected and often presupposed.
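
To illustrate the verification side of that definition, a small subset-sum verifier (my own toy example): checking a proposed certificate takes linear time, even though no polynomial algorithm is known for finding one.

    def verify_subset_sum(numbers, target, certificate):
        """Check a proposed subset-sum solution in polynomial (here linear) time.
        Finding such a subset, by contrast, has no known polynomial algorithm."""
        return (all(0 <= i < len(numbers) for i in certificate)
                and len(set(certificate)) == len(certificate)
                and sum(numbers[i] for i in certificate) == target)

    print(verify_subset_sum([3, 34, 4, 12, 5, 2], 9, [2, 4]))  # 4 + 5 == 9 -> True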

The hard problems in NP are especially interesting. Given their hard nature, much opportunity lies in either identifying a more efficient exact solution (there is no proof that one cannot exist) or an approximate solution to within a certain error factor. In fact, approximation algorithms form a branch that deals precisely with approximation schemes for hard problems. Certain hard problems even admit polynomial-time approximation schemes (PTAS), enabling a trade-off between the error factor and the runtime.
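
As one concrete example from that branch (a constant-factor approximation rather than a PTAS), the classic 2-approximation for minimum vertex cover simply takes both endpoints of every edge not yet covered; a minimal sketch:

    def vertex_cover_2approx(edges):
        """Greedy maximal-matching 2-approximation for minimum vertex cover.
        The returned cover is at most twice the size of an optimal one."""
        cover = set()
        for u, v in edges:
            if u not in cover and v not in cover:
                cover.update((u, v))  # take both endpoints of an uncovered edge
        return cover

    print(vertex_cover_2approx([(1, 2), (2, 3), (3, 4)]))  # {1, 2, 3, 4}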

NP-hard problems and games

The Travelling Salesman Problem, defined as finding a shortest (Hamiltonian) cycle visiting every vertex of a graph exactly once, is an NP-hard problem particularly rich in research. Finding an exact solution proves computationally intractable beyond roughly 10-15 vertices, as the exponential runtime starts to overwhelm computation. However, a series of approximation strategies of varying complexity exist, as well as heuristics that enable more efficient (albeit still hard) search strategies.
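
A brute-force exact solver makes the intractability tangible: it examines (n-1)! tours, which already exceeds 87 billion at 15 vertices. A minimal sketch over a hypothetical distance matrix:

    from itertools import permutations

    def tsp_brute_force(dist):
        """Exact TSP over a distance matrix by checking all (n-1)! tours.
        Feasible only for very small n; the runtime grows factorially."""
        n = len(dist)
        best_cost, best_tour = float("inf"), None
        for perm in permutations(range(1, n)):          # fix vertex 0 as the start
            tour = (0,) + perm
            cost = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
            if cost < best_cost:
                best_cost, best_tour = cost, tour
        return best_cost, best_tour

    dist = [[0, 2, 9, 10],
            [2, 0, 6, 4],
            [9, 6, 0, 3],
            [10, 4, 3, 0]]
    print(tsp_brute_force(dist))  # (18, (0, 1, 3, 2)) for this small instance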

For a while, I maintained a narrower perspective on the problems that lend themselves to computational complexity analysis. I was surprised to discover research on the complexity of a series of games, in particular classic Nintendo video games. I found the initial publication classifying the Super Mario Brothers series of video games as NP-hard one of the most entertaining reads among research publications, although questionable in impact, as the NP-hardness classification only establishes a lower bound on complexity. A sequel publication, however, strengthened the result to PSPACE-completeness, which in particular supplies the upper bound of membership in PSPACE. PSPACE, a superset of NP, is a space complexity class I'll cover in a different article. (As is typical with such class relationships, no proof exists of whether NP and PSPACE are equal.)

The above series of publications reaffirmed the extent to which the theory of computational complexity can be innovative, abstract, and still entertaining.

Conclusion

I find this theoretical branch of Computer Science not only fascinating but fundamental to acquiring a greater appreciation for algorithmic behavior and the implications of complexity. I encourage anyone interested in Computer Science to explore the area. While traditional in-classroom treatments of the subject can appear painfully dry, the impressive availability of online sources today, with their ratings and commentaries, should leave little room for concern.

Questions, comments? Connect.