The domain independence of asymptotic analysis

2020-04-02 @Computer Science

Computational complexity analysis doesn’t just apply to Computer Science theory. It is one of those many often-neglected tools that can guide us in a wide variety of contexts, independently of domain.

Some of the examples I provide here will seem exceptionally trivial. Effective solutions often are.

Yet I find it important to recognize the underlying mechanisms at play, rather than yield strictly to intuition. Doing so lets us generalize the strategy to a broader range of neglected contexts.

Asymptotic complexity basic recap

For any problem over n units of something, a general solution tends to fall into one of the following complexity classes with respect to n:

  1. Linear. The solution grows linearly with n, up to some constant factor/scale.

  2. Polynomial. Ex: n², if for every unit we must traverse the entire set (ie compare it with every other unit).

  3. Sub-linear. Ex: logarithmic, square root, or even constant (when the solution follows in constant time irrespective of n).

  4. Exponential. The most expensive kind, often considered computationally intractable. Ex: 2ⁿ.

  5. Poly-logarithmic. A combination of polynomial and logarithmic: n·log(n), n²·log(n), n·log²(n), etc.

The objective: achieve a more favorable complexity class. Ex: exponential => polynomial, polynomial => linear, linear => sub-linear => constant.
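
To make the gap between these classes concrete, here is a minimal sketch (my own illustration, not part of the original recap) comparing a few growth rates at increasing n:

    import math

    # Rough operation counts per complexity class; constants and units are
    # deliberately ignored, just as asymptotic analysis ignores them.
    for n in (10, 1_000, 1_000_000):
        print(f"n={n:>9,}  log(n)={math.log2(n):6.1f}  "
              f"n*log(n)={n * math.log2(n):14,.0f}  n^2={n ** 2:16,}")

At a million items, the quadratic class already demands on the order of a trillion operations, while the n·log(n) one stays around twenty million.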

In case you desire a deeper exploration of asymptotic analysis (not terribly relevant here), see my essay on Computational Complexity.

Divide and Conquer your search space

When sorting a list, we sometimes apply the naive n² approach of comparing each element with every other. Instead, we could attain a more favorable n·log(n) (poly-logarithmic) sorting complexity by repeatedly dividing our space in two, transforming a larger problem into smaller sub-problems of the same kind.
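
As a concrete sketch, here is a minimal merge sort in Python, one possible Divide & Conquer realization of the n·log(n) bound (the essay itself doesn’t commit to a particular algorithm):

    def merge_sort(items):
        """Sort by halving the list, sorting each half, then merging: O(n log n)."""
        if len(items) <= 1:
            return items
        mid = len(items) // 2
        left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
        # Merge the two sorted halves in a single linear pass.
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
        return merged + left[i:] + right[j:]

    print(merge_sort([5, 2, 9, 1, 7]))  # [1, 2, 5, 7, 9]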

This approach follows what’s known as the Divide & Conquer strategy. It applies not only to sorting, but to a wide range of problems, including n-digit multiplication, matrix multiplication, the Fast Fourier Transform and even circuit layout.

In fact, as long as we keep dividing our search space by any constant factor (1/3, 1/9, 1/99), we still achieve the same asymptotically (poly-)logarithmic solution.

For many problems encountered in life, we already resort to asymptotically advantageous solutions without realizing it. In scanning a sorted list, for instance (eg a file directory), our brain naturally divides the search space in a logarithmic manner.
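
That intuition corresponds to binary search; a minimal sketch, assuming the list is already sorted (the file names are hypothetical):

    def binary_search(sorted_items, target):
        """Locate target in a sorted list by halving the search space: O(log n)."""
        lo, hi = 0, len(sorted_items) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if sorted_items[mid] == target:
                return mid
            if sorted_items[mid] < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1  # not found

    files = ["apps", "docs", "music", "photos", "videos"]
    print(binary_search(files, "music"))  # 2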

Yet sometimes we exercise the naive approach when a more efficient method exists.

Lesson: Avoid sequential solutions; sort; divide your search space.

Indexes / Hash Tables

In the absence of other aids, we tend to scan any unsorted list or collection one by one. To mitigate this, it sometimes makes sense to first sort your data; other times it makes sense to introduce an index, or even several.

By such an index I really mean a hash table, its formal equivalent in Computer Science theory.

But without submerging into hash table particulars, let’s consider a plain index: a smaller list of references into a larger data collection. This may sound trivial, but we often neglect such basic measures.

In a paper notebook, for instance, I make a habit of dedicating one of the first pages to indexing all crucial content. It tends to take the form of a reverse index, one that maps topics to pages, and it need not even follow numerical order. This trivial strategy economizes a great deal of content-search time.

This sort of index is merely a way to reduce your search complexity from n (linear) to m, the size of your index, and that only in the worst case. In practice, if you sort or otherwise better organize your index, the search attains a far better complexity than linear in m, such as logarithmic.
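
In code, the same measure amounts to building a small lookup structure once and consulting it thereafter; a minimal sketch using a Python dict (itself a hash table) as a reverse index, with hypothetical topics and page numbers:

    # Reverse index: topic -> pages where it appears (hypothetical notebook contents).
    index = {
        "asymptotics": [3, 12],
        "hash tables": [27],
        "indexes": [27, 31],
    }

    # Average-case constant-time lookup instead of leafing through every page.
    print(index.get("hash tables", []))  # [27]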

Another practical case of the above is a folder of links to important files. These files may be spread out across many deeply-nested directories, repositories and projects.

Yet the simple measure of creating an ‘important documents’ folder of symbolic links (in a Unix context) to all those disparate files you want within your immediate field of attention vastly reduces your search complexity, in addition to serving as an effective reminder.
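
A minimal sketch of that measure, scripted in Python rather than typed at a shell, with entirely hypothetical paths:

    import os

    # Hypothetical files scattered across deeply-nested directories.
    important = [
        "~/projects/alpha/reports/q1.pdf",
        "~/archive/2019/taxes/summary.pdf",
    ]

    folder = os.path.expanduser("~/important-documents")
    os.makedirs(folder, exist_ok=True)
    for path in important:
        src = os.path.expanduser(path)
        dst = os.path.join(folder, os.path.basename(src))
        if not os.path.islink(dst):
            os.symlink(src, dst)  # equivalent to `ln -s src dst`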

Lesson: Index your data.

Increase dimensionality

We can achieve another speed boost by introducing dimensionality to our data.

Suppose a data set of n items would normally entail a worst-case search time linear in n. By splitting the set into two dimensions of equal size (or unequal, without loss of asymptotic generality), we reduce our maximum linear search time from n to 2√n; with three dimensions, to 3∛n; and with d dimensions, to d·n^(1/d).
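
A minimal sketch of the two-dimensional case: splitting a flat, sorted collection into roughly √n buckets, so that a worst-case lookup touches about 2√n entries rather than n (the data is illustrative):

    import math

    def bucketize(sorted_items):
        """Split a sorted collection into ~sqrt(n) buckets of ~sqrt(n) items each."""
        n = len(sorted_items)
        size = max(1, math.isqrt(n))
        return [sorted_items[i:i + size] for i in range(0, n, size)]

    def bucket_search(buckets, target):
        """Scan at most ~sqrt(n) bucket boundaries, then ~sqrt(n) items in one bucket."""
        for bucket in buckets:            # first dimension: find the right bucket
            if bucket[-1] >= target:
                return target in bucket   # second dimension: scan within it
        return False

    days = list(range(1, 366))            # a flat "calendar" of 365 entries
    print(bucket_search(bucketize(days), 200))  # True, in roughly 2*sqrt(365) ≈ 38 steps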

Granted, we cannot just mechanically transform our one-dimensional set into a square or a cube. We can only add dimensionality to the extent that our data enables.

We intuitively know to split our yearly calendar into months and days. Rather than a search through all 365 days to locate a given date, we need but locate the month followed by the day, for a maximum of 12 + 31 searched entries.

And this assumes we search each dimension linearly. We would likely do this sub-linearly/logarithmically, since our months and days tend to be sorted.

The way we split indexed written material into chapters/sections/subsections also amounts to a kind of dimensionality technique. Instead of a one-dimensional table of contents, the entirety of which the reader must scan to locate the subject of interest, we split the material into its respective sections, each cohesive in a way that drastically reduces the search time.

(Note, again, that arbitrary dimensional splits, ones that don’t leverage any dimensional information, yield no real reduction in complexity. The three volumes of Montaigne’s essays, for example, are organized in no way that I can categorically synthesize, demanding that I linearly scan the entire table of contents for an essay of interest.)

Lesson: Maximally leverage dimensionality within your data.

Conclusion

Myriad other theoretical concepts also lend themselves to domain independence, including game theory, Machine Learning, AI, graph theory, approximation algorithms, parallel processing, probability, computer architecture, information theory, and even one of my favorites, computability theory.

For a handful of instances, see the following essays:

Lessons recaptured

  1. Avoid sequential solutions; sort; divide your search space.
  2. Index your data.
  3. Maximally leverage dimensionality within your data.

Questions, comments? Connect.