The ever common Knapsack problem

2019-02-25 @Computer Science

We encounter the Knapsack problem all the time. Given a set of items, each with a specific size and value, is it possible to fill a limited-size knapsack with a subset of these items whose total value meets a minimum goal?

Perhaps you’re a jewel thief or a bank robber and stumble upon a fairly diverse loot, drastically varying in size and value: gold, jewels, paintings, antique books, ancient Japanese steel, autographed LPs, the initial written manuscript of Ulysses, Rod Laver’s tennis racket, etc. Being an amateur, you arrive equipped with but a small duffel bag and wish to maximize the value of the loot you make off with. Naturally, it would behoove you to favor the denser items, those that are small yet most valuable. You would probably favor the gold over the LPs.

Maybe you’re heading on a trip and want to maximize the overall value of the items you pack into your limited-size suitcase. Most travellers follow this strategy, but exceptions have been made. I packed War and Peace the last time I crossed continents, although I can see how someone might prefer a more practical object, such as a butterfly net. Granted, the value appraisal can vary between individuals.

How about the pay-per-kilogram buffet station of a restaurant? Being a budget consumer not indifferent to your health and physique, you may wish to maximize the nutritional value of your plate while respecting a maximum price. Naturally, the leaner nutritional items first come to mind: spinach, broccoli, rice, chicken breasts. There is a reason athletes emphasize these meals. Now, considering that you require variety and each ingredient diminishes in value beyond a certain proportion, this problem transforms naturally into the Knapsack variant. In other words, while you could opt for an entire plate of spinach, you’re not the legendary Popeye. You can assume the selection to provide X servings of spinach, Y of broccoli, Z of rice, etc., to properly diversify the meal.

Our priorities, life choices, and time management are no different. There is but a fixed knapsack we fill with activities. We wish to construct a life satisfying some self-defined notion of accumulated value, all while respecting the limited amount of resources at our disposal. The resources may comprise time, energy, finances. The value may constitute personal growth potential, financial income, compound returns, etc.

That said, there exists an improperly framed cliché in time management that encourages focus on the biggest items. Fill your bowl with the biggest stones, since if you first opt for the small ones, you’ll compromise the space for the bigger items. This works in the context of the classic metaphor, but the term big can mislead when applied in practice. It can come to mean whatever is most challenging and time-consuming, without considering the empirical value or the marginal return. As in an efficient Knapsack solution, you should emphasize the items densest in value. Maybe a consistently performed 20-minute activity yields astronomical compound returns, compared to something bulky and ‘approved’, yet entirely superficial. The first arguably demands priority.

Each item introduced into our life drains that resource pool. We are limited in the number of hobbies pursued, books read, video links followed, soap operas mourned, blog posts written, red wine consumed, rounds of Dominoes played, events attended, superficial interactions carried out, kilometers hiked. The item cost, or size in the Knapsack formulation, is bounded not strictly by the time consumed, but by the energy dissipated and diverted from other tasks. A seemingly innocent minute of time can leak a disproportionate amount of energy.

Almost everything we do lends itself to a decision variant of the Knapsack problem. Regretfully, we often lack the initiative, discipline, or precision to even approximate a proper transformation. Item size and value might not be immediately apparent, and are often subjective. Further yet, in the presence of many items, our ability to calculate or approximate an ‘efficient’ solution reaches its limitations. One must often resort to rubrics, decision matrices, Markov Decision Processes, or other organizational aids.

Knapsack computational complexity

The Knapsack problem is hard. Hard in the computational complexity sense, and hard in common wisdom. Having entered this plane, let me provide the formal problem definition.

Given the following instance:

  1. a finite set U of items
  2. a size function s: U → Z+ (positive integers)
  3. a value function v: U → Z+
  4. a size constraint B ∈ Z+
  5. a value goal K ∈ Z+

Is there a subset U' of U such that s(U') ≤ B and v(U') ≥ K, where s(U') and v(U') denote the sums of the sizes and values of the items in U'?

Note, the above represents the yes/no decision variant of the problem. A similar optimization variant would ask for the maximum value attainable within the size constraint. Computationally, the decision variant of a problem is no harder than the optimization variant, and often no easier, while decision-variant proofs offer a certain convenience when framed in the formal language machinery [Garey, Johnson, 1979]. In any case, I’ll demonstrate that the decision variant of the 0-1 Knapsack problem is NP-complete (NPC).
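
To make the definition concrete, here is a minimal brute-force sketch of the decision variant in Python; the function name and the sample instance are my own, not part of any standard formulation. It enumerates every subset, the very exponential blow-up the rest of this post concerns.

    from itertools import combinations

    def knapsack_decision(items, B, K):
        """Brute-force decision procedure for 0-1 KNAPSACK.

        items: list of (size, value) pairs, i.e. the set U with s(u) and v(u).
        B: the size constraint, K: the value goal.
        Returns True if some subset U' satisfies s(U') <= B and v(U') >= K.
        """
        for r in range(len(items) + 1):
            for subset in combinations(items, r):
                if (sum(s for s, _ in subset) <= B
                        and sum(v for _, v in subset) >= K):
                    return True
        return False

    # A tiny instance: four items, knapsack of size 10, value goal 11.
    items = [(6, 8), (4, 5), (3, 3), (5, 6)]
    print(knapsack_decision(items, B=10, K=11))  # True: take (6, 8) and (4, 5)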

First, a few clarifications:

  1. 0-1 Knapsack? This implies whole items in the set: each item is either taken (1) or left behind (0). You cannot chop an item and take a part.

    An alternative is the fractional knapsack problem. For this, an efficient solution does in fact exist. The solution consists of a greedy algorithm that fills the knapsack with items in decreasing order of item density = value/size. Once the chosen item no longer fits in the space remaining, we simply take the fraction of it that fits and the algorithm terminates. Voilà. (A small sketch appears right after this list.)

  2. NP-complete? Also known as NPC, this indicates a problem that is both NP-hard and within the class of NP problems.

    1. What’s an NP problem? A problem belonging to the NP complexity class is one whose candidate solutions can be verified efficiently, even when no efficient way of finding a solution is known.

      An efficient solution is one of polynomial computational complexity. Without too much background, as long as the running time grows polynomially with the input size, rather than exponentially, the algorithm is considered efficient.

      Many NP problems, the NPC ones among them, lack a known efficient solution. What every NP problem does admit is efficient verification: given a candidate solution, we can efficiently check whether it satisfies the problem constraints.

    2. What’s an NP-hard problem? Problem A is NP-hard when we can demonstrate that either

      1. every NP problem polynomially-reduces to A, or
      2. some NPC problem polynomially-reduces to A. The vast majority of hardness proofs follow this second approach.
    3. What do I mean by reducing one problem to another? I refer to polynomial reductions. Again, without entirely digressing to first principles, problem A polynomially-reduces to problem B when

      1. every instance of problem A maps to some instance of problem B,
      2. the former is a yes-instance of A if and only if the latter is a yes-instance of B, and
      3. the reduction, being a computable function, runs in polynomial time, in other words, efficiently.

      In demonstrating such a reduction, we can conclude that problem B is at least as hard as problem A.

    NPC problems are the hardest in the NP class of problems, since they are at least as hard as every other problem within that class. I’ll leave some references at the bottom for those interested in more background.
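
As promised above, here is a small sketch of that greedy fractional-knapsack procedure in Python, reusing the (size, value) pair convention from the earlier decision sketch; the function name and the sample instance are again my own.

    def fractional_knapsack(items, capacity):
        """Greedy algorithm for the fractional knapsack problem.

        items: list of (size, value) pairs; capacity: the knapsack size B.
        Returns the maximum value attainable when items may be taken fractionally.
        """
        total_value = 0.0
        remaining = capacity
        # Consider items in decreasing order of density = value / size.
        for size, value in sorted(items, key=lambda sv: sv[1] / sv[0], reverse=True):
            if size <= remaining:
                # The whole item fits: take all of it.
                total_value += value
                remaining -= size
            else:
                # Take only the fraction that fits, then stop.
                total_value += value * (remaining / size)
                break
        return total_value

    items = [(6, 8), (4, 5), (3, 3), (5, 6)]
    print(fractional_knapsack(items, capacity=12))  # ~15.4: the two densest items whole, plus 2/5 of (5, 6)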

0-1 KNAPSACK is NP-complete

Proof:

  1. The problem belongs to NP. Given a subset U' of U, we can trivially verify, in polynomial time, whether it satisfies the respective size and value constraints. (This seems obvious, but problems do exist without an efficient verification algorithm.)

  2. The problem is NP-hard. We’ll reduce another NPC problem, PARTITION, to KNAPSACK.

    Let me first define PARTITION:

    Given a finite set A and a size function s: A → Z+, can we partition A into two subsets of equal total size? That is to say, is there a subset A' of A such that s(A') = s(A - A')?

    I will not provide the PARTITION NPC proof, but it too involves a polynomial reduction from another NPC problem, at the end of a chain of such reductions.

    Let me now demonstrate that PARTITION polynomially-reduces to KNAPSACK and then prove the correctness of the reduction.

    Reduction:

    Take any instance x of PARTITION, and create an instance f(x) of KNAPSACK as follows, using our previous convention.

    1. U = A.
    2. s(u) = v(u) = s(a) for each item u = a. We create an instance in which each item’s value equals its size.
    3. B = K = (½)s(U). Set both the size constraint and the value goal equal to one half the total size of U. (If s(U) is odd, no equal partition exists and, likewise, no subset of integer sizes can satisfy both constraints, so the reduction remains correct.)

    Remember, every PARTITION instance must simply reduce to some instance of KNAPSACK. The reduction need not be one-to-one and could, in fact, map several instances to the same KNAPSACK instance, provided it respects the other constraints.

    Reduction correctness:

    1. The reduction is clearly efficient, executing in polynomial time: it copies the set and its sizes and computes a single sum.
    2. x ∈ PARTITION implies f(x) ∈ KNAPSACK. If x meets the constraints of PARTITION, we know that we can decompose set A into two subsets of equal total size (per the size function). Consequently, we know that s(A) is divisible by two. It follows that f(x) solves KNAPSACK by filling the knapsack with either of the two subsets that solve PARTITION: s(U') will precisely equal the size constraint B, and v(U') will likewise equal the value goal K.
    3. f(x) ∈ KNAPSACK implies x ∈ PARTITION. Since instance f(x) solves KNAPSACK, there exists a subset U' of U with s(U') ≤ B and v(U') ≥ K. Because each item’s value equals its size and B = K = (½)s(U), the two constraints pinch s(U') to precisely one half of s(U). It follows that x solves PARTITION by forming one set from the knapsack contents, and the other from the remainder, both equal in total size.

End of proof
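
As a closing illustration, here is a small sketch of the reduction f(x) in Python; the function name and the tiny sample instance are mine, for illustration only. The mapped instance could equally be fed to the brute-force knapsack_decision sketch from the definition section.

    def partition_to_knapsack(sizes):
        """Map a PARTITION instance to a KNAPSACK instance, per the reduction above.

        sizes: list of positive integers, the values s(a) for a in A.
        Returns (items, B, K): items as (size, value) pairs with value = size,
        and B = K = half the total size.
        """
        items = [(s, s) for s in sizes]   # s(u) = v(u) = s(a)
        half = sum(sizes) / 2             # B = K = (1/2) * s(U)
        return items, half, half

    # Tiny sanity check: {3, 1, 1, 2, 2, 1} splits into {3, 2} and {1, 1, 2, 1},
    # both of total size 5, so the mapped KNAPSACK instance should be a yes-instance.
    items, B, K = partition_to_knapsack([3, 1, 1, 2, 2, 1])
    chosen = [(3, 3), (2, 2)]             # knapsack contents: one side of the partition
    print(sum(s for s, _ in chosen) <= B and sum(v for _, v in chosen) >= K)  # True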

We’ve demonstrated that 0-1 KNAPSACK is NP-complete: at least as computationally complex as any other NP-class problem, while itself still belonging to NP.

References

  1. Garey, Michael R. and Johnson, David S. Computers and Intractability: A Guide to the Theory of NP-Completeness. 1979.
  2. Computability: Short Guide
  3. Computational Complexity: Short Guide
  4. Computational complexity blog

Questions, comments? Connect.