Vitaly Parnas - Subset Sum problem NP-hardness proof

I never expected to dedicate a casual blog entry to a hardness proof, especially one as academic as this. However, it has a certain elegance beyond what I typically encounter in the domain. Authentic theoreticians may not share this excitement, yet I find enough aesthetic substance in the matter to indulge in the exercise that follows.

Introduction

Subset Sum: given a set A of integers, and an integer k, does some subset S of A sum to k?

Subset Sum is an NP-complete problem that admits a pseudo-polynomial dynamic-programming solution. However, I’m less intrigued by the proof of membership in NP than by the NP-hardness proof, which I present below.
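As a point of reference, the pseudo-polynomial algorithm can be sketched in a few lines of Python. This is a minimal sketch, not a tuned implementation: it assumes every element of A is a non-negative integer, and the name has_subset_sum is my own. It tracks the set of sums reachable so far, which never holds more than k + 1 values, hence a running time of roughly O(|A|·k).

```python
def has_subset_sum(A, k):
    """Return True if some subset of A sums to k.

    Assumption: all elements of A are non-negative integers, so any
    partial sum above k can be discarded safely.
    """
    reachable = {0}  # the empty subset sums to 0
    for a in A:
        # Either skip a (keep the old sums) or add it to an old sum.
        reachable |= {s + a for s in reachable if s + a <= k}
    return k in reachable


print(has_subset_sum([3, 34, 4, 12, 5, 2], 9))   # True: 4 + 5
print(has_subset_sum([3, 34, 4, 12, 5, 2], 30))  # False
```

The running time depends on the magnitude of k rather than on the number of digits needed to write it down, which is why this does not contradict NP-hardness.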

Subset Sum is NP-hard

Proof: We’ll construct a polynomial-time reduction from the 3-CNF-SAT problem (satisfiability of Conjunctive Normal Form formulas with three literals per clause). Let Ψ be a 3-CNF formula with n variables and m clauses. Let’s build set A as follows:

For each variable x_i, 1 ≤ i ≤ n, append the following two integers to A:

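y_i = 10^(n+m-i) + c_1i c_2i … c_mi, where c_1i c_2i … c_mi is read as an m-digit decimal number whose digit c_ji is 1 if the literal x_i appears in clause j, and 0 otherwise.
z_i = 10^(n+m-i) + d_1i d_2i … d_mi, where d_1i d_2i … d_mi is read as an m-digit decimal number whose digit d_ji is 1 if the negative literal ¬x_i appears in clause j, and 0 otherwise.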

Then for each clause c_j, 1 ≤ j ≤ m, append two (identical) integers to A:

g_j = 10^(m-j)
h_j = 10^(m-j)

Finally, let k = int(1^n 3^m), where the exponents denote string repetition: k is the decimal number written as n ones followed by m threes. This concludes our reduction.

For example, the formula

(x₁ ∨ x₂ ∨ x₃) ∧ (¬x₂ ∨ x₃ ∨ x₄) ∧ (x₁ ∨ ¬x₂ ∨ ¬x₃)

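yields the following set A and k (leading zeros are padding so that the digit columns line up):

A	x₁ x₂ x₃ x₄, c₁ c₂ c₃
y₁	1 0 0 0, 1 0 1
z₁	1 0 0 0, 0 0 0
y₂	0 1 0 0, 1 0 0
z₂	0 1 0 0, 0 1 1
y₃	0 0 1 0, 1 1 0
z₃	0 0 1 0, 0 0 1
y₄	0 0 0 1, 0 1 0
z₄	0 0 0 1, 0 0 0
g₁	0 0 0 0, 1 0 0
h₁	0 0 0 0, 1 0 0
g₂	0 0 0 0, 0 1 0
h₂	0 0 0 0, 0 1 0
g₃	0 0 0 0, 0 0 1
h₃	0 0 0 0, 0 0 1

k	1 1 1 1, 3 3 3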

Note, k = 1111333 for the current example, given 4 variables and 3 clauses.
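The construction can be sketched in Python and checked against the table above. The input encoding is my own assumption, not part of the proof: variables are numbered 1..n, and each clause is a tuple of signed integers, with +i standing for x_i and -i for ¬x_i. A is returned as a list rather than a set, since g_j and h_j are equal.

```python
def reduce_3cnf_to_subset_sum(n, clauses):
    """Build (A, k) from a 3-CNF formula.

    Assumed encoding: clauses is a list of tuples of non-zero ints,
    +i for the literal x_i and -i for the literal not-x_i.
    """
    m = len(clauses)
    A = []
    for i in range(1, n + 1):
        y = z = 10 ** (n + m - i)            # the variable digit
        for j, clause in enumerate(clauses, start=1):
            if i in clause:                   # x_i appears in clause j
                y += 10 ** (m - j)
            if -i in clause:                  # not-x_i appears in clause j
                z += 10 ** (m - j)
        A.extend([y, z])
    for j in range(1, m + 1):
        A.extend([10 ** (m - j)] * 2)         # g_j and h_j
    k = int("1" * n + "3" * m)                # n ones, then m threes
    return A, k


clauses = [(1, 2, 3), (-2, 3, 4), (1, -2, -3)]
A, k = reduce_3cnf_to_subset_sum(4, clauses)
print(A)  # [1000101, 1000000, 100100, 100011, 10110, 10001, 1010, 1000, 100, 100, 10, 10, 1, 1]
print(k)  # 1111333
```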

Reduction correctness

The reduction constructs A in polynomial time: A holds 2n + 2m integers of at most n + m digits each.
Assignment t satisfies Ψ ==> there exists a subset S of A that sums to k:

Construct S as follows:
- Add y_i to S if t(x_i) is true, and z_i if t(x_i) is false.
- For each clause c_j, add g_j to S if fewer than 3 literals are satisfied in c_j, and additionally add h_j if fewer than 2 literals are satisfied.
The elements of S sum to k: since S contains exactly one of y_i and z_i for each variable, every high-order (variable) digit is 1. Every low-order (clause) digit is 3: the number of literals satisfied in c_j (at least one, since t satisfies Ψ) plus the compensation from g_j and h_j (at most two). No digit position ever receives a carry, because even the sum of all of A has only 2 in each variable digit and at most 5 in each clause digit.
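The construction of S can be sketched as follows and tried on the example formula. The encoding is assumed for illustration: clauses are tuples of signed integers (+i for x_i, -i for ¬x_i) with three distinct literals each, and the assignment t is a dict from variable index to bool. The helper names are hypothetical.

```python
def variable_number(i, n, clauses, positive):
    """Return y_i if positive is True, else z_i."""
    m = len(clauses)
    literal = i if positive else -i
    value = 10 ** (n + m - i)
    for j, clause in enumerate(clauses, start=1):
        if literal in clause:
            value += 10 ** (m - j)
    return value


def subset_from_assignment(n, clauses, t):
    """Build S from an assignment t that satisfies every clause."""
    m = len(clauses)
    # y_i if t(x_i) is true, z_i otherwise.
    S = [variable_number(i, n, clauses, t[i]) for i in range(1, n + 1)]
    for j, clause in enumerate(clauses, start=1):
        satisfied = sum(1 for lit in clause if t[abs(lit)] == (lit > 0))
        if satisfied == 0:
            raise ValueError(f"clause {j} is not satisfied by t")
        # One copy (g_j) for 2 satisfied literals, two (g_j, h_j) for 1.
        S.extend([10 ** (m - j)] * (3 - satisfied))
    return S


clauses = [(1, 2, 3), (-2, 3, 4), (1, -2, -3)]
t = {1: True, 2: False, 3: True, 4: True}
S = subset_from_assignment(4, clauses, t)
print(S)       # [1000101, 100011, 10110, 1010, 100, 1]
print(sum(S))  # 1111333
```

Here the clause digits contributed by y₁, z₂, y₃, y₄ are 2, 3, 2, so S needs g₁ and g₃ but neither copy for the second clause.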
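Ψ is unsatisfiable ==> no subset S of A exists that sums to k:

Suppose some subset S sums to k. No digit position can carry (all of A together sums to 2 in each variable digit and to at most 5 in each clause digit), so for every variable digit to equal 1, S must contain exactly one of y_i and z_i for each i. That choice defines an assignment t: t(x_i) is true if y_i is in S, and false if z_i is in S. Since Ψ is unsatisfiable, t leaves some clause c_j without a single satisfied literal, so every y_i or z_i in S has a 0 in the digit position of c_j. Consequently, S can sum to at most 2 in that digit position, from the compensating g_j and h_j, and never to the required 3. This contradicts the assumption that S sums to k.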
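For a formula as small as the example, both directions can be checked by brute force. The sketch below is only a sanity check under my own assumed encoding (clauses as tuples of signed integers, +i for x_i and -i for ¬x_i; all function names are hypothetical). It enumerates every subset of A, reads an assignment off each subset that sums to k (true where y_i was chosen), and compares the result with the satisfying assignments found by direct enumeration.

```python
from itertools import product


def labelled_numbers(n, clauses):
    """Map labels such as ("y", 1) or ("g", 2) to the integers of A."""
    m = len(clauses)
    items = {}
    for i in range(1, n + 1):
        base = 10 ** (n + m - i)
        items["y", i] = base + sum(
            10 ** (m - j) for j, c in enumerate(clauses, 1) if i in c)
        items["z", i] = base + sum(
            10 ** (m - j) for j, c in enumerate(clauses, 1) if -i in c)
    for j in range(1, m + 1):
        items["g", j] = items["h", j] = 10 ** (m - j)
    return items, int("1" * n + "3" * m)


def assignments_from_subsets(n, clauses):
    """Assignments read off every subset of A that sums to k."""
    items, k = labelled_numbers(n, clauses)
    labels = list(items)
    found = set()
    for mask in product((False, True), repeat=len(labels)):
        chosen = {lab for lab, take in zip(labels, mask) if take}
        if sum(items[lab] for lab in chosen) == k:
            # Exactly one of y_i, z_i is chosen; y_i means "true".
            found.add(tuple(("y", i) in chosen for i in range(1, n + 1)))
    return found


def satisfying_assignments(n, clauses):
    """Satisfying assignments by direct enumeration of all 2^n."""
    return {t for t in product((False, True), repeat=n)
            if all(any(t[abs(l) - 1] == (l > 0) for l in c) for c in clauses)}


clauses = [(1, 2, 3), (-2, 3, 4), (1, -2, -3)]
# The two sets should coincide if the reduction is correct for this formula.
print(assignments_from_subsets(4, clauses) == satisfying_assignments(4, clauses))
```

The enumeration is exponential in the size of A (2^14 subsets here), so it is useful only as a check on toy formulas.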

End of proof

Conclusion

This proof draws inspiration from CLRS, but as I don’t presently have access to the full text, I can’t ascertain the original source. Does my somewhat terse coverage make sense? Do you find this sort of proof admirable, or entirely unremarkable?
