## Subset Sum problem NP-hardness proof

2018-03-18 @Computer Science

I never expected to dedicate a casual blog entry to a hardness proof, especially one fairly academic as this. However, the nature of it evokes certain elegance beyond what I typically encounter in the domain. Authentic theoreticians may not sympathize with this excitement. Yet I find enough aesthetic substance in the matter to indulge in the exercise that follows.

### Introduction

Subset Sum: given a set A of integers, and an integer k, does some subset S of A sum to k?

Subset Sum is a NP Complete problem, possessing a pseudo-polynomial dynamic-programming solution. However, I’m less intrigued by the NP proof and more by the NP-hardness proof which I present below.

### Subset Sum is NP hard

Proof: We’ll construct a polynomial reduction from 3-CNFSAT (3-literal per clause Conjunctive Normal Form Satisfiability) problem. Let Ψ be a 3-CNF formula with n variables and m clauses. Let’s build set A as follows:

For each variable xi, 1 ≤ i ≤ n, append the following two integers to A:

• yi = 10n+m-i + c1ic2i…cmi, where digit cji is 1 if literal xi appears in clause j, and 0 otherwise.
• zi = 10n+m-i + d1id2i…dmi, where digit dji is 1 if the negative literal ¬xi appears in clause j, and 0 otherwise.

Then for each clause cj, add two rows

• gj = 10m-j
• hj = 10m-j

Finally, let k = int(1n3m) (exponents in string arithmetic). This concludes our reduction.

For example, the formula

(x1 ∨ x2 ∨ x3) ∧ (¬x2 ∨ x3 ∨ x4) ∧ (x1 ∨ ¬x2 ∨ ¬x3)

yields the following set A and k:

A x1x2x3x4,c1c2c3
y1 1 0 0 0, 1 0 1
z1 1 0 0 0, 0 0 0
y2 1 0 0, 1 0 0
z2 1 0 0, 0 1 1
y3 1 0, 1 1 0
z3 1 0, 0 0 1
y4 1, 0 1 0
z4 1, 0 0 0
g1 1 0 0
h1 1 0 0
g2 1 0
h2 1 0
g3 1
h3 1

k 1 1 1 1 3 3 3

Note, k = 1111333 for the current example, given 4 variables and 3 clauses.

Reduction correctness

• The reduction trivially constructs A in polynomial time.
• Assignment t satisfies Ψ ==> exists subset S of A that sums to k:

Construct S as follows:

• Add yi to S if t(xi) is true, and zi if t(xi) is false.
• For each clause cj, add gj to S if less than 3 literals are satisfied in cj, and additionally add hj if less than 2 literals are satisfied.

The elements of S sum to k: since we add only yi or zi to S, every significant (variable) digit will contain a 1, and every insignificant (clause) digit will contain a 3, the sum of the number of literals satisfied in cj plus the number of literals not satisfied (maximum two), compensated by gj and hj.

• Assignment t doesn’t satisfy Ψ ==> no subset S of A exits that sums to k:

Since we cannot satisfy Ψ, exists some clause cj that cannot be satisfied irrespective of the truth assignment. Given an arbitrary assignment t, clause cj doesn’t contain a single literal that would satisfy it, and the yi or zi added to S would contain a 0 in the respective decimal point. Consequently, S can only sum to 2 in that decimal point from the compensating gj and hj.

End of proof

### Conclusion

This proof draws inspiration from CLRS, but as I don’t presently have access to the full text, I can’t ascertain the original source. Does my somewhat terse coverage make sense? Do you find this sort of proof admirable, or entirely unremarkable?