## Subset Sum problem NP-hardness proof

I never expected to dedicate a casual blog entry to a hardness proof, especially one as academic as this. However, it has a certain elegance beyond what I typically encounter in the domain. Authentic theoreticians may not share the excitement, yet I find enough aesthetic substance in the matter to indulge in the exercise that follows.

### Introduction

Subset Sum: given a set A of integers, and an integer k, does some subset S of A sum to k?

Subset Sum is an NP-complete problem with a pseudo-polynomial dynamic-programming solution. However, I’m less intrigued by the proof of membership in NP and more by the NP-hardness proof, which I present below.
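For reference, the pseudo-polynomial solution mentioned above can be sketched roughly as follows. This is a minimal illustration, assuming all integers in A and the target k are non-negative; the function name `subset_sum` is my own choice.

```python
def subset_sum(a, k):
    """Return True if some subset of the non-negative integers in `a` sums to k."""
    # reachable[s] is True once some subset of the elements seen so far sums to s.
    reachable = [True] + [False] * k
    for x in a:
        # Walk downwards so that each element is used at most once.
        for s in range(k, x - 1, -1):
            if reachable[s - x]:
                reachable[s] = True
    return reachable[k]


print(subset_sum([3, 34, 4, 12, 5, 2], 9))   # True (4 + 5)
print(subset_sum([3, 34, 4, 12, 5, 2], 30))  # False
```

The running time is proportional to |A| · k, which is polynomial in the value of k but exponential in the number of digits needed to write k down. That is why this algorithm doesn't contradict NP-hardness.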

### Subset Sum is NP-hard

Proof: We’ll construct a polynomial-time reduction from 3-CNF-SAT (satisfiability of Conjunctive Normal Form formulas with 3 literals per clause). Let Ψ be a 3-CNF formula with n variables and m clauses. Let’s build the set A as follows:

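For each variable $x_i$, $1 \le i \le n$, append the following two integers to A:

• $y_i = 10^{n+m-i} + c_{1i}c_{2i}\ldots c_{mi}$, where $c_{1i}c_{2i}\ldots c_{mi}$ is an m-digit decimal number (not a product) and digit $c_{ji}$ is 1 if the literal $x_i$ appears in clause j, and 0 otherwise.
• $z_i = 10^{n+m-i} + d_{1i}d_{2i}\ldots d_{mi}$, read the same way, where digit $d_{ji}$ is 1 if the negative literal $\lnot x_i$ appears in clause j, and 0 otherwise.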

Then for each clause $c_j$, $1 \le j \le m$, add two equal integers to A (so A is, strictly speaking, a multiset):

• $g_j = 10^{m-j}$
• $h_j = 10^{m-j}$

Finally, let $k = \mathrm{int}(1^n 3^m)$, where the exponents denote string repetition: k is the decimal number written as n ones followed by m threes. This concludes our reduction.

For example, the formula

$(x_1 \lor x_2 \lor x_3) \land (\lnot x_2 \lor x_3 \lor x_4) \land (x_1 \lor \lnot x_2 \lor \lnot x_3)$

yields the following set A and k:

$$
\begin{array}{r|r}
A & x_1x_2x_3x_4,c_1c_2c_3 \\ \hline
y_1 & 1000,101 \\
z_1 & 1000,000 \\
y_2 & 100,100 \\
z_2 & 100,011 \\
y_3 & 10,110 \\
z_3 & 10,001 \\
y_4 & 1,010 \\
z_4 & 1,000 \\
g_1 & 100 \\
h_1 & 100 \\
g_2 & 10 \\
h_2 & 10 \\
g_3 & 1 \\
h_3 & 1 \\ \hline
k & 1111333
\end{array}
$$

Note that k = 1111333 for the current example, since the formula has 4 variables and 3 clauses.
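The construction is mechanical enough to sketch in a few lines of Python. The encoding of the formula is an assumption of this sketch, not part of the proof: a clause is a tuple of non-zero integers, where i stands for the literal x_i and -i for its negation. The function name `reduce_3cnf_to_subset_sum` is likewise hypothetical.

```python
def reduce_3cnf_to_subset_sum(n, clauses):
    """Map a 3-CNF formula with n variables to a Subset Sum instance (A, k).

    Assumed encoding: each clause is a tuple of non-zero ints,
    i for the literal x_i and -i for its negation.
    """
    m = len(clauses)
    a = []
    for i in range(1, n + 1):
        for literal in (i, -i):  # builds y_i first, then z_i
            value = 10 ** (n + m - i)  # the 1 in variable digit i
            for j, clause in enumerate(clauses, start=1):
                if literal in clause:
                    value += 10 ** (m - j)  # the 1 in clause digit j
            a.append(value)
    for j in range(1, m + 1):
        a += [10 ** (m - j)] * 2  # g_j and h_j
    k = int("1" * n + "3" * m)
    return a, k


# The example formula above: 4 variables, 3 clauses.
A, k = reduce_3cnf_to_subset_sum(4, [(1, 2, 3), (-2, 3, 4), (1, -2, -3)])
print(A)  # [1000101, 1000000, 100100, 100011, 10110, 10001, 1010, 1000, 100, 100, 10, 10, 1, 1]
print(k)  # 1111333
```

A is returned as a list rather than a Python set because g_j and h_j are equal, and a set would silently drop one of them.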

#### Reduction correctness

• The reduction constructs A in polynomial time: A holds 2n + 2m integers of at most n + m decimal digits each.
• An assignment t satisfies Ψ ==> there exists a subset S of A that sums to k:

Construct S as follows:

• Add $y_i$ to S if $t(x_i)$ is true, and $z_i$ if $t(x_i)$ is false.
• For each clause $c_j$, add $g_j$ to S if fewer than 3 literals of $c_j$ are satisfied, and additionally add $h_j$ if fewer than 2 are satisfied.

The elements of S sum to k. Since S contains exactly one of $y_i$ and $z_i$ for each variable, each of the n most significant (variable) digits is 1. In each of the m least significant (clause) digits, the chosen $y_i$ and $z_i$ contribute the number of satisfied literals of $c_j$, which is between 1 and 3 because t satisfies Ψ, and $g_j$ and $h_j$ make up the difference to 3. No digit position ever produces a carry, as at most five numbers in A have a 1 in any given position.
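To see this direction at work on the example, here is a small sketch that builds S from a satisfying assignment. The values of y_i, z_i and k are copied from the table; the clause encoding (i for x_i, -i for its negation) and the name `witness` are assumptions of the sketch.

```python
# Example instance from the table (n = 4 variables, m = 3 clauses).
clauses = [(1, 2, 3), (-2, 3, 4), (1, -2, -3)]  # i means x_i, -i means its negation
y = {1: 1000101, 2: 100100, 3: 10110, 4: 1010}
z = {1: 1000000, 2: 100011, 3: 10001, 4: 1000}
k = 1111333


def witness(assignment):
    """Build S from a satisfying assignment given as {variable index: bool}."""
    m = len(clauses)
    # One of y_i / z_i per variable, depending on its truth value.
    s = [y[i] if value else z[i] for i, value in assignment.items()]
    for j, clause in enumerate(clauses, start=1):
        satisfied = sum(assignment[abs(lit)] == (lit > 0) for lit in clause)
        assert satisfied >= 1, "the assignment must satisfy every clause"
        # Pad clause digit j up to 3 with g_j and, if needed, h_j.
        s += [10 ** (m - j)] * (3 - satisfied)
    return s


S = witness({1: True, 2: False, 3: True, 4: False})
print(S)            # [1000101, 100011, 10110, 1000, 100, 10, 1]
print(sum(S) == k)  # True
```

Under this assignment each clause has exactly two satisfied literals, so one padding number per clause is enough.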

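• Ψ is unsatisfiable ==> no subset S of A exists that sums to k:

We show the contrapositive. Suppose some subset S sums to k. No digit position can carry, since at most five numbers in A (three literals plus $g_j$ and $h_j$) have a 1 in any given position. The 1 in each variable digit of k therefore forces S to contain exactly one of $y_i$ and $z_i$, which defines an assignment t: $x_i$ is true if $y_i$ is in S, and false otherwise. If t left some clause $c_j$ unsatisfied, every chosen $y_i$ or $z_i$ would have a 0 in that clause digit, and S could reach at most 2 there from the compensating $g_j$ and $h_j$, not the required 3. Hence t satisfies every clause, and Ψ is satisfiable.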

End of proof
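The converse direction can be sanity-checked by brute force on the example instance: enumerate all 2^14 subsets of A, keep those that sum to k, and confirm that each one decodes to a satisfying assignment. This is only an illustration on one instance, not a substitute for the argument; the list layout (y_1, z_1, ..., y_4, z_4, then the g_j and h_j), the clause encoding and all names are assumptions of the sketch.

```python
# Example instance from the table: y_1, z_1, ..., y_4, z_4, then g_j, h_j.
A = [1000101, 1000000, 100100, 100011, 10110, 10001, 1010, 1000,
     100, 100, 10, 10, 1, 1]
K = 1111333
N = 4
CLAUSES = [(1, 2, 3), (-2, 3, 4), (1, -2, -3)]  # i means x_i, -i means its negation


def subset_total(mask):
    """Sum the elements of A selected by the bits of `mask`."""
    return sum(x for bit, x in enumerate(A) if mask >> bit & 1)


def decode(mask):
    """x_i is true exactly when y_i (stored at index 2 * (i - 1)) was selected."""
    return {i: bool(mask >> (2 * (i - 1)) & 1) for i in range(1, N + 1)}


def satisfies(t):
    """Check that assignment t makes at least one literal true in every clause."""
    return all(any(t[abs(lit)] == (lit > 0) for lit in clause)
               for clause in CLAUSES)


# Every subset that hits the target must decode to a satisfying assignment.
hits = [mask for mask in range(1 << len(A)) if subset_total(mask) == K]
print(all(satisfies(decode(mask)) for mask in hits))  # True
```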

### Conclusion

This proof draws inspiration from CLRS, but as I don’t presently have access to the full text, I can’t ascertain the original source. Does my somewhat terse coverage make sense? Do you find this sort of proof admirable, or entirely unremarkable?