Subbag Sum

Terminology: A bag is a multiset. I insist on and promote bag because it is shorter, E. W. Dijkstra uses it, and submultiset is downright ugly.

Problem: Suppose a bag of n positive integers has sum equal to m, with n > m/2. Prove that each of 0...m is a subbag sum, i.e., for each i = 0, 1, 2, ..., m, there is a subbag with sum equal to i.

An equivalent statement is: Suppose m is partitioned into n parts, with n > m/2. Prove that each of 0...m can be formed by the parts. Some people also write 2n>m instead of n>m/2 in order to avoid leaving the domain of the integers; I think it is instructive to state this inequality in both forms, for each has its use. I personally find n>m/2 to help visualize the problem, but I fully understand that the important point is that n is ``large enough'', which is equally well conveyed by 2n>m.

Before we solve the problem, it is instructive to see that the bound n>m/2 is sharp. For m even, a bag with m/2 copies of 2 will have sum m but no subbag sum of 1. For m odd, a bag with (m-1)/2 - 1 = (m-3)/2 copies of 1 and one copy of m - (m-3)/2 = (m+3)/2 will have sum m but no subbag sum of (m-1)/2, e.g., m=9, n=4, bag = [1,1,1,6].

Solution: We will proceed by strong induction on m.

Base case: The statement is vacuously true when m=0.

Induction step: Suppose for any k with k<=m, for any bag of n>k/2 positive integers with sum k, each of 0...k is a subbag sum. We now prove that for any bag of n>(m+1)/2 positive integers with sum m+1, each of 0...k+1 is a subbag sum.

Let x be an element in the bag. If we omit a copy of it from the bag, then we have a bag of n-1 elements with sum m+1-x. In order to use the induction hypothesis, we verify:

    m+1-x <= m
===
    1 <= x
which is obvious, and
    2(n-1) > m+1-x
===
    2n > m+1-(x-2)
<==   { note 2n > m+1 }
    2 <= x
which is not guaranteed. However, we can assume this condition without loss of generality because if the bag is all 1's, the conclusion about subbag sums follows immediately. Thus we now assume that the bag has some non-1 elements, so that x>=2.

Then by induction hypothesis, the new bag provides all subbag sums of 0...m+1-x. Adding x back, we obtain x...m+1 in addition. Their union gives 0...m+1 iff x-1 <= m+1-x iff 2x-2 <= m. We obtain this condition by noting that the bag sum m+1 must be at least x+(n-1), as all elements other than our copy of x must be at least 1. We calculate:

    m+1 >= x+n-1
===
    m >= x - 2 + n
===
    2m >= 2x - 4 + 2n
==>   { note 2n > m+1, i.e., 2n >= m+2 }
    2m >= 2x - 4 + m + 2
===
    m >= 2x - 2
which finishes the proof.

Notes

Some readers may object to the use of a vacuous case as the base case. They would raise: doesn't this allow induction proofs of such absurdities as all horses have the same colour? I address this objection in two ways.

This problem is a generalization of an interesting homework problem: Suppose a bag of 16 positive integers has sum equal to 30, prove that each of 1...29 is a subbag sum.

David Radcliffe (radcliff@uwm.edu) attempted the following weaker generalization first: Suppose a bag of n+1 positive integers has sum equal to 2n, prove that each of 1...2n is a subbag sum. It turned out to be too difficult: the statement as an induction hypothesis is too weak. Interested readers may try their hands on it and see what goes wrong.

He then strengthened the antecedent: a bag of b>n positive integers with sum 2n. It could be proved by induction, in fact using essentially the same argument and calculation as in the above proof, but there is one unnecessary case analysis: 2n-x may be even or odd, but the induction hypothesis here would be only about smaller bags with even sums, thus requiring the odd case to be dealt with oddly.

He finally settled with the problem discussed in this article. (He stated the partition version.) In his original proof, he chose the largest x, not merely any x >= 2. This choice was driven by a concern to increase the chance of success: if we remove a large element or part, the new sum drops drastically, yet the new size is still relatively large (drops from n to n-1), and so the new partition is even finer. The intuition is precisely that if a partition is fine grained, it has a good chance of giving more subpart sums. But this is in conflict with another concern: if x is not small enough, the union of 0...m+1-x and x...m+1 may fail to cover 0...m+1. Interestingly, in his original proof of the second generalization (bags with even sums), he chose the smallest x with x>=2, probably with this concern in mind.

In retrospect, the proof is about obtaining a balance between these two conflicting concerns, and fortunately it is an easy balance: any x>=2 satisfies both conditions. Even if we were not lucky, the two conditions, expressed as inequalities on x, would still allow us to methodologically determine if the balance is possible.

The homework problem is purportedly posed under a chapter or section on the pigeon-hole principle. On this point, I refer the reader to E. W. Dijkstra's article EWD1094: The undeserved status of the pigeon-hole principle. Apparently, this problem is really about generalizing wisely and calculating inequalities simply. Without the mental burden of trying to apply the pigeon-hole principle, Radcliffe was able to find a suitable generalization and a comprehensible and even natural proof, and I was able to understand it enough to write it up and improve it a bit. (Really, the only hard part is in finding a suitable generalization to prove. Once you decide to prove it and use induction, you just can't resist the temptation to remove one x from the bag, and then all else is history.) If we chose to wrap our minds and this problem around the pigeon-hole principle, the resulting argument would be rather artificial and inaccessible, even though some people might call it ``ingenious'' for the same reason.

Albert Y. C. Lai