HAY GUYS

Zergrinch · Local time: Jul 7, 2009, 05:01 PM

Well, I'm not entirely sure what to call the topic, but it's simpler to explain what I am trying to do.

I have an array of numbers, the list of which is:

508791
110776
60642
1268384
1831963
5141077
2200967
1079081
590535
2755461
176599
16323547
1739899
452583
6730411
5657172
9865374
561737
36144212
22983457
6164492
8489925
10450763
812882
22147053

They add up to 164,247,783

I am trying to group them in such a way that the first group is close to 94,204,280 and the second group is close to 70,043,499. These do not add up exactly, but are 4 figures off (164,247,789).

It's going to take me a lot of manual combinations to do this properly, even if it's at all possible.

So, is there a way to get Excel to do this automagically? I have already tried Solver and Goal Seek, but am unable to come up with a viable solution.

Jam it back in, in the dark.

**Fluffykitten McGrundlepuss** · Local time: Jul 7, 2009, 10:07 AM

Jesus. Variable group sizes and not having to group them in order means even if you did stick them all in some kind of matrix, it'd be fucking huge with all the permutations. I honestly think it'd be quicker to do with trial and error than to calculate it.

There's nowhere I can't reach.

ramoth · Local time: Jul 7, 2009, 01:19 AM

You do realize this is NP-Complete, right...

This thing is sticky, and I don't like it. I don't appreciate it.

value tart · Jul 7, 2009, 04:41 AM

*psst* Zerg

That means a computer will be really terrible at this

I am a dolphin, do you want me on your body?

Secret Squirrel · 07:05 AM

How powerful is the scripting inside excel?

You can't solve this problem closed form, but you could solve this by setting it up as an optimization problem, and this one looks perfect for a genetic algorithm.

You want to minimize the difference between 94,204,280 and the sum of each of those numbers multiplied by an integer that is either 1 or 0. Those integers becomes the binary string that represents the "gene", and the algorithm takes a swarm of initial guesses, and mutates and "mates" them, keeping the best (as in the ones that produce the closest sum to 94,204,280) and moves on to the next generation.

There's a learning curve here, but it seems possible depending how powerful the scripting capability is in Excel. Keep in mind that genetic algorithms can be computationally intensive (though in this case, evaluation of the objective is just adding a bunch of numbers, so I wouldn't expect a problem), and that an interpreted language isn't always ideal for implementation of a numerical algorithm.

There are some tools out there that you can play around with if you want to try this.

Note: Goal Seek in excel is probably some kind of quasi-Newton gradient descent method. This is something that is also used for minimization problems, but it won't work here because the objective is nonlinear and discrete (as opposed to a continuous function).

How ya doing, buddy?

Zergrinch · Local time: Jul 7, 2009, 08:22 PM

I don't know how powerful Excel scripting is. I know you can run Visual Basic scripts on it, but I don't know how to program. So admittedly when you talk about algorithm methods like "genetic" or "quasi-Newton gradual descent", I'm afraid I'm totally clueless.

Originally Posted by Mo0

*psst* Zerg

That means a computer will be really terrible at this

It's probably going to take the computer a ton of time to do, but hey. It's going to take me much more time to do. So, I don't really care about that

Originally Posted by Shin

Jesus. Variable group sizes and not having to group them in order means even if you did stick them all in some kind of matrix, it'd be fucking huge with all the permutations. I honestly think it'd be quicker to do with trial and error than to calculate it.

You were my best hope, Excel buddy ;___;

What kind of toxic man-thing is happening now?

**Fluffykitten McGrundlepuss** · Local time: Jul 7, 2009, 01:53 PM

15 minutes of trial and error tells me that:

110,776.00
1,079,081.00
2,755,461.00
8,489,925.00
9,865,374.00
10,450,763.00
16,323,547.00
22,147,053.00
22,983,457.00

Add up to 94,205,437, 1,157 more than one of your target numbers. The way I see it is this. You must have one or both of the largest numbers in the bigger group as without them, the rest don't add up to enough to form the bigger group. I then took groups of numbers that added up to just under the 94,204,280 target, and tried to make the difference using what was left. By changing one thing at a time you can zero in on the amount pretty quickly.

I'm not saying they can't be divided closer to your target but for 15 minutes work, it's pretty fucking close.

Most amazing jew boots

packrat · 04:20 PM

I don't know about in excel, but I figure something like this would be very feasible in C.

A solution that I came up with was to place all your numbers into a matrix, and have another matrix of the same length defined by a 25-bit binary(0000000000000000000000001), starting with one. You multiply the sample set with your binary matrix and then sum the values in the result. Then compare the result with one of your two target values (i.e. 94,204,280).

Next would be to establish a tolerance value to eliminate blatantly wrong solutions (say 500 or so). If your first result is within 500 of your target value, then have your program complement your binary matrix, multiply with the sample set matrix, and compare the sum of the resultant matrix with your second target (70,043,499). If it too is within 500, just have it display the binary value as well as each of the differences.

The program will then increment the number defining the binary matrix (0000000000000000000000010) and start all over again, printing out all combinations that are within 500 of your targets.
Afterward, you can scan through all of your results and select the solutions that have the smallest difference values and figure out the correct combination from the associated binary matrix. If you find you have way too many solutions popping up then you can just reduce your tolerance and run the program again.

Sadly, this whole cycle will have to run about 33 million times, but I'm sure that and the associated programming are faster than checking all solutions through trial and error.

This is similar to the method Secret Squirrel suggests, except without the genetic algorithm crap. You can be your own optimizing function.

I'm sure there are many ways to streamline this process but I'll freely admit that I'm a godawful programmer.

Additional Spam:
Okay, so I ended up doing this in Matlab in a little under an hour.
Your ideal solution is probably the following:

1. 508791
3. 60642
4. 1268384
5. 1831963
8. 1079081
10. 2755461
15. 6730411
17. 9865374
20. 22983457
24. 812882
25. 22147053

The above list summed = 70043499 (exactly as desired)

and

2. 110776
6. 5141077
7. 2200967
9. 590535
11. 176599
12. 16323547
13. 1739899
14. 452583
16. 5657172
18. 561737
19. 36144212
21. 6164492
22. 8489925
23. 10450763

the above list summed = 94204284 (off by 4)

In case you are curious, here is the program that I used to get the solution:

Spoiler:

What, you don't want my bikini-clad body?

Zergrinch · Local time: Jul 8, 2009, 08:25 AM

Wow, you nailed it exactly!

I think I get the gist of your logic, incrementing a binary matrix that is as long as my set, performing a sumtotal, and getting the difference. How do you come up with the limit variable? Seems to be 2 ^ [size of matrix] - 1?

Originally Posted by Shin

15 minutes of trial and error tells me that:

110,776.00
1,079,081.00
2,755,461.00
8,489,925.00
9,865,374.00
10,450,763.00
16,323,547.00
22,147,053.00
22,983,457.00

Add up to 94,205,437, 1,157 more than one of your target numbers. The way I see it is this. You must have one or both of the largest numbers in the bigger group as without them, the rest don't add up to enough to form the bigger group. I then took groups of numbers that added up to just under the 94,204,280 target, and tried to make the difference using what was left. By changing one thing at a time you can zero in on the amount pretty quickly.

I'm not saying they can't be divided closer to your target but for 15 minutes work, it's pretty fucking close.

Appreciate the time you put into this, mate. However, this was one of many matrices I was hoping to match, so I pretty much had to look for a computational alternative or lose my sanity

Jam it back in, in the dark.

Araes · Local time: Jul 7, 2009, 07:33 PM

The size of the limit variable would have a dependence on the size of the set, but its a statistical dependence on the variance in the set coupled with the number of solutions you want to retain, not just the total numbers (at least for reasonably small groups). ~~You could probably do something like take the average variance of the numbers or a multiple of the standard deviation as your limit.~~ Edit: Actually thinking more about the math, stdev ends up rather huge.

However, there's really no point if you just want the closest combo, as you can brute force a converging limit by taking the difference on each try, storing it as a min difference, and then discarding anything that doesn't meet that min. No guesswork on what to throw away, and it converges quite fast.

There's nowhere I can't reach.

Zergrinch · Local time: Jul 8, 2009, 08:46 AM

So... no? Or yes?

This thing is sticky, and I don't like it. I don't appreciate it.

Araes · Local time: Jul 7, 2009, 08:06 PM

If you want a long list of potential matches, then the limit depends on: the length of the list, the variance of the list, and how many matches you want. More than that, I can't say, as its complex.

If you want one match, then just do a brute force.

if( tolerance > LastTotal1Diff ){
tolerance = LastTotal1Diff;
}

which will end up with the closest match for your first sum if you assign Total1Diff to LastTotal1Diff at the end of each successful loop.

You could also do minima of the total difference for both sets if they weren't going to add up to such a close set of numbers.

You can also solve this in Excel using packrat's general strategy, if you don't like coding. X columns for the bit mask, X columns for your numbers, summary columns that give the totals, columns for the differences, and a column for the difference total. In a simple form with three numbers.

Code:

Target 1	Target 2									
100000		20000									
Bit 1	Bit 2	Bit 3	30405	2103	128923	Sum 1	Sum 2	Diff 1	Diff 2	Diff Total
0	0	0	0	0	0	0	161431	100000	141431	241431
0	0	1	0	0	128923	128923	32508	28923	12508	41431
0	1	0	0	2103	0	2103	159328	97897	139328	237225
0	1	1	0	2103	128923	131026	30405	31026	10405	41431
1	0	0	30405	0	0	30405	131026	69595	111026	180621
1	0	1	30405	0	128923	159328	2103	59328	17897	77225
1	1	0	30405	2103	0	32508	128923	67492	108923	176415
1	1	1	30405	2103	128923	161431	0	61431	20000	81431

I am a dolphin, do you want me on your body?

packrat · Jul 7, 2009, 08:20 PM

The script I posted is a brute force method which basically just avoids the comparative step Araes mentioned.

And yes, in my case, the limit variable is determined by the number of elements in your array. (you had 25, so 2^25 - 1 = 33554431)
Here are a few modifications to the script I posted that will make it easier to scale to different data sets:

Spoiler:

As stated earlier, this is a brute-force method that computes through every possible combination and then displays only those within a given tolerance value. Final discretion is left to the operator.
When I ran this program, it was probably only 1/4 of the way through when it output a combination that gave differences of 0 and 4. I just stopped it there and recorded the combination, since I doubt you could get much better than that.

I figure this script could actually be ported to C/C++, but I'm not the person to do that.

I was speaking idiomatically.

**Fluffykitten McGrundlepuss** · 06:14 AM

Originally Posted by Araes

If you want a long list of potential matches, then the limit depends on: the length of the list, the variance of the list, and how many matches you want. More than that, I can't say, as its complex.

If you want one match, then just do a brute force.

if( tolerance > LastTotal1Diff ){
tolerance = LastTotal1Diff;
}

which will end up with the closest match for your first sum if you assign Total1Diff to LastTotal1Diff at the end of each successful loop.

You could also do minima of the total difference for both sets if they weren't going to add up to such a close set of numbers.

You can also solve this in Excel using packrat's general strategy, if you don't like coding. X columns for the bit mask, X columns for your numbers, summary columns that give the totals, columns for the differences, and a column for the difference total. In a simple form with three numbers.

Code:

Target 1	Target 2									
100000		20000									
Bit 1	Bit 2	Bit 3	30405	2103	128923	Sum 1	Sum 2	Diff 1	Diff 2	Diff Total
0	0	0	0	0	0	0	161431	100000	141431	241431
0	0	1	0	0	128923	128923	32508	28923	12508	41431
0	1	0	0	2103	0	2103	159328	97897	139328	237225
0	1	1	0	2103	128923	131026	30405	31026	10405	41431
1	0	0	30405	0	0	30405	131026	69595	111026	180621
1	0	1	30405	0	128923	159328	2103	59328	17897	77225
1	1	0	30405	2103	0	32508	128923	67492	108923	176415
1	1	1	30405	2103	128923	161431	0	61431	20000	81431

If you did that in Excel for the set sizes Zerg has you'd end up with a monster of a spreadsheet though. 25 bit binary is over 33 and a half million rows just for the matrix!

I have to do this thing quite a lot but on a much smaller scale. People inevitably pay several cheques into the bank at once but don't always provide a paying in book so you have a list of cheques received and a list of bankings and need to marry the two up, so you're looking for numbers that match a target. The advantage is I guess that you know there's a solution and, being an accountant, pretty close is generally good enough although on the occasions where you need it exact, it can be a lengthy process but the more you do it, the more you get a feel for what adds up together.

I'd still be inclined to do it manually, but then I haven't a fucking clue about coding and a 67 million row spreadsheet just seems like a bit of overkill to me. Granted it wouldn't take too long to set up but the computer I use at work would pretty much tell me to fuck off if I tried to get it to calculate that I'd have thought.

Additional Spam:
As a matter of interest though, is there any decent quick way to create a binary matrix that big in Excel? I would have said that numbers across the top and down the side then divide the side number by the top number and if it's an integer, put in a "1", if it's not put in a "0" would do it but as far as I'm aware, Excel has no "Is an integer" command. I don't see how you'd go about creating a 33 million row matrix without spending half a lifetime tapping in ones and zeros, unless of course Excel has a handy create-a-binary-matrix function I don't know about (Which is entirely possible).

What kind of toxic man-thing is happening now?

Zergrinch · Local time: Jul 8, 2009, 07:15 PM

My first attempt was actually to set up an Excel worksheet in this manner:

Column A is the "Bits" column.
Column B has the matrix
Column C outputs Column B if the row in Column A is 1, and zero if it's not.
Column D outputs Column B if the row in Column A is 0, and zero if it's not.
I summed up column C and column D, and deduct the totals by the desired figures. I then added the absolute values of both, and placed it in E1

Then I entered Solver, and said:

Set E1 to Minimum by Changing Column A cells
Subject to the following constraints:
- Column A cells are integers
- Column A cells are greater than or equal to zero
- Column A cells are lesser than or equal to one

I then set 9999 iterations.

Seeing as how brute-forcing this took packrat 8 million iterations, I guess I shouldn't be so surprised that Excel refused to do this

FELIPE NO

**Fluffykitten McGrundlepuss** · Local time: Jul 8, 2009, 12:45 PM

Hahahaha.

I did initially try to work it out the same way Areas did but whilst his way is great for three or four numbers, the sheer time and effort needed to create a 25 bit binary matrix with each bit in a seperate cell is beyond prohibitive. If you had it all setup then you could just tap in the numbers from your data set and the two targets then sort the results by the size of the difference column. I know the newest version of Excel has a DEC2BIN command which converts decimals to binary but I don't know how to get that number split over several cells.

Ah, unless you use the search command. Have the numbers from 1 to 33,554,431 in column A, use DEC2BIN in column B to turn it into binary then in columns C to AA, have the numbers 1 to 25 across the top and tell Excel to report the figure in column B that is in the position listed at the top of the page. If you then copy the shit down to the bottom you've got a big old binary matrix and can then use an IF(c3=1,ab$2,0) job in the next 25 cells, copy that down and have every possible permutation listed out, then stick in your target numbers and some cells to calculate the differences, sort by the difference column and you're done!

I can't remember how to get the specific character out of a string off the top of my head though. SEARCH possibly?

What, you don't want my bikini-clad body?

Zergrinch · Local time: Jul 8, 2009, 08:40 PM

Araes' method is impossible to do in Excel 2007, period. For the simple reason that Excel only supports 1.4+ million rows

Most amazing jew boots

**Fluffykitten McGrundlepuss** · Local time: Jul 8, 2009, 01:53 PM

I thought you got unlimited rows these days?

Maybe I was told that by someone who's not had to use more than a million or so...

There's nowhere I can't reach.

Petrovich · Local time: Jul 8, 2009, 09:06 PM

This seems like a problem that could be solved using linear programming using an built-in script that ships with Excel called Solver. I would google "linear programming with excel" or "excel solver" more more information on this topic.

Below is how I would set up the excel spreadsheet. After you set the constraints, and cells to adjust, solver/excel should then be able to come up with two lists that fit your requirements.

The constraint D2 : D26 = F2:F26 is to assure that every number is used once.

Excel/Solver will place 1s and 0s in the cells B2:C26 until the constraints of G27=G29 and H27=H29 are satisified.
*Be sure to add a Binary constraint on cells B2:C26

If you want to give the program some leeway you could always specify some range for your constraints having something like G27 >= 90000000 AND G27 <= 95000000 (and so forth for H27) and solve that way.

Again though, I would search google for more information regarding this subject as I haven't really used solver for several years. I am quite confident this problem can be solved in this fashion though as we tackled far more complex problems in the class I took.

This thing is sticky, and I don't like it. I don't appreciate it.

**Fluffykitten McGrundlepuss** · Local time: Jul 14, 2009, 04:42 PM

The problem with using Solver like that is that you're only looking for a "right" answer, whereas Zerg doesn't know if there is a right answer, he just wants the closest answer. Yes, you can give it some leeway by changing the total boxes to less thans or whatever but you've still got to sit there increasing the maximum error a bit at a time to find the optimal solution, otherwise if you stick in too big of an allowable error you're going to end up with multiple solutions and not know for certain which is best.

It's possible that you could use Solver slightly differently to sort this out but I've not got my Office disk handy so I can't install the add on and play around with it.

I am a dolphin, do you want me on your body?

Zergrinch · Local time: Dec 8, 2009, 07:10 PM

I'm bumping this to say that I've found a nice little Excel add-in called "Find Combinations"

It purports to be especially useful for auditors dealing with clients who are especially... sloppy when it comes to recording invoice numbers. As it is with my problem, you have a total expense figure that you're trying to look for. You also have a set of numbers you're trying to combine in various ways to reach that total.

The difference is that it looks for exact summation, while my figures are not precise. While this is still unsuitable for the purposes of this thread, but may be of use to someone else.

While installing, you will need to add this site to your "Trusted Sites" in Internet Explorer:
http://s3downloads.lyquidity.com

P.S.: My exercise was basically an attempt to brute-force how Michael Porter classifies data from the UN ComTrade database to his nice little "Porter Cluster Model." He was quite reluctant to share it with us, hence the brute force. Haven't been able to get satisfactory results across years, so I eventually abandoned the effort.

I was speaking idiomatically.