![]() |
||
|
|
Welcome to the Exploding Garrmondo Weiner Interactive Swiss Army Penis. |
GFF is a community of gaming and music enthusiasts. We have a team of dedicated moderators, constant member-organized activities, and plenty of custom features, including our unique journal system. If this is your first visit, be sure to check out the FAQ or our GFWiki. You will have to register before you can post. Membership is completely free (and gets rid of the pesky advertisement unit underneath this message).
|
![]() |
|
Thread Tools |
Matrixes and Randomization in Excel
Well, I'm not entirely sure what to call the topic, but it's simpler to explain what I am trying to do.
I have an array of numbers, the list of which is:
I am trying to group them in such a way that the first group is close to 94,204,280 and the second group is close to 70,043,499. These do not add up exactly, but are 4 figures off (164,247,789). It's going to take me a lot of manual combinations to do this properly, even if it's at all possible. So, is there a way to get Excel to do this automagically? I have already tried Solver and Goal Seek, but am unable to come up with a viable solution. Jam it back in, in the dark. |
Jesus. Variable group sizes and not having to group them in order means even if you did stick them all in some kind of matrix, it'd be fucking huge with all the permutations. I honestly think it'd be quicker to do with trial and error than to calculate it.
There's nowhere I can't reach. ![]() |
You do realize this is NP-Complete, right...
This thing is sticky, and I don't like it. I don't appreciate it. ![]() |
*psst* Zerg
That means a computer will be really terrible at this I am a dolphin, do you want me on your body? |
How powerful is the scripting inside excel?
You can't solve this problem closed form, but you could solve this by setting it up as an optimization problem, and this one looks perfect for a genetic algorithm. You want to minimize the difference between 94,204,280 and the sum of each of those numbers multiplied by an integer that is either 1 or 0. Those integers becomes the binary string that represents the "gene", and the algorithm takes a swarm of initial guesses, and mutates and "mates" them, keeping the best (as in the ones that produce the closest sum to 94,204,280) and moves on to the next generation. There's a learning curve here, but it seems possible depending how powerful the scripting capability is in Excel. Keep in mind that genetic algorithms can be computationally intensive (though in this case, evaluation of the objective is just adding a bunch of numbers, so I wouldn't expect a problem), and that an interpreted language isn't always ideal for implementation of a numerical algorithm. There are some tools out there that you can play around with if you want to try this. Note: Goal Seek in excel is probably some kind of quasi-Newton gradient descent method. This is something that is also used for minimization problems, but it won't work here because the objective is nonlinear and discrete (as opposed to a continuous function). How ya doing, buddy?
Last edited by Secret Squirrel; Jul 7, 2009 at 07:05 AM.
|
I don't know how powerful Excel scripting is. I know you can run Visual Basic scripts on it, but I don't know how to program. So admittedly when you talk about algorithm methods like "genetic" or "quasi-Newton gradual descent", I'm afraid I'm totally clueless.
![]()
![]()
What kind of toxic man-thing is happening now? |
15 minutes of trial and error tells me that:
110,776.00 1,079,081.00 2,755,461.00 8,489,925.00 9,865,374.00 10,450,763.00 16,323,547.00 22,147,053.00 22,983,457.00 Add up to 94,205,437, 1,157 more than one of your target numbers. The way I see it is this. You must have one or both of the largest numbers in the bigger group as without them, the rest don't add up to enough to form the bigger group. I then took groups of numbers that added up to just under the 94,204,280 target, and tried to make the difference using what was left. By changing one thing at a time you can zero in on the amount pretty quickly. I'm not saying they can't be divided closer to your target but for 15 minutes work, it's pretty fucking close. Most amazing jew boots ![]() |
I don't know about in excel, but I figure something like this would be very feasible in C.
A solution that I came up with was to place all your numbers into a matrix, and have another matrix of the same length defined by a 25-bit binary(0000000000000000000000001), starting with one. You multiply the sample set with your binary matrix and then sum the values in the result. Then compare the result with one of your two target values (i.e. 94,204,280). Next would be to establish a tolerance value to eliminate blatantly wrong solutions (say 500 or so). If your first result is within 500 of your target value, then have your program complement your binary matrix, multiply with the sample set matrix, and compare the sum of the resultant matrix with your second target (70,043,499). If it too is within 500, just have it display the binary value as well as each of the differences. The program will then increment the number defining the binary matrix (0000000000000000000000010) and start all over again, printing out all combinations that are within 500 of your targets. Afterward, you can scan through all of your results and select the solutions that have the smallest difference values and figure out the correct combination from the associated binary matrix. If you find you have way too many solutions popping up then you can just reduce your tolerance and run the program again. Sadly, this whole cycle will have to run about 33 million times, but I'm sure that and the associated programming are faster than checking all solutions through trial and error. This is similar to the method Secret Squirrel suggests, except without the genetic algorithm crap. You can be your own optimizing function. I'm sure there are many ways to streamline this process but I'll freely admit that I'm a godawful programmer. Additional Spam: Okay, so I ended up doing this in Matlab in a little under an hour. Your ideal solution is probably the following: 1. 508791 3. 60642 4. 1268384 5. 1831963 8. 1079081 10. 2755461 15. 6730411 17. 9865374 20. 22983457 24. 812882 25. 22147053 The above list summed = 70043499 (exactly as desired) and 2. 110776 6. 5141077 7. 2200967 9. 590535 11. 176599 12. 16323547 13. 1739899 14. 452583 16. 5657172 18. 561737 19. 36144212 21. 6164492 22. 8489925 23. 10450763 the above list summed = 94204284 (off by 4) In case you are curious, here is the program that I used to get the solution: Spoiler:
What, you don't want my bikini-clad body? |
Wow, you nailed it exactly!
I think I get the gist of your logic, incrementing a binary matrix that is as long as my set, performing a sumtotal, and getting the difference. How do you come up with the limit variable? Seems to be 2 ^ [size of matrix] - 1?
![]() Jam it back in, in the dark. |
The size of the limit variable would have a dependence on the size of the set, but its a statistical dependence on the variance in the set coupled with the number of solutions you want to retain, not just the total numbers (at least for reasonably small groups).
However, there's really no point if you just want the closest combo, as you can brute force a converging limit by taking the difference on each try, storing it as a min difference, and then discarding anything that doesn't meet that min. No guesswork on what to throw away, and it converges quite fast. There's nowhere I can't reach. |
So... no? Or yes?
This thing is sticky, and I don't like it. I don't appreciate it. |
If you want a long list of potential matches, then the limit depends on: the length of the list, the variance of the list, and how many matches you want. More than that, I can't say, as its complex.
If you want one match, then just do a brute force. if( tolerance > LastTotal1Diff ){ tolerance = LastTotal1Diff; } which will end up with the closest match for your first sum if you assign Total1Diff to LastTotal1Diff at the end of each successful loop. You could also do minima of the total difference for both sets if they weren't going to add up to such a close set of numbers. You can also solve this in Excel using packrat's general strategy, if you don't like coding. X columns for the bit mask, X columns for your numbers, summary columns that give the totals, columns for the differences, and a column for the difference total. In a simple form with three numbers. Code:
Target 1 Target 2 100000 20000 Bit 1 Bit 2 Bit 3 30405 2103 128923 Sum 1 Sum 2 Diff 1 Diff 2 Diff Total 0 0 0 0 0 0 0 161431 100000 141431 241431 0 0 1 0 0 128923 128923 32508 28923 12508 41431 0 1 0 0 2103 0 2103 159328 97897 139328 237225 0 1 1 0 2103 128923 131026 30405 31026 10405 41431 1 0 0 30405 0 0 30405 131026 69595 111026 180621 1 0 1 30405 0 128923 159328 2103 59328 17897 77225 1 1 0 30405 2103 0 32508 128923 67492 108923 176415 1 1 1 30405 2103 128923 161431 0 61431 20000 81431 I am a dolphin, do you want me on your body? |
The script I posted is a brute force method which basically just avoids the comparative step Araes mentioned.
And yes, in my case, the limit variable is determined by the number of elements in your array. (you had 25, so 2^25 - 1 = 33554431) Here are a few modifications to the script I posted that will make it easier to scale to different data sets: Spoiler:
As stated earlier, this is a brute-force method that computes through every possible combination and then displays only those within a given tolerance value. Final discretion is left to the operator. When I ran this program, it was probably only 1/4 of the way through when it output a combination that gave differences of 0 and 4. I just stopped it there and recorded the combination, since I doubt you could get much better than that. I figure this script could actually be ported to C/C++, but I'm not the person to do that. I was speaking idiomatically. |
I have to do this thing quite a lot but on a much smaller scale. People inevitably pay several cheques into the bank at once but don't always provide a paying in book so you have a list of cheques received and a list of bankings and need to marry the two up, so you're looking for numbers that match a target. The advantage is I guess that you know there's a solution and, being an accountant, pretty close is generally good enough although on the occasions where you need it exact, it can be a lengthy process but the more you do it, the more you get a feel for what adds up together. I'd still be inclined to do it manually, but then I haven't a fucking clue about coding and a 67 million row spreadsheet just seems like a bit of overkill to me. Granted it wouldn't take too long to set up but the computer I use at work would pretty much tell me to fuck off if I tried to get it to calculate that I'd have thought. Additional Spam: As a matter of interest though, is there any decent quick way to create a binary matrix that big in Excel? I would have said that numbers across the top and down the side then divide the side number by the top number and if it's an integer, put in a "1", if it's not put in a "0" would do it but as far as I'm aware, Excel has no "Is an integer" command. I don't see how you'd go about creating a 33 million row matrix without spending half a lifetime tapping in ones and zeros, unless of course Excel has a handy create-a-binary-matrix function I don't know about (Which is entirely possible). What kind of toxic man-thing is happening now? ![]()
Last edited by Fluffykitten McGrundlepuss; Jul 8, 2009 at 06:14 AM.
Reason: This member got a little too post happy.
|
My first attempt was actually to set up an Excel worksheet in this manner:
Column A is the "Bits" column. Column B has the matrix Column C outputs Column B if the row in Column A is 1, and zero if it's not. Column D outputs Column B if the row in Column A is 0, and zero if it's not. I summed up column C and column D, and deduct the totals by the desired figures. I then added the absolute values of both, and placed it in E1 Then I entered Solver, and said: Set E1 to Minimum by Changing Column A cells Subject to the following constraints: - Column A cells are integers - Column A cells are greater than or equal to zero - Column A cells are lesser than or equal to one I then set 9999 iterations. Seeing as how brute-forcing this took packrat 8 million iterations, I guess I shouldn't be so surprised that Excel refused to do this ![]() FELIPE NO |
Hahahaha.
I did initially try to work it out the same way Areas did but whilst his way is great for three or four numbers, the sheer time and effort needed to create a 25 bit binary matrix with each bit in a seperate cell is beyond prohibitive. If you had it all setup then you could just tap in the numbers from your data set and the two targets then sort the results by the size of the difference column. I know the newest version of Excel has a DEC2BIN command which converts decimals to binary but I don't know how to get that number split over several cells. Ah, unless you use the search command. Have the numbers from 1 to 33,554,431 in column A, use DEC2BIN in column B to turn it into binary then in columns C to AA, have the numbers 1 to 25 across the top and tell Excel to report the figure in column B that is in the position listed at the top of the page. If you then copy the shit down to the bottom you've got a big old binary matrix and can then use an IF(c3=1,ab$2,0) job in the next 25 cells, copy that down and have every possible permutation listed out, then stick in your target numbers and some cells to calculate the differences, sort by the difference column and you're done! I can't remember how to get the specific character out of a string off the top of my head though. SEARCH possibly? What, you don't want my bikini-clad body? ![]() |
Araes' method is impossible to do in Excel 2007, period. For the simple reason that Excel only supports 1.4+ million rows
![]() Most amazing jew boots |
I thought you got unlimited rows these days?
Maybe I was told that by someone who's not had to use more than a million or so... There's nowhere I can't reach. ![]() |
This seems like a problem that could be solved using linear programming using an built-in script that ships with Excel called Solver. I would google "linear programming with excel" or "excel solver" more more information on this topic.
Below is how I would set up the excel spreadsheet. After you set the constraints, and cells to adjust, solver/excel should then be able to come up with two lists that fit your requirements. ![]() The constraint D2 : D26 = F2:F26 is to assure that every number is used once. Excel/Solver will place 1s and 0s in the cells B2:C26 until the constraints of G27=G29 and H27=H29 are satisified. *Be sure to add a Binary constraint on cells B2:C26 If you want to give the program some leeway you could always specify some range for your constraints having something like G27 >= 90000000 AND G27 <= 95000000 (and so forth for H27) and solve that way. Again though, I would search google for more information regarding this subject as I haven't really used solver for several years. I am quite confident this problem can be solved in this fashion though as we tackled far more complex problems in the class I took. This thing is sticky, and I don't like it. I don't appreciate it. |
The problem with using Solver like that is that you're only looking for a "right" answer, whereas Zerg doesn't know if there is a right answer, he just wants the closest answer. Yes, you can give it some leeway by changing the total boxes to less thans or whatever but you've still got to sit there increasing the maximum error a bit at a time to find the optimal solution, otherwise if you stick in too big of an allowable error you're going to end up with multiple solutions and not know for certain which is best.
It's possible that you could use Solver slightly differently to sort this out but I've not got my Office disk handy so I can't install the add on and play around with it. I am a dolphin, do you want me on your body? ![]() |
I'm bumping this to say that I've found a nice little Excel add-in called "Find Combinations"
It purports to be especially useful for auditors dealing with clients who are especially... sloppy when it comes to recording invoice numbers. As it is with my problem, you have a total expense figure that you're trying to look for. You also have a set of numbers you're trying to combine in various ways to reach that total. The difference is that it looks for exact summation, while my figures are not precise. While this is still unsuitable for the purposes of this thread, but may be of use to someone else. While installing, you will need to add this site to your "Trusted Sites" in Internet Explorer: http://s3downloads.lyquidity.com P.S.: My exercise was basically an attempt to brute-force how Michael Porter classifies data from the UN ComTrade database to his nice little "Porter Cluster Model." He was quite reluctant to share it with us, hence the brute force. Haven't been able to get satisfactory results across years, so I eventually abandoned the effort. I was speaking idiomatically. |