In mid-May, Anup47 asked a question in the Chandoo.org forums about using a VBA macro to run a number of iterations of a variable against two sets of X values; you can see the post here. It turned out that the iterations amounted to 500 columns of data, each containing 27 values.
On examination, the problem turned out to be a straightforward matter of setting up Excel's Linest statistical function and then using the Data Table command to run each set of data through it.
Linest takes the input data and returns the statistics that Anup wanted.
The Data Table command feeds each set of source data through Linest in turn and tabulates the inputs and outputs.
This post works through an example you can follow along with. Download the sample file to suit your version of Excel: Sample File 97/2003 or Sample File 2007/10. The sample file contains a worked example of the completed model as well as a Practice Page with the original data.
Please note that the sample file only contains 14 sets of data as opposed to the 500 Anup47 wanted to process.
Setup
There are a few areas that need setting up before the work starts:
- Headers
- Linest Area
- Link Area
- Data Table Area
Once these areas are set up, we simply use Excel's Data Table command.
Once the Data Table function has run, the results can be processed or analysed as required.
Headers
The original data was just that, a tabulation of raw data. The two sets of X data were in columns A and B, and each column from D onwards held a set of Y data to be processed.
The first thing required was some headers for the input data.
This isn't strictly necessary, but it is good practice and makes it easier to tabulate and analyse the results later.
Insert a row above the first row of data.
Put X1 and X2 in A1 and B1 and Y1 in D1, then drag the fill handle (the small black square at the lower right of D1) to the right along the top row; Excel will autofill Y2, Y3 and so on.
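If you ever want to build the same header row outside Excel, or check where the last Y column will land when you scale up, here is a minimal Python sketch. It assumes the layout described above (X1 and X2 in columns A and B, column C left blank, Y data from column D onwards); `build_headers` and `column_letter` are hypothetical helper names, not part of the sample file.

```python
def column_letter(number):
    """Convert a 1-based column number to Excel letters (1 -> A, 27 -> AA)."""
    letters = ""
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def build_headers(n_y):
    """Header row for the assumed layout: X1, X2, a blank C1, then Y1..Yn from D1."""
    return ["X1", "X2", ""] + [f"Y{i}" for i in range(1, n_y + 1)]


headers = build_headers(500)
# Y data starts in column D (column 4), so the last Y column is column 3 + 500.
print(headers[:5], "...", headers[-1])
print("Y data spans D:" + column_letter(3 + 500))
```

With 500 sets of Y data the headers run from Y1 in D1 to Y500 in SI1.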
Linest Area
To get the statistics that Anup wanted we will use Excel's Linest function.
Linest is a statistical function that fits a least-squares regression of a set of Y data against, in this case, two sets of X values, and returns the regression coefficients together with statistics describing how well the fitted equation matches the data (R squared, standard errors, the F statistic and so on).
This post isn't going to explain the intricacies of Linest; you can read more about it at your leisure in the References section at the end.
For our purposes we need to know that Linest is an array formula. With two X columns and the stats argument set to TRUE it returns a block 5 rows deep by 3 columns wide (one column per X variable plus one for the intercept); here we enter it into a 5 row x 5 column area, and any cells outside the returned block simply show #N/A. For now we will just array enter the function =Linest($D$2:$D$28,A2:B28,True, True) into B32:F36.
To do that, select the range B32:F36, press F2, type or paste the formula in, then array enter it with Ctrl+Shift+Enter.
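To see what the block that Linest returns actually contains, here is a minimal Python sketch of LINEST(y, x, TRUE, TRUE) for exactly two X columns, solved with the normal equations. It is an illustration, not Excel's own algorithm: it assumes X1 and X2 are not collinear and that the fit is not perfect (a perfect fit would make the F statistic divide by zero), and `linest_2x` and `invert` are hypothetical names. Cells where Excel shows #N/A are returned as None.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    size = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def linest_2x(y, x1, x2):
    """Mimic LINEST(y, {x1, x2}, TRUE, TRUE) for two X columns.

    Layout matches Excel's 5 x 3 output (None where Excel shows #N/A):
      [m2,    m1,      b   ]   coefficients, last X column first
      [se2,   se1,     seb ]   standard errors of the coefficients
      [r2,    sey,     None]   R squared, standard error of the Y estimate
      [F,     df,      None]   F statistic, residual degrees of freedom
      [ssreg, ssresid, None]   regression and residual sums of squares
    """
    n = len(y)
    cols = [[1.0] * n, list(x1), list(x2)]  # intercept, X1, X2
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    inv = invert(xtx)
    b, m1, m2 = [sum(inv[i][j] * xty[j] for j in range(3)) for i in range(3)]
    fitted = [b + m1 * u + m2 * v for u, v in zip(x1, x2)]
    y_mean = sum(y) / n
    ssresid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ssreg = sum((fi - y_mean) ** 2 for fi in fitted)
    df = n - 3                  # observations minus the 3 fitted constants
    mse = ssresid / df          # residual variance (sey squared)
    seb, se1, se2 = [(mse * inv[i][i]) ** 0.5 for i in range(3)]
    return [
        [m2, m1, b],
        [se2, se1, seb],
        [ssreg / (ssreg + ssresid), mse ** 0.5, None],
        [(ssreg / 2) / mse, df, None],
        [ssreg, ssresid, None],
    ]


# Made-up data: y = 1 + 2*x1 + 3*x2 plus a residual that is orthogonal to the X columns
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [10, 7, 19, 18, 28, 29]
for row in linest_2x(y, x1, x2):
    print(row)
```

Note the coefficient order: like Linest, the first row lists the last X column's slope first and the intercept last, which matters when you link individual output cells.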
Link Area
To link the Linest formula to a Data Table we need a link cell, which we will put in B30, just above the Linest area.
For now just enter a 1 in it.
We can now go back to the Linest area and link the Linest formula to the link cell using the formula =LINEST(OFFSET($C$2:$C$28,,$B$30),A2:B28,TRUE, TRUE)
As before, select the range B32:F36, press F2, type or paste the formula in, then array enter it with Ctrl+Shift+Enter.
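OFFSET shifts the range C2:C28 to the right by the number of columns held in the link cell B30, so Linest reads the Y1 column (D2:D28) when B30 is 1, Y2 when it is 2, and so on up to Y500. As B30 currently holds 1, the result is the same as before.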
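The OFFSET trick is easy to picture outside Excel. Here is a minimal Python sketch, assuming the sheet is held as a list of rows (row 1 first) and addressed with Excel's 1-based row numbers and column letters; `offset_column` and `col_index` are hypothetical helpers, and they only cover the column-shift form OFFSET(range,,cols) used above.

```python
def col_index(letter):
    """Convert a column letter such as 'C' or 'AB' to a 0-based list index."""
    n = 0
    for ch in letter.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n - 1


def offset_column(grid, column, first_row, last_row, cols):
    """Values referenced by OFFSET(<column><first_row>:<column><last_row>,,cols).

    grid is a list of rows where grid[0] is Excel row 1.
    """
    target = col_index(column) + cols  # shift right by the link-cell value
    return [grid[r - 1][target] for r in range(first_row, last_row + 1)]


# Tiny made-up sheet: X1, X2, blank column C, then Y1 and Y2
grid = [
    ["X1", "X2", None, "Y1", "Y2"],
    [1, 2, None, 10, 20],
    [3, 4, None, 30, 40],
]
link = 1  # the value in B30
print(offset_column(grid, "C", 2, 3, link))  # column D (Y1) -> [10, 30]
```

Anchoring the range on the blank column C is what lets a link value of 1 mean "Y1", which keeps the run numbers and the Y column numbers identical.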
Data Table Area
To set up a Data Table area we need a column of inputs, which will be the run numbers, and a row of links to the input and output cells across the top.
In the range J33:J46 put the values 1 to 14. These will be the run numbers, i.e. Run No 1, Run No 2, etc. (green in the example below).
Across the top of the Data Table area, in K32:P32, we put a number of links with their associated labels above them (yellow and blue).
In this case there are 4 output links, =B31, =C31, =B34 and =B33, and 2 input equations, each with its own label. The input equations are simple OFFSET functions that retrieve a value from row 1 or 2 based on the value of the link cell B30.
The labels and input equations are not technically required, but they make data analysis, and identifying individual results later on, a lot simpler.
Run Data Table
We can now run the Data Table by selecting the Data Table area, J32:P46, and choosing Data > What-If Analysis > Data Table (Data > Table in Excel 97/2003).
Leave the Row input cell blank and set the Column input cell to $B$30, the link cell for the Linest formula.
Excel then takes the first value in J33:J46, substitutes it into B30, recalculates the Linest formula and writes the results into the Data Table alongside the inputs.
This is repeated automatically for each value in J33:J46.
The final Data Table is now populated as below:
By extending the Data Table input column from 14 to 500 run numbers, the full 500 columns of input data could be processed just as easily.
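Conceptually, a one-variable Data Table is just a loop: substitute each run number into the column input cell, recalculate, and read the output links. Here is a minimal Python sketch of that loop under stated assumptions: the "sheet" is a dict, each output link is a function of that dict, and `data_table` is a hypothetical name. The two stand-in outputs (a label and a column total) take the place of the real Linest links.

```python
def data_table(sheet, input_cell, run_values, outputs):
    """Emulate a one-variable Data Table that uses a column input cell.

    sheet       dict mapping cell names to values (the model's inputs)
    input_cell  the column input cell, here "B30"
    run_values  the values down the first column (J33:J46 in the post)
    outputs     one function per link across the top row; each reads the sheet
    """
    saved = sheet[input_cell]
    table = []
    for value in run_values:
        sheet[input_cell] = value                            # substitute the run number
        table.append([value] + [f(sheet) for f in outputs])  # "recalculate" and record
    sheet[input_cell] = saved  # Excel leaves the input cell's own value unchanged
    return table


# Made-up model: three Y columns, picked by the link cell the way OFFSET does
sheet = {"B30": 1, "Y": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]}
outputs = [
    lambda s: f"Y{s['B30']}",               # input label, like the input equations
    lambda s: sum(s["Y"][s["B30"] - 1]),    # stand-in for a Linest output link
]
for row in data_table(sheet, "B30", [1, 2, 3], outputs):
    print(row)
```

Scaling to Anup's case is only a matter of a longer `run_values` list, which mirrors extending the input column from 14 to 500 rows.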
Results
You now have a set of data that can be analysed using normal statistics (Min, Max, Standard Deviation, etc.) or fed into a Pivot Table/Chart for analysis.
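For the summary step, Python's standard `statistics` module offers the same kind of measures. A minimal sketch, assuming one output column of the Data Table (for example the R squared values) has been copied into a list; the numbers below are made up for illustration, and `summarise` is a hypothetical name. `statistics.stdev` is the sample standard deviation, like Excel's STDEV.

```python
import statistics


def summarise(values):
    """Min, Max, mean and sample standard deviation of one Data Table column."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample stdev: n - 1 in the denominator
    }


r_squared = [0.91, 0.87, 0.95, 0.89]  # made-up values, not from the sample file
print(summarise(r_squared))
```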
References
Linest References
http://chandoo.org/wp/2011/01/26/trendlines-and-forecasting-in-excel-part-2/
http://newtonexcelbach.wordpress.com/2011/01/19/using-linest-for-non-linear-curve-fitting/
Data Table References
http://chandoo.org/wp/2010/05/06/data-tables-monte-carlo-simulations-in-excel-a-comprehensive-guide/
How can the Data Table command help you become a data processing super hero?
Let us know in the comments below:

19 Responses to “How to Distribute Players Between Teams – Evenly”
An excellent solution, especially for large data sets.
Another solution without using solver would be to assign the player with the highest score to Team 1, the 2nd to team 2, 3rd to team 3, 4th to team 3, 5th to team 2, 6th to team 1, 7th to team 1 and it continues. This method would end up with a Std Dev of 0.001247219. This works best with a distribution with lower Std Dev for the dataset.
Full Disclosure: this is not my idea, remember reading something a few years ago. Think it may have been Ozgrid
Thinking back, I now remember why I read about it. About 10 years back I had to distribute around 300 team members into 25-30 odd teams, based on their performance scores. I used the method I described above and the distribution was pretty fair.
Solver would have saved me a ton of time though 🙂
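The ordering described in the comment above is often called a snake (or serpentine) draft. A minimal Python sketch, assuming players are (name, score) pairs; `snake_draft` is a hypothetical name, and players with equal scores keep their original order.

```python
def snake_draft(players, n_teams):
    """Deal players, best first, to teams in the order 1, 2, 3, 3, 2, 1, 1, 2, 3, ..."""
    ranked = sorted(players, key=lambda p: p[1], reverse=True)
    teams = [[] for _ in range(n_teams)]
    for i, (name, _score) in enumerate(ranked):
        round_no, pos = divmod(i, n_teams)
        # Even rounds go left to right, odd rounds right to left
        team = pos if round_no % 2 == 0 else n_teams - 1 - pos
        teams[team].append(name)
    return teams


players = [("a", 9), ("b", 8), ("c", 7), ("d", 6), ("e", 5), ("f", 4)]
print(snake_draft(players, 3))  # [['a', 'f'], ['b', 'e'], ['c', 'd']]
```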
I think the issue with your first Solver approach was that you took the absolute value of the sum of team deviations (which should always be zero except for rounding) instead of the sum of the absolute values (which is a reasonable measure of how unbalanced the teams are).
Here's another simple algorithm you could use: you start from the top (with players sorted from high to low), and at each step allocate the next player to whichever team has the smallest total so far. You can implement it dynamically with some formulas so it will update automatically when the data changes.
If the scores were more widely distributed (so that this might end up with not all teams the same size), you could add a constraint to only pick among the teams which currently have fewest players at each step, or just stop adding to any team when it hits its quota.
When I tried it on the sample, I got the three teams below, with a STDEV of 0.000942809 (i.e. about half of what Solver got to).
Team 1: John, Hugo, Tom, Josh, Eric, Zane, Charles, Andrew
Team 2: Barry, Michael, Kenny, Joe, Xavier, Patrick, Oliver, William
Team 3: Henry, Steven, Ben, Frank, Kyle, Edward, Cameron, Lachlan
Thanks for sharing!
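The greedy rule described a few comments above (give the next-best player to the team with the smallest total so far, optionally only among the teams with the fewest players) can be sketched in Python as follows. This is an illustration assuming (name, score) pairs, not the commenter's formula-based version; `greedy_teams` is a hypothetical name, and ties go to the lower-numbered team.

```python
def greedy_teams(players, n_teams, balance_sizes=True):
    """Give each player, best first, to the team with the lowest total so far.

    With balance_sizes=True only the teams that currently have the fewest
    players are considered, so team sizes never differ by more than one.
    """
    ranked = sorted(players, key=lambda p: p[1], reverse=True)
    teams = [[] for _ in range(n_teams)]
    totals = [0.0] * n_teams
    for name, score in ranked:
        candidates = list(range(n_teams))
        if balance_sizes:
            fewest = min(len(t) for t in teams)
            candidates = [i for i in candidates if len(teams[i]) == fewest]
        best = min(candidates, key=lambda i: (totals[i], i))
        teams[best].append(name)
        totals[best] += score
    return teams, totals


players = [("a", 10), ("b", 9), ("c", 8), ("d", 2), ("e", 1), ("f", 1)]
print(greedy_teams(players, 2))
print(greedy_teams(players, 2, balance_sizes=False))
```

The second call shows why the size constraint matters on skewed scores: without it, one team ends up with four players and the other with two.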
Hi,
I was looking at all the solutions and this is closest to what I intended to do. I am dividing a bunch of players into 3 soccer teams. Players' availability is also a factor when deciding the teams.
So the steps Excel needs to do are as follows:
1) In the availability column, if "yes", go to the next step
2) Equally divide 'Goalkeepers', 'Strikers' and 'Defenders' based on their quality
So the end result gives each of the 3 teams a balance of players playing in different positions.
Can this be done in a Google spreadsheet, with only availability as an input from the user and the rest calculated automatically?
Sorry for asking such a pointed question, but I have been struggling to find a solution for it for some time now!
Hi Ishaan,
I am working on a similar problem at the moment, so I am wondering if you ever found a solution and if you are willing to share what you did.
Hi everyone, this is a variation of the famous Knapsack Problem https://en.wikipedia.org/wiki/Knapsack_problem.
I had to use a VBA implementation recently as part of a problem where we are trying to allocate the teams of an organization to different locations (we are a large company with many different teams). The goal was to optimally allocate teams to individual buildings without putting too many teams into one building and without splitting teams apart.
As we had around 400 teams of different sizes, solver couldn't handle it anymore. Luckily there is a Knapsack algorithm implementation in VBA readily available on the internet :).
I also went with a heuristic approach first!
An interesting mathematical solution, but what if Eric and Xavier can't stand each other or Patrick is best friends with Steven? These are the real-life problems that affect "even" teams.
@Joe
You can add more criteria like
If Eric and Xavier can't stand each other
=OR(AND(E15=1,E16=1),AND(F15=1,F16=1),AND(G15=1,G16=1))
It must be False
If Patrick is best friends with Steven
=OR(AND(E5=1,E17=1),AND(F5=1,F17=1),AND(G5=1,G17=1))
It must be True
Note that the 2 formulas above are exactly the same except for the ranges
One must be True = Friends
One must be False = Not Friends
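The OR(AND(...)) test above simply asks whether two players have a 1 in the same team column. A minimal Python equivalent, assuming each player's row is a list of 0/1 flags, one per team (the E:G columns in the workbook); `same_team` is a hypothetical name and the assignments are made up.

```python
def same_team(row_a, row_b):
    """True when two players' 0/1 team flags share a team, like OR(AND(E=1,E=1), ...)."""
    return any(a == 1 and b == 1 for a, b in zip(row_a, row_b))


eric, xavier = [1, 0, 0], [0, 1, 0]      # made-up assignments
patrick, steven = [0, 0, 1], [0, 0, 1]
print(same_team(eric, xavier))     # the "can't stand each other" constraint wants False
print(same_team(patrick, steven))  # the "best friends" constraint wants True
```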
Nice Post!
Just one question: what if the number of players is not even or not equally divisible?
Nice post Hui!
I downloaded your workbook and just tried changing, in the Solver options, the Precision Restriction from 10E-6 to 10E-8 and the Convergence from 10E-4 to 10E-10. The process took almost the same time, but the results were great.
The standard deviation I got was 0.000471.
Team 1: John, Tom, Kenny, Frank, Eric, Xavier, Edward, Zane
Team 2: Steven, Hugo, Ben, Joe, Josh, Oliver, Cameron, William
Team 3: Barry, Henry, Michael, Kyle, Patrick, Charles, Andrew, Lachlan
Great application of Solver! Thanks for the link!
Great explanation. Well done... However, I tried with 6 teams of 4 players and Solver never did finish.
How about VBA code for the same data set?
I have 3 columns, A, B and C: A has text, B has numbers and C is blank. C1 is the header, and C2 onwards is where I want the names to go, evenly distributed by the numbers in column B.
My Lastcolumn is 1000.
Sorry if I'm being slow here, but how is 'Team Score' calculated? I've gone through the explanation several times but it seems to just appear.
@Hrmft
This process uses the Solver Excel add-in.
Solver effectively takes the model and tries different solutions until it finds one that meets all the criteria.
Then Solver puts the solution into the cell and moves to the next cell.
So yes it appears to "just appear"
Hi ! Thank you so much ! Works great 🙂
I cannot get the fourth equation to work in my Excel spreadsheet.
You have =($E$2:$G$25=0)+($E$2:$G$25=1)=1 as a SUMIF solution, I have, =($F$2:$H$13=0)+($F$2:$H$13=1)=1 as my solution but it does not work. The only thing I changed is the ranges. Any suggestions?
Thank you.
Jim
I cannot get the fourth equation of TRUE or FALSE statements to work in my Excel spreadsheet. You have =($E$2:$G$25=0)+($E$2:$G$25=1)=1 as a SUMIF solution; I have =($F$2:$H$13=0)+($F$2:$H$13=1)=1 as my solution, but it does not work. The only thing I changed is the ranges. Any suggestions?
Sorry, I left some of it out of my previous question.
Thank you. Jim