Formula Forensics No. 031 – Production Scheduling using Excel

Recently, Bluetaurean asked in the Chandoo.org Forums about ways to allocate work durations for various product lines across 24 hour days to create a daily schedule.

Both formula-based and VBA-based solutions were offered.

Today at Formula Forensics, we will take a look at the formula-based approach.

As always at Formula Forensics you can follow along, Download Here – Excel 2007-2013.

 

Set the Scene

Since one might encounter a similar need in a variety of contexts (manufacturing, engineering, project planning, etc.), we will look at a more general problem of allocating a set of tasks and corresponding durations to one or more days, as shown in the following diagram.

We will create two output views:

  • One that is a flat list that can then be manipulated further using Excel’s Pivot table feature, and
  • Another view that mimics a pivot-table (and is similar to a typical project Gantt view, but with actual values listed instead of a bar chart).

You can follow along using the attached Excel document. Download here Excel 2007+

 

Problem Specifics

  • We have a list of tasks and their durations.
  • We need to distribute the tasks to different days, without exceeding the maximum available duration in a given day.
  • When the hours in a day are “used up”, we need to allocate the remaining task duration to the next day, and so on.
  • On the other hand, if a given task does not use up all of the hours in a given day, we will need to assign more than one task for that day, provided the combined durations do not exceed the available hours for that day.
  • In other words, we will need to split a task across one or more days, or combine one or more tasks into a single day, as needed, to maximize the work performed in a given day.

 

Developing the Approach

Before we tackle this problem in Excel, let us review how we might do this manually. Like most things, we might use the following three-step process:

  1. Take the first task and assign its duration to Day 1. If the task’s duration exceeds the maximum hours available in a day, allocate the portion of the duration that does not fit into Day 1 into Day 2.
  2. Take the second task, and see whether it can fit into an existing day, or whether it needs to be distributed to multiple days
  3. Etc. (OK… so that three-step process was a stretch!)

Statistics show that most people think in terms of IF-THEN-ELSE statements. So here it is…

For a given Day, and for a given Task,
If [Hours Not Allocated For that Task] > [Hours Available for that Day] Then
Set Duration for that Day as [Hours Available for that Day]
Else
Set Duration for that Day as [Hours Not Allocated for that Task]
End
Continue the above evaluation until all tasks have been allocated to days.
 

Of course, the above IF() logic can be condensed as follows:

MIN( [Hours Not Allocated For that Task], [Hours Available for that Day] )
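
As a quick check of that equivalence, here is a minimal Python sketch. Python is used only for illustration, and the function names are hypothetical (they do not appear in the workbook). It compares the IF-THEN-ELSE rule with the MIN() form over a small grid of whole-hour values.

```python
def hours_for_day_if(hours_not_allocated, hours_available):
    """IF-THEN-ELSE form of the allocation rule."""
    if hours_not_allocated > hours_available:
        return hours_available
    else:
        return hours_not_allocated


def hours_for_day_min(hours_not_allocated, hours_available):
    """Condensed form: the smaller of the two quantities."""
    return min(hours_not_allocated, hours_available)


# Compare both forms for every whole-hour combination in a small grid.
agree = all(
    hours_for_day_if(task_hours, day_hours) == hours_for_day_min(task_hours, day_hours)
    for task_hours in range(0, 50)
    for day_hours in range(0, 25)
)
print(agree)                      # True
print(hours_for_day_min(40, 24))  # 24: the day fills up before the task is done
print(hours_for_day_min(16, 24))  # 16: the task finishes before the day fills up
```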

 

Putting it All Together: Output Option 1: Gantt-like View

Let us employ the above approach to create the Gantt-like view.

To make our approach more generic, we will use an Excel Name called “MaxHrsPerDay” to indicate the maximum available hours in a given day. (In the sample worksheet, it has been set to 24 hours.)

Our source data is set up as shown in the diagram below:

  • Tasks are in the range A2:A5
  • Durations are in the range B2:B5

We will create the output in a separate worksheet, in the range A1:E5 as shown below:

Put the following formula into cell A2 and copy down to A5:

=SourceData!$A2

(This formula is merely referencing the values from the SourceData sheet. The sample workbook also includes an approach to make this reference more location independent.)

Put the following formula in cell B2, and copy it down and right:

=MIN((SourceData!$B2-SUM($A2:A2)), (MaxHrsPerDay-SUM(B$1:B1)))

 

Set up the header row (B1:E1) as desired. (I have used text values for the header. You could also calculate the header text using formulas. Since that is straightforward, I will leave that as an exercise for the reader.)

Now let us look at what the formula in cell B2 is doing:

  • SUM($A2:A2) is calculating the sum of the allocated durations for TaskA. (Please note the use of absolute and relative references. The formula is anchored on column A, but the starting row, ending row and ending column are free to expand.) SUM($A2:A2) returns zero since SUM() ignores text values.

– If you look at cell C2, the reference changes to SUM($A2:B2).
– In cell B3, the reference changes to SUM($A3:A3). You get the idea.

  • (SourceData!$B2-SUM($A2:A2)) calculates the difference between the duration for TaskA (40 in the example) and the hours allocated as of that point (0), to return 40-0=40.
  • SUM(B$1:B1) is calculating the sum of the allocated hours for Day1. (Again, we are using a combination of absolute and relative references to keep the calculation anchored on column B.) In this case, the value is zero, since this is the first allocation for Day1.
  • (MaxHrsPerDay-SUM(B$1:B1)) calculates the hours remaining (i.e. available) for Day1. Since this is for cell B2, the calculation returns 24 – 0 = 24.

That is it!

We put those absolute and relative references to good use!

This approach was easy because all we had to do was calculate the duration for a given task for a given day.
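
If it helps to see the grid being filled as a loop, the following Python sketch mirrors the B2 formula cell by cell. It is only an illustration under stated assumptions: the function name gantt_view is hypothetical, the durations 40, 20, 5 and 15 are the sample values used in this article, and the four day columns correspond to B:E in the output range.

```python
def gantt_view(durations, max_hrs_per_day, n_days):
    """Return one row per task and one column per day, like the range B2:E5."""
    grid = [[0] * n_days for _ in durations]
    for r, duration in enumerate(durations):
        for c in range(n_days):
            # Hours already given to this task in earlier days, like SUM($A2:A2).
            allocated_to_task = sum(grid[r][:c])
            # Hours already used in this day by earlier tasks, like SUM(B$1:B1).
            allocated_to_day = sum(grid[i][c] for i in range(r))
            grid[r][c] = min(duration - allocated_to_task,
                             max_hrs_per_day - allocated_to_day)
    return grid


grid = gantt_view([40, 20, 5, 15], max_hrs_per_day=24, n_days=4)
for row in grid:
    print(row)
# [24, 16, 0, 0]
# [0, 8, 12, 0]
# [0, 0, 5, 0]
# [0, 0, 7, 8]
```

Each cell depends only on the cells to its left and above it, which is why the same formula can be copied down and right in the worksheet.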

 

On the other hand, if we had to figure out what the Task was, or which Day it was, the calculation gets a little more involved. Since this is “formula forensics”, we would not have it any other way! 🙂

 

Putting it All Together: Output Option 2: A Sequential List of Tasks and Durations for Each Day (i.e. a Flat List)

As before, we will use the Excel Name “MaxHrsPerDay” to refer to the maximum hours in a Day.

As shown in the following diagram, we will turn the source data into a flat list of Days, Tasks and Durations:

Unlike with VBA, since a formula cannot choose which row and column to write its output, we have to set the formula in every cell where we suspect there might be a value.

In the above sample diagram, we copy the formulas from row 2 to row 9. However, row 9 shows “…” indicating that the list was completed by row 8.

Let us look at how to determine the value for Day, Task and Allocated Duration.

For ease of description, I have created the following Excel Names:

WorkList: =A2:A5 in the source data.

WorkDuration: =B2:B5 in the source data

While creating the Gantt-like view earlier, we were able to take advantage of the static “Day” and “Task” values to determine the Remaining Duration, Available Duration, etc. Since we now have to determine all three values (Day, Task, Allocated Duration), we will need some “helper” data.

We will add a column alongside the source data that shows the cumulative duration (for reasons that will become clear shortly), as shown in the following diagram:

Cumulative Duration is calculated as the sum of all durations up to a given row.

  • For example, in cell C2, the Cumulative Duration is 40.
  • In cell C3, the Cumulative Duration is 40+20=60
  • And so on.

For ease of referencing, we will use an Excel Name called CumulativeDuration =C2:C5.
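
For comparison, the same running total can be sketched in Python with the standard library's itertools.accumulate. The variable names are illustrative only, and the durations are the sample values from this article.

```python
from itertools import accumulate

work_duration = [40, 20, 5, 15]  # sample durations from the source data

# Each entry is the sum of all durations up to and including that row.
cumulative_duration = list(accumulate(work_duration))
print(cumulative_duration)  # [40, 60, 65, 80]
```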

 

Let us look at why we need the “CumulativeDuration” helper column:

The circular logic problem

In order to determine the durations already allocated for a given day, we will need to know which Day it is.

We also need to know which Task we are trying to calculate the duration for.

So… do we calculate the Day or the Task or the Duration first?!! As you can imagine, that will soon land us in some circular logic.

 

Some helpful observations about the output:

  • In column C of the output (on worksheet FlatList), the sum of allocated durations adds up to the total duration for all tasks. (No surprise here!)
  • If every task had duration equal to the MaxHrsPerDay, you would have the same duration value for all days. (Not surprising, but interesting!)
  • In other words, you could think of the Allocated Duration column as the total duration for all tasks, allocated MaxHrsPerDay at a time.
  • Now we need a way to iterate through the duration values one at a time and account for the durations already processed. In other words, each value needs to contain all of the previous values. Welcome to an array of the cumulative durations!
  • For example, in the cumulative array “{40;60;65;80}”, the value 60 already includes the previous value 40 in it. This allows us to subtract all durations allocated up to a given row, to get the duration value that is remaining to be allocated.
  • Since Excel is good with numbers, we will base the calculation for AllocatedDuration and Tasks on the Duration values.
  • By calculating the two values separately, we avoid the circular logic.

Let’s now look at the formulas for Day, WorkItem and AllocatedDuration.

It would be easier if we looked at the formulas in reverse order, starting with AllocatedDuration, then WorkItem, and finally Day.

Formula for “AllocatedDuration”

Enter the following formula into cell C2, ending with Ctrl+Shift+Enter, as shown in the following diagram:

=IF(SUM(C$1:C1)>=SUMPRODUCT(WorkDuration), "…", MIN(INDEX(WorkDuration, MATCH(TRUE, CumulativeDuration-SUM(C$1:C1) > 0, 0)) - SUMIFS(C$1:C1, B$1:B1, B2), MaxHrsPerDay-SUMPRODUCT((A$1:A1=A2)* IF(ISNUMBER(C$1:C1), C$1:C1, 0)))) Ctrl+Shift+Enter

Let us look at the formula closely (using the formula in row 2):

  • SUMPRODUCT((A$1:A1=A2)* IF(ISNUMBER(C$1:C1), C$1:C1, 0)) -> This calculates the sum of all allocated durations up to the previous row, where the Day = current row’s day. Please note the use of absolute and relative references. They allow us to expand the range as we go down the rows, while remaining anchored to the first row.

– Since this is the first data row, C$1:C1 returns “Allocated Duration” and the ISNUMBER() function returns FALSE, and consequently, the IF() function returns 0.
– A$1:A1 returns “Day”, and the test A$1:A1=A2 returns FALSE. Please note that in this case, it does not matter whether A2 has a value in it, whether it has the value 1, etc.
– SUMPRODUCT() provides the result of FALSE * 0 = 0

  • MaxHrsPerDay-SUMPRODUCT((A$1:A1=A2)* IF(ISNUMBER(C$1:C1), C$1:C1, 0)) -> This calculates the difference between maximum duration available for a day and the sum of durations allocated for the current day. In other words, it calculates the available duration for the current row’s day.

– In this example, the calculation results in MaxHrsPerDay (24 in our example) – 0 = 24

  • SUMIFS(C$1:C1, B$1:B1,B2) -> This calculates the sum of all allocated durations for the current row’s task. Since B$1:B1 is the text value “Work Item”, the SUMIFS() returns 0. Again, it does not matter if B2 is blank or has a value like “TaskA”, since Excel correctly evaluates the condition whether B$1:B1 equals B2.
  • SUM(C$1:C1) -> This calculates the sum of all allocated durations up to the previous row.
  • CumulativeDuration-SUM(C$1:C1) -> CumulativeDuration evaluates to {40;60;65;80}. SUM(C$1:C1) evaluates to zero. As such, the expression evaluates to {40;60;65;80} - 0, or {40;60;65;80}.

– If we look at the calculation for this expression in cell C3 (the expression would be “CumulativeDuration-SUM(C$1:C2)”), we would get the result of {40;60;65;80} - (0+24) = {16;36;41;56}. (As you know, subtracting a scalar value from an array results in an array with each value reduced by the scalar value.)

– If we look at the calculation for this expression in cell C4 (the expression would be “CumulativeDuration-SUM(C$1:C3)”), we would get the result of {40;60;65;80} - (0+24+16) = {0;20;25;40}.

– As you can see, each successive calculation reduces the CumulativeDuration array by the amount of hours already allocated. By reducing the CumulativeDuration array in this fashion, we ensure that we do not “double count” a duration.

– If a value in the array evaluates to zero (or less, in later rows), it means the corresponding duration has been fully allocated. (In cell C4, the first value in the array is zero, indicating that the original 40 hours have been fully allocated.) We will put this knowledge to good use in the next expression.

  • MATCH(TRUE, CumulativeDuration-SUM(C$1:C1) > 0, 0) -> The expression CumulativeDuration-SUM(C$1:C1) > 0 evaluates to {TRUE;TRUE;TRUE;TRUE} because all values are greater than zero. By performing a MATCH() for TRUE, we are able to find the first location in the array that has a value greater than zero.

– If we look at the result of this expression in cell C3, we get {16;36;41;56} > 0 = {TRUE;TRUE;TRUE;TRUE}

– If we look at the result of this expression in cell C4, we get {0;20;25;40} > 0 = {FALSE;TRUE;TRUE;TRUE}

– As you recall, the values that are zero or less (FALSE) correspond to the durations that have been fully allocated, whereas the positive values (TRUE) correspond to the durations that have NOT been fully allocated.

– It is helpful to note that MATCH() returns the LOCATION of what it finds. As such, the returned location is that of the first duration value that has not been fully allocated! Since the CumulativeDuration array is the same size as the WorkDuration array, we will be able to put this returned location value to good use in the next expression.

  • INDEX(WorkDuration, MATCH(TRUE, CumulativeDuration-SUM(C$1:C1) > 0, 0)) -> By using the location value (of the first duration value that has not been fully allocated), we find the corresponding original duration value from the WorkDuration array.

– As we saw earlier, the expression “CumulativeDuration-SUM(C$1:C1)” reduces the CumulativeDuration by the duration values allocated to that point. However, the resulting array could have partial duration values as well. By referencing the corresponding duration value from the WorkDuration array, we ensure that we retrieve the original (full) duration value that was to be allocated.

  • MIN(…) -> This expression calculates the value of MIN([Hours Not Allocated For that Task], [Hours Available for that Day])

– [Hours Not Allocated For that Task] is returned by INDEX(WorkDuration, MATCH(TRUE, CumulativeDuration-SUM(C$1:C1) > 0, 0)) - SUMIFS(C$1:C1, B$1:B1, B2)

– [Hours Available for that Day] is returned by the second half of the MIN() expression: MaxHrsPerDay-SUMPRODUCT((A$1:A1=A2)* IF(ISNUMBER(C$1:C1), C$1:C1, 0)).

– So, we essentially got back to the logic we started from, which is the same logic we used for creating the Gantt-like view as well.

  • The remaining portion of the formula (the IF() check) determines if all of the hours have been allocated. If all hours have been allocated, it returns “…”.

– SUMPRODUCT(WorkDuration) -> This expression calculates the total of all work duration values. In cell C2, it evaluates to SUMPRODUCT({40;20;5;15}) = 80.

– SUM(C$1:C1)>=SUMPRODUCT(WorkDuration) -> Determines if the sum of durations allocated up to that point is greater than or equal to the total for all durations. (Since this is part of an array formula, you could also use the SUM function in place of SUMPRODUCT. But I am partial to the SUMPRODUCT function!! So, unless you are in a competition where the winner is determined by the shortest formula, feel free to use either one!)
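
The array arithmetic behind the MATCH() and INDEX() steps can be traced with a short Python sketch. This is an illustration only: the function name first_unallocated_position is hypothetical, and the values fed in are the sample figures used above (allocated hours of 0, 24 and 40 correspond to cells C2, C3 and C4).

```python
work_duration = [40, 20, 5, 15]
cumulative_duration = [40, 60, 65, 80]


def first_unallocated_position(cumulative, allocated_so_far):
    """Mimic MATCH(TRUE, CumulativeDuration - SUM(...) > 0, 0); returns a 1-based position."""
    reduced = [value - allocated_so_far for value in cumulative]
    flags = [value > 0 for value in reduced]
    # list.index raises ValueError when nothing is left, much as MATCH would return #N/A.
    return reduced, flags.index(True) + 1


for allocated in (0, 24, 40):
    reduced, position = first_unallocated_position(cumulative_duration, allocated)
    # INDEX(WorkDuration, position) looks up the full, original duration.
    print(allocated, reduced, position, work_duration[position - 1])
# 0 [40, 60, 65, 80] 1 40
# 24 [16, 36, 41, 56] 1 40
# 40 [0, 20, 25, 40] 2 20
```

Note how the lookup moves from the first task to the second only once the reduced value for the first task reaches zero.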

 

Formula for “WorkItem”

Enter the following formula into cell B2, ending with Ctrl+Shift+Enter, as shown in the following diagram.

=IF(SUM(C$1:C1)>=SUMPRODUCT(WorkDuration), "…", INDEX(WorkList, MATCH(TRUE, (CumulativeDuration-SUM(C$1:C1)) > 0, 0))) Ctrl+Shift+Enter

You are already familiar with most of the formula components since you saw them in the formula for AllocatedDuration. The only difference is that in this formula, we are returning a value from WorkList. (i.e. we locate the position of the first non-zero duration in CumulativeDuration array, and since that array is the same size as the WorkList array, we are able to find the first Task that has not been fully allocated.)

Formula for “Day”

Enter the following formula into cell A2, ending with Ctrl+Shift+Enter, as shown in the following diagram:

=IF(SUM(C$1:C1)>=SUMPRODUCT(WorkDuration), "…", MAX( N(A1) + (SUMIFS(C$1:C1, A$1:A1, A1)>=MaxHrsPerDay), 1)) Ctrl+Shift+Enter

Let us look at the formula in detail (using the formula in row 2):

  • SUMIFS(C$1:C1, A$1:A1, A1) -> This expression calculates the sum of all durations (in column C) where the Days (in column A) equal the previous day.

– In cell A2, this expression evaluates to “SUMIFS(“Allocated Duration”, “Day”, “Day”)” = 0. (Excel smartly ignores any non-numeric values in the first argument.)

– In cell A3, this expression evaluates to “SUMIFS({“Allocated Duration”;24}, {“Day”;1}, 1)” = 24.

  • SUMIFS(C$1:C1, A$1:A1, A1)>=MaxHrsPerDay -> This expression checks if the sum of all durations where the Days equal the previous day is greater than or equal to MaxHrsPerDay.

– In cell A2, this expression evaluates to FALSE

– In cell A3, this expression evaluates to TRUE

  • N(A1) -> This expression returns the numeric value for its argument. Since N() returns zero for any non-numeric arguments, we use this function to return zero for the heading (“Day”) in A1. (Any numeric values are returned as is.)
  • MAX( N(A1) + (SUMIFS(C$1:C1, A$1:A1, A1)>=MaxHrsPerDay), 1) -> The first argument of the MAX function, “N(A1) + (SUMIFS(C$1:C1, A$1:A1, A1)>=MaxHrsPerDay)”, returns the next increment for day, if the previous day has been fully allocated. Otherwise, it returns the same value as the previous day.

– In cell A2, this expression evaluates to MAX( N(“Day”) + (SUMIFS(“Allocated Duration”, “Day”, “Day”)>=24), 1), which evaluates to MAX( N(“Day”) + (0>=24), 1), which evaluates to MAX( 0 + (FALSE), 1), which finally evaluates to 1.

– In cell A3, this expression evaluates to MAX( N(1) + (SUMIFS({“Allocated Duration”;24}, {“Day”;1}, 1)>=24), 1), which evaluates to MAX( N(1) + (24>=24), 1), which evaluates to MAX( 1 + (TRUE), 1), which finally evaluates to 2 since 1 + TRUE = 2.
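
To see the three formulas working together, here is a rough Python sketch that builds the flat list row by row in the same order the worksheet evaluates it (Day, then WorkItem, then AllocatedDuration). It is a sketch under assumptions, not the workbook's implementation: the function name flat_list is hypothetical, the task names TaskB to TaskD are assumed for illustration (only TaskA is named above), and it assumes positive durations, a positive MaxHrsPerDay and unique task names.

```python
from itertools import accumulate


def flat_list(work_list, work_duration, max_hrs_per_day):
    """Return (day, task, hours) rows, one per output row of the flat list."""
    cumulative = list(accumulate(work_duration))
    total = sum(work_duration)
    rows = []
    while sum(hours for _, _, hours in rows) < total:
        allocated = sum(hours for _, _, hours in rows)  # like SUM(C$1:C1)

        # Day: move to the next day only when the previous row's day is full.
        if rows:
            prev_day = rows[-1][0]
            prev_day_hours = sum(h for d, _, h in rows if d == prev_day)
            day = max(prev_day + (prev_day_hours >= max_hrs_per_day), 1)
        else:
            day = 1

        # WorkItem: first task whose cumulative duration exceeds the allocated hours.
        pos = next(i for i, c in enumerate(cumulative) if c - allocated > 0)
        task = work_list[pos]

        # AllocatedDuration: MIN(hours left for the task, hours left in the day).
        task_hours = sum(h for _, t, h in rows if t == task)
        day_hours = sum(h for d, _, h in rows if d == day)
        hours = min(work_duration[pos] - task_hours, max_hrs_per_day - day_hours)
        rows.append((day, task, hours))
    return rows


rows = flat_list(["TaskA", "TaskB", "TaskC", "TaskD"], [40, 20, 5, 15], 24)
for row in rows:
    print(row)
# (1, 'TaskA', 24)
# (2, 'TaskA', 16)
# (2, 'TaskB', 8)
# (3, 'TaskB', 12)
# (3, 'TaskC', 5)
# (3, 'TaskD', 7)
# (4, 'TaskD', 8)
```

The seven rows correspond to worksheet rows 2 to 8, which is consistent with row 9 showing “…” in the sample diagram.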

 

Download

You can download a copy of the above file and follow along, Download Here – Excel 2007-2013.

 

Final Thoughts

While we used the same basic logic for both output options in this article, there are probably many other ways to tackle the age-old problem of production scheduling.

I would love to hear about some of your ideas, as well as ways to extend the concepts described here.

In the meantime, I wish you continued EXCELlence!

Sajan.

 

Other Chandoo.org Posts related to Scheduling

Here at Chandoo.org you can find the following related posts:

http://www.chandoo.org/wp/2010/11/18/scheduling-variable-sources/

http://chandoo.org/wp/2009/06/16/gantt-charts-project-management/

http://chandoo.org/wp/project-management-templates/gantt-charts/

 

Thank You

This was Sajan’s second post at Chandoo.org and so a special thank you to Sajan for putting pen to paper to describe the technique here.

You may want to read Sajan’s first post here or thank him in the comments below:

Formula Forensics “The Series”

This is the 31st post in the Formula Forensics series.

You can learn more about how to pull Excel Formulas apart in the following posts: Formula Forensic Series

 

Formula Forensics Needs Your Help

I need more ideas for future Formula Forensics posts and so I need your help.

If you have a neat formula that you would like to share like above, try putting pen to paper and draft up a Post like Sajan has done above or;

If you have a formula that you would like explained, but don’t want to write a post, send it to Hui or Chandoo.

20 Responses to “Simulating Dice throws – the correct way to do it in excel”

  1. alpha bravo says:

    You have an interesting point, but the bell curve theory is nonsense. Certainly it is not what you would want, even if it were true.

  2. Karl says:

    Alpha Bravo - Although not a distribution curve in the strict sense, is does reflect the actual results of throwing two physical dice.

    And reflects the following . .
    There is 1 way of throwing a total of 2
    There are 2 ways of throwing a total of 3
    There are 3 ways of throwing a total of 4
    There are 4 ways of throwing a total of 5
    There are 5 ways of throwing a total of 6
    There are 6 ways of throwing a total of 7
    There are 5 ways of throwing a total of 8
    There are 4 ways of throwing a total of 9
    There are 3 ways of throwing a total of 10
    There are 2 ways of throwing a total of 11
    There is 1 way of throwing a total of 12

  3. Chandoo says:

    @alpha bravo ... welcome... 🙂

    either your comment or your dice is loaded 😉

    I am afraid the distribution shown in the right graph is what you get when you throw a pair of dice in real world. As Karl already explained, it is not random behavior you see when you try to combine 2 random events (individual dice throws), but more of order due to how things work.

    @Karl, thanks 🙂

  4. Jon Peltier says:

    When simulating a coin toss, the ROUND function you used is appropriate. However, your die simulation formula should use INT instead of ROUND:

    =INT(RAND()*6)+1

    Otherwise, the rounding causes half of each number's predictions to be applied to the next higher number. Also, you'd get a count for 7, which isn't possible in a die.

    To illustrate, I set up 1200 trials of each formula in a worksheet and counted the results. The image here shows the table and a histogram of results:

    http://peltiertech.com/WordPress/wp-content/img200808/RandonDieTrials.png

  5. Chandoo says:

    @Jon: thanks for pointing this out. You are absolutely right. INT() is what I should I have used instead of ROUND() as it reduces the possibility of having either 1 or 6 by almost half that of having other numbers.

    this is such a good thing to learn, helps me a lot in my future simulations.

    Btw, the actual graphs I have shown were plotted based on randbetween() and not from rand()*6, so they still hold good.

    Updating the post to include your comments as it helps everyone to know this.

  6. Jon Peltier says:

    By the way, the distribution is not a Gaussian distribution, as Karl points out. However, when you add the simulations of many dice together (i.e., ten throws), the overall results will approximate a Gaussian distribution. If my feeble memory serves me, this is the Central Limit Theorem.

  7. Chandoo says:

    @Jon, that is right, you have to nearly throw infinite number of dice and add their face counts to get a perfect bell curve or Gaussian distribution, but as the central limit theorem suggests, our curve should roughly look like a bell curve... 🙂

  8. [...] posts on games & excel that you may enjoy: Simulating Dice throws in Excel Generate and Print Bingo / Housie tickets using this excel Understanding Monopoly Board [...]

  9. YourFifthGradeMathsTeacher says:

    I'm afraid to say that this is a badly stated and ambiguous post, which is likely to cause errors and misunderstanding.
    Aside from the initial use of round() instead of int(),.. (you've since corrected), you made several crucial mistakes by not accurately and unambiguously stating the details.

    Firstly, you said:
    "this little function generates a random fraction between 0 and 1"
    Correctly stated this should be:
    "this little function generates a random fraction F where 0 <= F < 1".

    Secondly, I guess because you were a little fuzzy about the exact range of values returned by rand(), you have then been just as ambiguous in stating:
    "I usually write int(rand()*12)+1 if I need a random number between 0 to 12".
    (that implies 13 integers, not 12)

    Your formula, does not return 13 integers between 0 to 12.
    It returns 12 integers between 1 and 12 (inclusive).
    -- As rand() returns a random fraction F where 0 <= F < 1, you can obviously can only get integers between 1 and 12 (inclusive) from your formula as stated above, but clearly not zero.

    If you had said either:
    "I usually write int(rand()*12) if I need a random number between 0 to 11 (inclusive)",
    or:
    "I usually write int(rand()*12)+1 if I need a random number between 1 to 12 (inclusive)"
    then you would have been correct.

    Unfortunately, you FAIL! -- repeat 5th grade please!

    Your Fifth Grade Maths Teacher

  10. Justin says:

    Idk if I'm on the right forum for this or how soon one can reply, but I'm working on a test using Excel and I have a table set up to get all my answers from BUT I need to generate 10,000 answers from this one table. Every time, I try to do this I get 10,000 duplicate answers. I know there has to be some simple command I have left out or not used at all, any help would be extremely helpful! (And I already have the dice figured out lol)

    Roll 4Dice with 20Sides (4D20) if the total < 20 add the sum of a rerolled 2D20. What is the average total over 10,000 turns? (Short and sweet)

    Like I said when I try to simulate 10,000turns I just get "67" 10,000times -_- help please! 😀

  11. Hui... says:

    @Justin

    This is a good example to use for basic simulation

    have a look at the file I have posted at:
    https://rapidshare.com/files/1257689536/4_Dice.xlsx

    It uses a variable size dice which you set
    Has 4 Dice
    Throws them 10,000 times
    If Total per roll < 20 uses the sum of 2 extra dice
    Adds up the scores
    Averages the results

    You can read more about how it was constructed by reading this post:
    http://chandoo.org/wp/2010/05/06/data-tables-monte-carlo-simulations-in-excel-a-comprehensive-guide/

  12. SpreadSheetNinja says:

    Oh derp, i fell for this trap too, thinking i was makeing a good dice roll simulation.. instead of just got an average of everything 😛

    Noteably This dice trow simulate page is kinda important, as most roleplay dice games were hard.. i mean, a crit failure or crit hit (rolling double 1's or double 6's) in a a game for example dungeons and dragons, if you dont do the roll each induvidual dice, then theres a higher chance of scoreing a crit hit or a crit failure on attacking..

  13. Freswinn says:

    I've been working on this for awhile. So here's a few issues I've come across and solved.

    #1. round() does work, but you add 0.5 as the constant, not 1.

    trunc() and int() give you the same distributions as round() when you use the constant 1, so among the three functions they are all equally fair as long as you remember what you're doing when you use one rather than the other. I've proven it with a rough mathematical proof -- I say rough only because I'm not a proper mathematician.

    In short, depending on the function (s is the number of sides, and R stands in for RAND() ):

    round(f), where f = sR + 0.5
    trunc(f), where f = sR + 1
    int(f), where f = sR + 1

    will all give you the same distribution, meaning that between the three functions they are fair and none favors something more than the others. However...

    #2. None of the above gets you around the uneven distribution of possible outcomes of primes not found in the factorization of the base being used (base-10, since we're using decimal; and the prime factorization of 10 is 2 and 5).

    With a 10-sided die, where your equation would be
    =ROUND(6*RAND()+0.5)
    Your distribution of possible values is even across all ten possibilities.
    However, if you use the most basic die, a 6-sided die, the distributions favor some rolls over others. Let's assume your random number can only generate down to the thousandths (0.000 ? R ? 0.999). The distribution of possible outcomes of your function are:
    1: 167
    2: 167
    3: 166
    4: 167
    5: 167
    6: 166

    So 4 and 6 are always under-represented in the distribution by 1 less than their compatriots. This is true no matter how many decimals you allow, though the distribution gets closer and closer to equal the further towards infinite decimal places you go.
    This carries over to all die whose numbers of sides do not factor down to a prime factorization of some exponential values of 2 and 5.

    So, then, how can we fix this one, tiny issue in a practical manner that doesn't make our heads hurt or put unnecessary strain on the computer?

  14. Freswinn says:

    Real quick addendum to the above:
    Obviously when I put the equation after the example of the 10-sided die, I meant to put a 10*RAND() instead of a 6*RAND(). Oops!

    Also, where I have 0.000 ? R ? 0.999, the ?'s are supposed to be less-than-or-equal-to signs but the comments didn't like that. Oh well.

  15. Andrew says:

    How do you keep adding up the total? I would like to have a cell which keeps adding up the total sum of the two dices, even after a new number is generated in the cells when you refresh or generate new numbers.

  16. kk says:

    So, how do you simulate rolling 12 dice? Do you write int(rand()*6) 12 times?

    Is there a simpler way of simulating n dice in Excel?

  17. Mohammed Ali says:

    I've run this code in VBA

    Sub generate()
    Application.ScreenUpdating = False
    Application.Calculation = False
    Dim app, i As Long
    Set app = Application.WorksheetFunction

    For i = 3 To 10002
    Cells(i, 3).Value = i - 2
    Cells(i, 4).Value = app.RandBetween(2, 12)
    Cells(i, 5).Value = app.RandBetween(1, 6) + app.RandBetween(1, 6)
    Next
    Application.ScreenUpdating = True
    Application.Calculation = True
    End Sub

    But I get the same distribution for both columns 4 and 5
    Why ?
