Dummy Data – How to use the Random Functions

Posted on May 4th, 2011 in excel apps , Excel Howtos , Huis , Learn Excel , Posts by Hui - 50 comments

Dummy Data – How to use the Random Functions

Using collected or known data is the best when developing Excel models, but from time to time this may not be available when you are developing your model.

This post will look at some options for setting up Dummy Data using Excels Random functions.

 

Variability

Real data displays a range of variability, but this variability is generally within ranges or distributions of ranges of results.

All fields type can contain variability

ie: Country, State Names and Zip/Postal Codes, Maybe large lists but are fixed

Peoples Names, Maybe a large lists but are fixed by local rules

Ages, generally less than 80, never less than 0

Dates: Rarely before 1990 or 1900 in rare cases

Lists: are fixed

Numbers: generally random or conforming to a fixed distribution or known trend

Numbers: may include integers, decimals, negatives, extremely large numbers or all combinations

In generating random lists you will need to choose if you want random data, random data within constraints or random with a distribution. The choice is really yours and should in part be based on what the data is being used for and how accurately it needs to reflect reality.

 

Techniques

The techniques described below are all shown with a worked example in the attached Examples File or the Excel 2003 Example

 

Each example is annotated below like (Example 4.). ie: Refer to Example 4 in the above example files.

Dates

Setting up Random Dates is a simple process using the Date function.

=Randbetween(StartDate,EndDate)

 

Dates in a Range of Years

=Randbetween(Date(2000,1,1),Date(2011,12,31))

Will give a list of Random dates between 1 Jan 2000 and 31 Dec 2011 (Example 1.)

(Thanx Mike W)

Dates in a Month

=Date(2010, 6, Randbetween(1,30)

Will give a list of Random dates between 1 June 2010 and 30 June 2010 (Example 2.)

 

Don’t worry that the above formula (Example 1) can actually produce a 31 Feb 2005, the Date function will happily convert that to 3 March 2005 (Example 3.)

 

Dates within a Date Distribution

=DATE(2011,7,NORMINV(RAND(), 0,60))

Will give a list of Random dates between approximately 1 Jan 2010 and 31 Dec 2010, with a mean of July 1 and standard deviation of 2 Months (60days) (Example 4.)

Where NORMINV(RAND(), 0,60) will return values between -180 and +180, 99.7% of the time

 

Text Fields

Dependant on how many items in the list you require there are 3 techniques available

Choose

For small lists of less than 6 to 10 items you can use a simple Choose function (Example 5.)

=Choose(Randbetween(1,6),”Item 1″, “Item 2”, “Item 3”, “Item 4”, “Item 5”, “Item 6”)

VLookup

Using VLookup (Example 6.)

=Vlookup(Randbetween(1,List Length), List, 2)

Index

Using Index (Example 7.)

=Index(List, Randbetween(1, Counta(List) ))

Numbers

Small Random List of Numbers

Random from a small list of numbers (Example 8.)

=Choose(Randbetween(1,6), Numb 1, Numb 2, Numb 3, Numb 4, Numb 5, Numb 6 )

Note that the numbers:

  • Don’t have to be in any order,
  • Can be integers, negatives or contain decimals
  • Can be repeated

eg: =Choose(Randbetween(1,6), 18, 21, -19, 36.4, 18, 24)

 

Random Integers

Return Integers between Start and Finish (Example 9.)

=Randbetween(Start, Finish)

=Randbetween(50, 100)

Will return an Integer between 50 and 100

Random Numbers

=Rand()

Will return a random number between 0 and 1

=Round(Rand()*100, 2)

Will Return Numbers between 0 and 100 with 2 Decimal places (Example 10.)

Random Numbers Based on a Distribution

=Norminv(Rand(), Mean, SD)

Will return a random number between 0 and 1 based on a distribution of Average = Mean and Standard Deviation = SD

=Norminv(Rand(), 50, 17)

Will return a random number between 0 and 100 based on a distribution of Average = 50 and Standard Deviation = 17, (Example 11.)

Random Numbers Fitting a Trend

If your distribution has to match a trend add a Random component to the Trends equation (Example 12.)

Y=mX+c

= rand() * X + rand()*5

= rand() * A2 + rand()*5

 

True/False

Choose

Use Choose and Randbetween (Example 13.)

=Choose(Randbetween(1,2), True, False)

If

Use If and Rand (Example 14.)

=If(Rand()<0.5, True, False)

 

Combination Text and Numbers

The above techniques can be combined to make lists of Alpha Numeric Data

Say your business has a fleet of vehicles (TR=Truck, VN=Van, CAR=Car)

=Choose(Randbetween(1,3),”TR”,”VN”,”CAR”) & Text(Randbetween(1,15),”0#”)

Will randomly choose 1 of “TR”,”VN”,”CAR” and add a random number between 1 and 15 to it format with a leading 0, eg: TR05, (Example 15.)

 

Other Sources of Data

Random Data

There are a number of web sites where Random Data is available.

http://www.fakenamegenerator.com/order.php

http://www.generatedata.com/#generator

http://www.melissadata.com/lookups/

Open Source Data

There are a number of web sites where Open Source Data is available.

http://en.wikipedia.org/

http://www.google.com/

http://www.readwriteweb.com/archives/where_to_find_open_data_on_the.php

 

Function Used:

Rand: Returns a random number between 0 and 1.

Randbetween: Returns a random Integer between lower and upper limits. Pre Excel 2007 Randbetween was only available through installation of the Analysis Toolpak (Thanx Luke).

Norminv: Returns the inverse of the normal cumulative distribution. That is it returns the X value from a Normal Distribution that has a know Mean and Standard Deviation where the a known cumulative percentage is supplied.

Choose: Choose an item from a list of up to 254 items.

Vlookup: Lookup the matching value from a list and return a data item from another column from the same location.

Index: Retrieve an items from a defined location within a range.

Text: Displays a number as Text with a defined format.

 

Other Uses of Random Functions

Of course the techniques shown here don’t have to be used for setting up Dummy Data.

One area where Random numbers is used is in Monte Carlo Simulation. This has been discussed at Chandoo.org at Data Tables and Monte-Carlo Simulations in Excel a Comprehensive Guide

 

Techniques

The techniques described above are all shown with a worked example in the attached Examples File or the Examples File 2003 ver

 

Limitations in Pre Excel 2007 versions

The Excel function, Randbetween, was only introduced in Excel 2007. As such the exaples above will only work in 2007/10.

However a simple alternative is available

Randbetween(Low, High) = Low + Int(Rand()*(High-Low))+1

Randbetween(90, 100) = 90 + Int(Rand()*10)+1

Examples using this approach are shown in the 2003 Version of the Examples files above.

 

How have you made Dummy Data or used the Random Functions?

How have you made Dummy Data or How have you used it ?

How have you used Random Numbers in your workbooks ?

Let us know in the comments below:

Written by Hui...
Tags: , , , , , , , ,
Home: Chandoo.org Main Page
? Doubt: Ask an Excel Question

50 Responses to “Dummy Data – How to use the Random Functions”

  1. Great post, Chandoo - I often need to create weighted/distributed data and always seem to spend ages doing it. I never knew you could do the normal distribution stuff with random numbers. Cool!

  2. Finnur says:

    Excellent post! Thank you. Especially for Random Numbers Based on a Distribution. Beautiful.

  3. I really like this, with one glaring exception: "Dates in a Range of Years" will double-count days at the start of months following months with fewer than 31 days. So March 1st to 3rd will be counted twice: once correctly, once for February "29/30/31".

    It's easily fixed - and actually it's more flexible - since Excel stores dates as numbers, just call RANDBETWEEN on the date range you want, so the example given becomes:

    =RANDBETWEEN(DATE(2000,1,1),DATE(2011,12,31))

  4. Luke M says:

    Thanks for all the different methods Chandoo!
    One caveat, RANDBETWEEN is available in 2003, if you install the Analysis ToolPak add-in.

  5. mikev says:

    I use random for Monte Carlo simulations for performance metrics I track at work. For example we want our energy consumption metric to decrease by 1,000 BTU this year. If, so far, the month over month changes have been between -150 and +50 then I use a =randbetween(-150,50) for thousands of iterations for the remaining months of the year. I then sum the random values for the remaining months and use the frequency function to find what percent of the time (probability) that we get to the 1,000 BTU decrease. Very useful.

  6. SteveT says:

    Great post. I also like force some data values as Random ones may not produce that desired result with the random fuctions.

    For example, I like to force a set of the minimums and the maximums especially if I am randomizing 2 variables per category. Otherwise you may get the minimum on one of the random variables but never get another minimum inconjunction with that category..

  7. Hui... says:

    @All
    Thank you for the kind comments,
    They are appreciated
    .
    @MikeV, SteveT
    I have discussed Monte Carlo simulations in: http://chandoo.org/wp/2010/05/06/data-tables-monte-carlo-simulations-in-excel-a-comprehensive-guide/
    .
    When forcing values you may be better off to use a Data Table, and setup the values you want in that. It no longer becomes random , but the data Table gives you a lot more control over which values get used and how often. Data Tables for this purpose are also discussed in the same link above.
    .
    @LukeM, I will update the post accordingly

  8. Kurt Proegler says:

    Wonderful information. Perfect timing, too - as I just started developing some report mockups for which no real data exists, but for which real-looking vs arbitrary data is needed. This can be quite cumbersome, some of these techniques will help.

  9. PSG says:

    Awesome Sir!!

    These tricks will surely help all of us to create data to play with....

    Learning new things everyday... Thank a lot!!

  10. Fred says:

    Thanks! Neat tricks. Just one question. How do I get a random number from various ranges of data?
    Say, an employer want to do a gift lottery for eligible employee whose ID number is between 4 and 40, 110 and 125, 700 and 1000 due to the department they belong to. How should I set up a truly random for these people??

  11. Hui... says:

    @Fred

    =CHOOSE(IF(RANDBETWEEN(0,1000)< =103,1, IF(RANDBETWEEN(103,1000)<=145,2,3)),RANDBETWEEN(4,40), RANDBETWEEN(110,125),RANDBETWEEN(700,1000))
    .
    That should all be 1 line

  12. Jon says:

    To echo Mike Woodhouse, the Date method in the post is somewhat limited, as it breaks if either the month or the day of the 2nd date is less than those of the first date. So, for instance, it would be impossible to produce a random set of dates for the two weeks between Dec 25 and January 7. Mike's method would allow that.

  13. Hui... says:

    @Mike, Jon
    Thanx for your input
    I have updated the post and example files accordingly

  14. Prem Beejan says:

    Thanks a lot
    Had never thought of the power of Rand. The links are very useful for dummy data
    Prem

  15. juanito says:

    Thanks for this post, Hui: very useful in itself, and also as an example of lateral thinking with excel functions.
    @Fred: another way (easier to understand and slightly more accurate, though less elegant that Hui's) round your problem would be to place all the available IDs in one column, then do an INDEX on it, using RANDBETWEEN 1 and the total number of available IDs (which you can automate with the COUNTA function) as the selection argument

  16. vg-excel-gest says:

    Uau!!, good post I really needed to generate random information for the demonstration of my projects will be very useful.

    VG-EXCEL-GEST
    vg-excel-gest.blogspot.com/
    http://www.youtube.com/user/VGEXCELGEST2?feature=mhum
    http://www.facebook.com/vgexcelgest

  17. fred says:

    @ Hui Thank you very much!

    But I don't understand the part where the random () <103, <145? What do they do?

  18. Hui... says:

    @Fred
    There are 36 Values, between 4 & 40, 15 between 110 & 125 and 300 between 700 and 1000.
    Thats a total of 351 values
    36/351 = 10.3%
    15/351 = 4.2%
    300/351 = 85.5%
    .
    as Randbetween only returns integers
    To reduce errors I multiplied those values by 10
    36/351 = 10.3% x 10 = 103
    15/351 = 4.2% x 10 = 42
    300/351 = 85.5% = 855
    .
    and cummulatively is
    36/351 = 103
    15/351 = 42 + 103 = 145
    300/351 = 145+ 855 = 1000
    .
    So now I can do a Random Between 1 and 1000 and every values has an equal chance of being picked
    .
    =CHOOSE(IF(RANDBETWEEN(0,1000)< =103,1, IF(RANDBETWEEN(103,1000)<=145,2,3)),RANDBETWEEN(4,40), RANDBETWEEN(110,125),RANDBETWEEN(700,1000)) . If it is less than 103 Choose Option 1 which is RANDBETWEEN(4,40) If its > 103 choose another random number between 103 and 1000
    If its < =145 Choose Option 2 which is a random RANDBETWEEN(110,125) otherwise Choose Option 3 which is a value RANDBETWEEN(700,1000) . To check I made up a small spreadsheet which compares the above technique to doing a simple =CHOOSE(RANDBETWEEN(1,3), RANDBETWEEN(4,40), RANDBETWEEN(110,125), RANDBETWEEN(700,1000)) and plotted the results over 10,000 iterations You can see the results here: https://rapidshare.com/files/460825479/Freds_Question.xlsx

  19. DQKennard says:

    I needed to scramble some data to de-identify the individuals, but keep records that were plausible and "real looking". For names, I grabbed a big list of first and last names from the Census website somewhere, and used a random index to grab random lines from the names lists. (I made no attempt to avoid weird ethnic combinations, but that was OK.) Then I set up a column in the source data with RAND() and a column that did a RANK on the random numbers. In my destination spreadsheet, I used lookups to grab the data on the line with the rank equal to my current line, e.g. on line 4 of the destination sheet, I grabbed the line of source data that was randomly ranked as 4. A few additional columns used random choices among a set of possible assignments, similar to the choose examples above. Finally, I cut the data set approximately in half by randomly assigning a 1 or 2 to each line and tossing all the 2s.

    I really like the examples of using randoms with the statistical functions.

  20. Karthik says:

    good post...

  21. Pedro Wave says:

    I have made Dummy Data calculations with my own Excel Touch Calculator.
    If you want to try, follow my blog.
    Random values are generated by pressing A key on the physical keyboard or Rnd virtual key button calling a macro:
    miCalc.Value = Round(Fix(tuCalc) * Rnd(), 0)
    I did it because the Windows calculator doesn't have random key.

  22. fred says:

    @ Hui

    WOW!!! Thank you very much! It's really eye opening. O.O

  23. Brian says:

    These tools can be very helpful -- nicely described. I appreciate the clear layout. I try to create these on the fly from time to time and always get bogged down. Just two cautions to offer:

    1) I suspect your random dates are not randomly distributed if Feb. 31 converts to March 3. That makes March 3 too likely....

    2) People should always be careful how they use this stuff -- they are PSEUDO-random (computers are rarely actually random) and can produce unintended results. More importantly, people pick distributions based on what they think should apply, but distribution selections easily can be really wrong. Nobel laureates (otherwise smart people) famously helped LTCM misjudge the tail probabilities for unlikely economic outcomes and lost billions.

  24. Barb says:

    How do I create a random date formula that tosses out the date once it has been selected?

    Great site very informative.

  25. Kern says:

    Thank you for this! Very useful.

  26. [...] thru Using Excel’s Random Functions for a detailed overview these [...]

  27. Rana says:

    Hi Hui,

    I have a deadline to meet and struggling with the code below.
    All I want is to pasteformat&formula in the worksheet that is generated from this code. At the moment it only copies the format.:(
    Have been trying to crake it all weekend.

    I tried a lot but cant get around 'Activesheet.paste'

    Please help.
    --------------------------------
    Sub Macro1()

    ' This macro takes values in the range myList
    ' and breaks it in to multiple lists
    ' and saves them to separate files.
    Dim cell As Range
    Dim curPath As String
    curPath = ActiveWorkbook.Path & "\"
    Application.ScreenUpdating = False
    Application.DisplayAlerts = False
    For Each cell In Range("bizu")
    [BU] = cell.Value
    Range("myList1").AdvancedFilter Action:=xlFilterCopy, _
    Criteriarange:=Range("Criteria"), copyToRange:=Range("bizextract"), unique:=False
    Range(Range("bizExtract"), Range("bizExtract").End(xlDown)).Copy
    Workbooks.Add
    ActiveSheet.Paste
    ActiveWorkbook.SaveAs Filename:=curPath & cell.Value & Format(Now, "dmmmyyyy-hhmmss") & ".xlsx", _
    FileFormat:=xlOpenXMLWorkbook, CreateBackup:=False
    ActiveWindow.Close
    Range(Range("bizExtract"), Range("bizExtract").End(xlDown)).ClearContents
    Next cell

    Application.ScreenUpdating = True
    Application.DisplayAlerts = True

    End Sub

  28. ron says:

    hi I need to create a dummy data with two sets of data. each has 400 sets of 9 random numbers each (1-5) . first mean is calculated for each set and then the two sets of means are correlated. I want the correlation, overall mean (of all 800 values) and over std dev to be a praticular value..can this be achieved in excel?

  29. ron says:

    something like

    A t1 3 3 4 3 5 3 5 4 2
    t2 5 5 4 4 4 5 3 4 1
    B t1 4 5 5 5 5 4 5 3 2
    t2 5 4 5 5 3 5 5 4 3
    C t1 5 5 4 3 4 4 5 5 4
    t2 3 4 4 4 3 4 5 2 4

    and so on
    so this makes 400 sets of data(A :t1,t2; B:t1,t2...) where each row average is calculated and then a correlation is found between all t1 averages and and all t2 averages which should be a particular value and the overall average (of all 800 means) should be a particular value with a specific std dev. Possible?

  30. ron says:

    have mailed a sample xls to you..

  31. ron says:

    any luck? sorry to pester....am in quite a soup and need to work this out asap...elsewill have to manually fill it up....egack!

  32. Xolani says:

    I need help...I want to create 1000 random numbers between 0 and 1 but the sum must not exceed 400.......please help me.

    • Hui... says:

      @Xolani
      I would put =Rand() into say A1:A1000
      then in B1 =Sum(A1:A1000)
      then simply press F9 repeatedly
      once the cell B1 shows less than 400
      Copy A1:A100
      Paste as values somewhere else

      I ran this for 1 million iterations and didn't get a value under 400
      I used the following code
      Sub h()
      Dim x As Long
      x = 1
      Do While ["B1"] > 400
      Calculate
      x = x + 1
      Debug.Print x
      Loop
      End Sub

  33. Xolani says:

    @Hui...
    Thank you, but I am saying I need to generate random numbers (only 0 and 1) 1000 times, but the sum of the 1's must not exceed 400.
    eg
    0 1 1 0 0 1 1 0 1 0 0 1 1 1 1 0 0 1 1 0 0 0.......etc.
    but the sum of the 1's must not exceed 400

  34. ahmed says:

    Hi please help me, how to generate 2500 random numbers from (0,1000) and the random numbers must have mean of 83 and standard deviation of 93?
    I did it but im getting negative numbers also...please advice me! thanks alot

    • Hui... says:

      @Ahmed
      If you have a mean of 83 and SD of 93
      You would expect +/- 3SD to be between -196 and 362

      • ahmed says:

        thanks alot for replying....i got - and + positive random numbers, is that ok? because it is saying from (0,1000) ? shouldnt it be only positive? i am drowning with projects so thanks for helping me...may God bless u 🙂 😀

        • Hui... says:

          @Ahmed
          If you have a mean of 83 and SD of 93
          Excel will return a normal distribution +/- 6SD to be between -475 and 641

          If you want you can truncate the negative values to 0, but the chances of then getting an average of 83 is very remote

          Are you sure your SD is 93 and not 9.3 ?

          If you choose random numbers between 0 and 1000 the average should be ~500

          So I not really sure you understand what you are asking

          Can you please clarify?

        • Hui... says:

          =Randbetween(0,1000) will only return Integers between 0 and 1000

          If you want decimals between 0 and 1000 I'd use
          =Round(Randbetween(0,1000*100)/100,2) which will return decimals with 2 digits between 0 and 100

  35. ahmed says:

    and also i did it using a software which is called datarun...can u please help me by telling me how to write that in Excel using randbetween function?

  36. ahmed says:

    i got a maximum positive number of 450 and negative number of - 263.5

  37. YM says:

    I'm a beginner in excel and would like to ask how do I generate 365 Random numbers (0 to 5) in Cell A1 to A365 that will sum to a specific value (1477)? And to be able to re-generate another set of random number when i hit F9? Appreciate the help! Randbetween doesnt seems to solve this problem.

    • Hui... says:

      @YM
      The average of the numbers 0 to 5 is 2.5 (15/6)
      so 365*2.5= 912

      To get a total of 1477 your average needs to be 4.05 (1477/365)
      so the numbers won't be random if you force that

      I assume you only want to use Integers

      In that case try something like: =RANDBETWEEN(3,5)
      copy down 365 times
      Then add a subtotal somewhere
      then press F9 until the total is 1477

  38. Dan says:

    Hi! I'm a newbie. Trying to randomly pick 3 names a percentage of the times. 1st name 66% of the time, 2nd name 17% of the time, and 3rd name 17% of the time. There are 30 cells I am doing this in.

  39. Sarbjeet Singh says:

    I want to data set of 1000 random numbers with 2 variable with 0 correlation

Leave a Reply