Dummy Data – How to use the Random Functions
Using collected or known data is the best when developing Excel models, but from time to time this may not be available when you are developing your model.
This post will look at some options for setting up Dummy Data using Excels Random functions.
Variability
Real data displays a range of variability, but this variability is generally within ranges or distributions of ranges of results.
All fields type can contain variability
ie: Country, State Names and Zip/Postal Codes, Maybe large lists but are fixed
Peoples Names, Maybe a large lists but are fixed by local rules
Ages, generally less than 80, never less than 0
Dates: Rarely before 1990 or 1900 in rare cases
Lists: are fixed
Numbers: generally random or conforming to a fixed distribution or known trend
Numbers: may include integers, decimals, negatives, extremely large numbers or all combinations
In generating random lists you will need to choose if you want random data, random data within constraints or random with a distribution. The choice is really yours and should in part be based on what the data is being used for and how accurately it needs to reflect reality.
Techniques
The techniques described below are all shown with a worked example in the attached Examples File or the Excel 2003 Example
Each example is annotated below like (Example 4.). ie: Refer to Example 4 in the above example files.
Dates
Setting up Random Dates is a simple process using the Date function.
=Randbetween(StartDate,EndDate)
Dates in a Range of Years
=Randbetween(Date(2000,1,1),Date(2011,12,31))
Will give a list of Random dates between 1 Jan 2000 and 31 Dec 2011 (Example 1.)
(Thanx Mike W)
Dates in a Month
=Date(2010, 6, Randbetween(1,30)
Will give a list of Random dates between 1 June 2010 and 30 June 2010 (Example 2.)
Don’t worry that the above formula (Example 1) can actually produce a 31 Feb 2005, the Date function will happily convert that to 3 March 2005 (Example 3.)
Dates within a Date Distribution
=DATE(2011,7,NORMINV(RAND(), 0,60))
Will give a list of Random dates between approximately 1 Jan 2010 and 31 Dec 2010, with a mean of July 1 and standard deviation of 2 Months (60days) (Example 4.)
Where NORMINV(RAND(), 0,60) will return values between -180 and +180, 99.7% of the time
Text Fields
Dependant on how many items in the list you require there are 3 techniques available
Choose
For small lists of less than 6 to 10 items you can use a simple Choose function (Example 5.)
=Choose(Randbetween(1,6),”Item 1″, “Item 2”, “Item 3”, “Item 4”, “Item 5”, “Item 6”)
VLookup
Using VLookup (Example 6.)
=Vlookup(Randbetween(1,List Length), List, 2)
Index
Using Index (Example 7.)
=Index(List, Randbetween(1, Counta(List) ))
Numbers
Small Random List of Numbers
Random from a small list of numbers (Example 8.)
=Choose(Randbetween(1,6), Numb 1, Numb 2, Numb 3, Numb 4, Numb 5, Numb 6 )
Note that the numbers:
- Don’t have to be in any order,
- Can be integers, negatives or contain decimals
- Can be repeated
eg: =Choose(Randbetween(1,6), 18, 21, -19, 36.4, 18, 24)
Random Integers
Return Integers between Start and Finish (Example 9.)
=Randbetween(Start, Finish)
=Randbetween(50, 100)
Will return an Integer between 50 and 100
Random Numbers
=Rand()
Will return a random number between 0 and 1
=Round(Rand()*100, 2)
Will Return Numbers between 0 and 100 with 2 Decimal places (Example 10.)
Random Numbers Based on a Distribution
=Norminv(Rand(), Mean, SD)
Will return a random number between 0 and 1 based on a distribution of Average = Mean and Standard Deviation = SD
=Norminv(Rand(), 50, 17)
Will return a random number between 0 and 100 based on a distribution of Average = 50 and Standard Deviation = 17, (Example 11.)
Random Numbers Fitting a Trend
If your distribution has to match a trend add a Random component to the Trends equation (Example 12.)
Y=mX+c
= rand() * X + rand()*5
= rand() * A2 + rand()*5
True/False
Choose
Use Choose and Randbetween (Example 13.)
=Choose(Randbetween(1,2), True, False)
If
Use If and Rand (Example 14.)
=If(Rand()<0.5, True, False)
Combination Text and Numbers
The above techniques can be combined to make lists of Alpha Numeric Data
Say your business has a fleet of vehicles (TR=Truck, VN=Van, CAR=Car)
=Choose(Randbetween(1,3),”TR”,”VN”,”CAR”) & Text(Randbetween(1,15),”0#”)
Will randomly choose 1 of “TR”,”VN”,”CAR” and add a random number between 1 and 15 to it format with a leading 0, eg: TR05, (Example 15.)
Other Sources of Data
Random Data
There are a number of web sites where Random Data is available.
http://www.fakenamegenerator.com/order.php
http://www.generatedata.com/#generator
http://www.melissadata.com/lookups/
Open Source Data
There are a number of web sites where Open Source Data is available.
http://www.readwriteweb.com/archives/where_to_find_open_data_on_the.php
Function Used:
Rand: Returns a random number between 0 and 1.
Randbetween: Returns a random Integer between lower and upper limits. Pre Excel 2007 Randbetween was only available through installation of the Analysis Toolpak (Thanx Luke).
Norminv: Returns the inverse of the normal cumulative distribution. That is it returns the X value from a Normal Distribution that has a know Mean and Standard Deviation where the a known cumulative percentage is supplied.
Choose: Choose an item from a list of up to 254 items.
Vlookup: Lookup the matching value from a list and return a data item from another column from the same location.
Index: Retrieve an items from a defined location within a range.
Text: Displays a number as Text with a defined format.
Other Uses of Random Functions
Of course the techniques shown here don’t have to be used for setting up Dummy Data.
One area where Random numbers is used is in Monte Carlo Simulation. This has been discussed at Chandoo.org at Data Tables and Monte-Carlo Simulations in Excel a Comprehensive Guide
Techniques
The techniques described above are all shown with a worked example in the attached Examples File or the Examples File 2003 ver
Limitations in Pre Excel 2007 versions
The Excel function, Randbetween, was only introduced in Excel 2007. As such the exaples above will only work in 2007/10.
However a simple alternative is available
Randbetween(Low, High) = Low + Int(Rand()*(High-Low))+1
Randbetween(90, 100) = 90 + Int(Rand()*10)+1
Examples using this approach are shown in the 2003 Version of the Examples files above.
How have you made Dummy Data or used the Random Functions?
How have you made Dummy Data or How have you used it ?
How have you used Random Numbers in your workbooks ?
Let us know in the comments below:
53 Responses to “Dummy Data – How to use the Random Functions”
Great post, Chandoo - I often need to create weighted/distributed data and always seem to spend ages doing it. I never knew you could do the normal distribution stuff with random numbers. Cool!
Excellent post! Thank you. Especially for Random Numbers Based on a Distribution. Beautiful.
I really like this, with one glaring exception: "Dates in a Range of Years" will double-count days at the start of months following months with fewer than 31 days. So March 1st to 3rd will be counted twice: once correctly, once for February "29/30/31".
It's easily fixed - and actually it's more flexible - since Excel stores dates as numbers, just call RANDBETWEEN on the date range you want, so the example given becomes:
=RANDBETWEEN(DATE(2000,1,1),DATE(2011,12,31))
Thanks for all the different methods Chandoo!
One caveat, RANDBETWEEN is available in 2003, if you install the Analysis ToolPak add-in.
I use random for Monte Carlo simulations for performance metrics I track at work. For example we want our energy consumption metric to decrease by 1,000 BTU this year. If, so far, the month over month changes have been between -150 and +50 then I use a =randbetween(-150,50) for thousands of iterations for the remaining months of the year. I then sum the random values for the remaining months and use the frequency function to find what percent of the time (probability) that we get to the 1,000 BTU decrease. Very useful.
Great post. I also like force some data values as Random ones may not produce that desired result with the random fuctions.
For example, I like to force a set of the minimums and the maximums especially if I am randomizing 2 variables per category. Otherwise you may get the minimum on one of the random variables but never get another minimum inconjunction with that category..
@All
Thank you for the kind comments,
They are appreciated
.
@MikeV, SteveT
I have discussed Monte Carlo simulations in: http://chandoo.org/wp/2010/05/06/data-tables-monte-carlo-simulations-in-excel-a-comprehensive-guide/
.
When forcing values you may be better off to use a Data Table, and setup the values you want in that. It no longer becomes random , but the data Table gives you a lot more control over which values get used and how often. Data Tables for this purpose are also discussed in the same link above.
.
@LukeM, I will update the post accordingly
Wonderful information. Perfect timing, too - as I just started developing some report mockups for which no real data exists, but for which real-looking vs arbitrary data is needed. This can be quite cumbersome, some of these techniques will help.
Awesome Sir!!
These tricks will surely help all of us to create data to play with....
Learning new things everyday... Thank a lot!!
Thanks! Neat tricks. Just one question. How do I get a random number from various ranges of data?
Say, an employer want to do a gift lottery for eligible employee whose ID number is between 4 and 40, 110 and 125, 700 and 1000 due to the department they belong to. How should I set up a truly random for these people??
@Fred
=CHOOSE(IF(RANDBETWEEN(0,1000)<=103,1, IF(RANDBETWEEN(103,1000)<=145,2,3)),RANDBETWEEN(4,40), RANDBETWEEN(110,125),RANDBETWEEN(700,1000))
.
That should all be 1 line
To echo Mike Woodhouse, the Date method in the post is somewhat limited, as it breaks if either the month or the day of the 2nd date is less than those of the first date. So, for instance, it would be impossible to produce a random set of dates for the two weeks between Dec 25 and January 7. Mike's method would allow that.
@Mike, Jon
Thanx for your input
I have updated the post and example files accordingly
Thanks a lot
Had never thought of the power of Rand. The links are very useful for dummy data
Prem
Thanks for this post, Hui: very useful in itself, and also as an example of lateral thinking with excel functions.
@Fred: another way (easier to understand and slightly more accurate, though less elegant that Hui's) round your problem would be to place all the available IDs in one column, then do an INDEX on it, using RANDBETWEEN 1 and the total number of available IDs (which you can automate with the COUNTA function) as the selection argument
Uau!!, good post I really needed to generate random information for the demonstration of my projects will be very useful.
VG-EXCEL-GEST
vg-excel-gest.blogspot.com/
http://www.youtube.com/user/VGEXCELGEST2?feature=mhum
http://www.facebook.com/vgexcelgest
@ Hui Thank you very much!
But I don't understand the part where the random () <103, <145? What do they do?
@Fred
There are 36 Values, between 4 & 40, 15 between 110 & 125 and 300 between 700 and 1000.
Thats a total of 351 values
36/351 = 10.3%
15/351 = 4.2%
300/351 = 85.5%
.
as Randbetween only returns integers
To reduce errors I multiplied those values by 10
36/351 = 10.3% x 10 = 103
15/351 = 4.2% x 10 = 42
300/351 = 85.5% = 855
.
and cummulatively is
36/351 = 103
15/351 = 42 + 103 = 145
300/351 = 145+ 855 = 1000
.
So now I can do a Random Between 1 and 1000 and every values has an equal chance of being picked
.
=CHOOSE(IF(RANDBETWEEN(0,1000)< =103,1, IF(RANDBETWEEN(103,1000)<=145,2,3)),RANDBETWEEN(4,40), RANDBETWEEN(110,125),RANDBETWEEN(700,1000)) . If it is less than 103 Choose Option 1 which is RANDBETWEEN(4,40) If its > 103 choose another random number between 103 and 1000
If its < =145 Choose Option 2 which is a random RANDBETWEEN(110,125) otherwise Choose Option 3 which is a value RANDBETWEEN(700,1000) . To check I made up a small spreadsheet which compares the above technique to doing a simple =CHOOSE(RANDBETWEEN(1,3), RANDBETWEEN(4,40), RANDBETWEEN(110,125), RANDBETWEEN(700,1000)) and plotted the results over 10,000 iterations You can see the results here: https://rapidshare.com/files/460825479/Freds_Question.xlsx
I needed to scramble some data to de-identify the individuals, but keep records that were plausible and "real looking". For names, I grabbed a big list of first and last names from the Census website somewhere, and used a random index to grab random lines from the names lists. (I made no attempt to avoid weird ethnic combinations, but that was OK.) Then I set up a column in the source data with RAND() and a column that did a RANK on the random numbers. In my destination spreadsheet, I used lookups to grab the data on the line with the rank equal to my current line, e.g. on line 4 of the destination sheet, I grabbed the line of source data that was randomly ranked as 4. A few additional columns used random choices among a set of possible assignments, similar to the choose examples above. Finally, I cut the data set approximately in half by randomly assigning a 1 or 2 to each line and tossing all the 2s.
I really like the examples of using randoms with the statistical functions.
good post...
I have made Dummy Data calculations with my own Excel Touch Calculator.
If you want to try, follow my blog.
Random values are generated by pressing A key on the physical keyboard or Rnd virtual key button calling a macro:
miCalc.Value = Round(Fix(tuCalc) * Rnd(), 0)
I did it because the Windows calculator doesn't have random key.
@ Hui
WOW!!! Thank you very much! It's really eye opening. O.O
These tools can be very helpful -- nicely described. I appreciate the clear layout. I try to create these on the fly from time to time and always get bogged down. Just two cautions to offer:
1) I suspect your random dates are not randomly distributed if Feb. 31 converts to March 3. That makes March 3 too likely....
2) People should always be careful how they use this stuff -- they are PSEUDO-random (computers are rarely actually random) and can produce unintended results. More importantly, people pick distributions based on what they think should apply, but distribution selections easily can be really wrong. Nobel laureates (otherwise smart people) famously helped LTCM misjudge the tail probabilities for unlikely economic outcomes and lost billions.
How do I create a random date formula that tosses out the date once it has been selected?
Great site very informative.
Thank you for this! Very useful.
[...] thru Using Excel’s Random Functions for a detailed overview these [...]
Hi Hui,
I have a deadline to meet and struggling with the code below.
All I want is to pasteformat&formula in the worksheet that is generated from this code. At the moment it only copies the format.:(
Have been trying to crake it all weekend.
I tried a lot but cant get around 'Activesheet.paste'
Please help.
--------------------------------
Sub Macro1()
' This macro takes values in the range myList
' and breaks it in to multiple lists
' and saves them to separate files.
Dim cell As Range
Dim curPath As String
curPath = ActiveWorkbook.Path & "\"
Application.ScreenUpdating = False
Application.DisplayAlerts = False
For Each cell In Range("bizu")
[BU] = cell.Value
Range("myList1").AdvancedFilter Action:=xlFilterCopy, _
Criteriarange:=Range("Criteria"), copyToRange:=Range("bizextract"), unique:=False
Range(Range("bizExtract"), Range("bizExtract").End(xlDown)).Copy
Workbooks.Add
ActiveSheet.Paste
ActiveWorkbook.SaveAs Filename:=curPath & cell.Value & Format(Now, "dmmmyyyy-hhmmss") & ".xlsx", _
FileFormat:=xlOpenXMLWorkbook, CreateBackup:=False
ActiveWindow.Close
Range(Range("bizExtract"), Range("bizExtract").End(xlDown)).ClearContents
Next cell
Application.ScreenUpdating = True
Application.DisplayAlerts = True
End Sub
@Rana
I have emailed you
hi I need to create a dummy data with two sets of data. each has 400 sets of 9 random numbers each (1-5) . first mean is calculated for each set and then the two sets of means are correlated. I want the correlation, overall mean (of all 800 values) and over std dev to be a praticular value..can this be achieved in excel?
@ron
@Ron
When you say 400 sets I assume you mean 400 cells
What does 9 random numbers 1-5 mean
something like
A t1 3 3 4 3 5 3 5 4 2
t2 5 5 4 4 4 5 3 4 1
B t1 4 5 5 5 5 4 5 3 2
t2 5 4 5 5 3 5 5 4 3
C t1 5 5 4 3 4 4 5 5 4
t2 3 4 4 4 3 4 5 2 4
and so on
so this makes 400 sets of data(A :t1,t2; B:t1,t2...) where each row average is calculated and then a correlation is found between all t1 averages and and all t2 averages which should be a particular value and the overall average (of all 800 means) should be a particular value with a specific std dev. Possible?
have mailed a sample xls to you..
any luck? sorry to pester....am in quite a soup and need to work this out asap...elsewill have to manually fill it up....egack!
I need help...I want to create 1000 random numbers between 0 and 1 but the sum must not exceed 400.......please help me.
@Xolani
I would put =Rand() into say A1:A1000
then in B1 =Sum(A1:A1000)
then simply press F9 repeatedly
once the cell B1 shows less than 400
Copy A1:A100
Paste as values somewhere else
I ran this for 1 million iterations and didn't get a value under 400
I used the following code
Sub h()
Dim x As Long
x = 1
Do While ["B1"] > 400
Calculate
x = x + 1
Debug.Print x
Loop
End Sub
@Hui...
Thank you, but I am saying I need to generate random numbers (only 0 and 1) 1000 times, but the sum of the 1's must not exceed 400.
eg
0 1 1 0 0 1 1 0 1 0 0 1 1 1 1 0 0 1 1 0 0 0.......etc.
but the sum of the 1's must not exceed 400
@Xolani
Use: =RANDBETWEEN(0,1)
Hi please help me, how to generate 2500 random numbers from (0,1000) and the random numbers must have mean of 83 and standard deviation of 93?
I did it but im getting negative numbers also...please advice me! thanks alot
@Ahmed
If you have a mean of 83 and SD of 93
You would expect +/- 3SD to be between -196 and 362
thanks alot for replying....i got - and + positive random numbers, is that ok? because it is saying from (0,1000) ? shouldnt it be only positive? i am drowning with projects so thanks for helping me...may God bless u 🙂 😀
@Ahmed
If you have a mean of 83 and SD of 93
Excel will return a normal distribution +/- 6SD to be between -475 and 641
If you want you can truncate the negative values to 0, but the chances of then getting an average of 83 is very remote
Are you sure your SD is 93 and not 9.3 ?
If you choose random numbers between 0 and 1000 the average should be ~500
So I not really sure you understand what you are asking
Can you please clarify?
=Randbetween(0,1000) will only return Integers between 0 and 1000
If you want decimals between 0 and 1000 I'd use
=Round(Randbetween(0,1000*100)/100,2) which will return decimals with 2 digits between 0 and 100
and also i did it using a software which is called datarun...can u please help me by telling me how to write that in Excel using randbetween function?
i got a maximum positive number of 450 and negative number of - 263.5
@Ahmed
Thats correct as Excel will use +/-6 SD as a range for a Normal Distribution
I'm a beginner in excel and would like to ask how do I generate 365 Random numbers (0 to 5) in Cell A1 to A365 that will sum to a specific value (1477)? And to be able to re-generate another set of random number when i hit F9? Appreciate the help! Randbetween doesnt seems to solve this problem.
@YM
The average of the numbers 0 to 5 is 2.5 (15/6)
so 365*2.5= 912
To get a total of 1477 your average needs to be 4.05 (1477/365)
so the numbers won't be random if you force that
I assume you only want to use Integers
In that case try something like: =RANDBETWEEN(3,5)
copy down 365 times
Then add a subtotal somewhere
then press F9 until the total is 1477
Hi! I'm a newbie. Trying to randomly pick 3 names a percentage of the times. 1st name 66% of the time, 2nd name 17% of the time, and 3rd name 17% of the time. There are 30 cells I am doing this in.
[…] Dummy Data – How to Use Random Functions […]
I want to data set of 1000 random numbers with 2 variable with 0 correlation
This is an extremely helpful post - thanks for sharing!
Hello Chandoo,
Great overview.
However, I have a specific case I haven't been able to find an answer online. Hope you can help me out on it.
The problem:
Is it possible to make a 'conditional dynamically reducing randbetween function' in VBA? For example, lets say I want to populate a list of 9 numbers in a range and I start off with the first cell of that range with randbetween(1,9). If the output is 5, then the second cell in a range should be a number from a set of random numbers from 1 to 9 excluding 5. And this itterativelly proceedes, thorough other cells, such that the last number in the range is actually the only left number from the list of 1 to 9 that was not used in previous 8 cells.
I hope it is clear enough.
Thanks!
@Milos
Have a read of https://www.exceltip.com/custom-functions/return-random-numbers-using-vba-in-microsoft-excel.html