Dummy Data – How to use the Random Functions
Using collected or known data is best when developing Excel models, but from time to time it may not be available while you are developing your model.
This post looks at some options for setting up Dummy Data using Excel's Random functions.
Variability
Real data is variable, but that variability generally falls within known ranges or distributions.
All field types can contain variability
eg: Country, State Names and Zip/Postal Codes: may be large lists but are fixed
People's Names: may be a large list but are fixed by local rules
Ages: generally less than 80, never less than 0
Dates: rarely before 1990, or before 1900 in rare cases
Lists: are fixed
Numbers: generally random or conforming to a fixed distribution or known trend
Numbers: may include integers, decimals, negatives, extremely large numbers or all combinations
In generating random lists you will need to choose if you want random data, random data within constraints or random with a distribution. The choice is really yours and should in part be based on what the data is being used for and how accurately it needs to reflect reality.
Techniques
The techniques described below are all shown with a worked example in the attached Examples File or the Excel 2003 Example
Each example is annotated below like (Example 4.), that is: refer to Example 4 in the above example files.
Dates
Excel stores dates as serial numbers, so setting up Random Dates is a simple process using Randbetween together with the Date function.
=Randbetween(StartDate,EndDate)
Dates in a Range of Years
=Randbetween(Date(2000,1,1),Date(2011,12,31))
Will give a list of Random dates between 1 Jan 2000 and 31 Dec 2011 (Example 1.)
(Thanx Mike W)
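For readers who want to check the same idea outside Excel, here is a minimal Python sketch of picking a random date in an inclusive range. The helper name random_date and the fixed seed are assumptions made for illustration; they are not part of the example files.

```python
import random
from datetime import date, timedelta

def random_date(start: date, end: date, rng: random.Random) -> date:
    """Return a random date between start and end, both inclusive."""
    span_days = (end - start).days
    # randint is inclusive at both ends, like Randbetween
    return start + timedelta(days=rng.randint(0, span_days))

rng = random.Random(42)  # fixed seed so the sketch is repeatable
sample = [random_date(date(2000, 1, 1), date(2011, 12, 31), rng) for _ in range(5)]
print(sample)
```

As with Randbetween, both the start and end dates can be returned.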
Dates in a Month
=Date(2010, 6, Randbetween(1,30))
Will give a list of Random dates between 1 June 2010 and 30 June 2010 (Example 2.)
Don't worry if a formula of this kind produces an invalid day such as 31 Feb 2005; the Date function will happily convert that to 3 March 2005 (Example 3.)
Dates within a Date Distribution
=DATE(2011,7,NORMINV(RAND(), 0,60))
Will give a list of Random dates between approximately 1 Jan 2011 and 31 Dec 2011, with a mean of about July 1 and a standard deviation of 2 months (60 days) (Example 4.)
Where NORMINV(RAND(), 0,60) will return values between -180 and +180, 99.7% of the time
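The 99.7% figure is the usual three-standard-deviation rule (3 x 60 = 180). A rough way to check it is to simulate the day offsets; this Python sketch uses random.gauss as a stand-in for NORMINV(RAND(), 0, 60), and the sample size and seed are arbitrary assumptions.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Stand-in for NORMINV(RAND(), 0, 60): normally distributed day offsets
offsets = [rng.gauss(0, 60) for _ in range(100_000)]

# Share of offsets that land within three standard deviations of zero
within = sum(1 for x in offsets if -180 <= x <= 180) / len(offsets)
print(f"share within +/-180 days: {within:.4f}")
```

The remaining small share of offsets falls outside the year, so add a constraint if dates outside the range would be a problem.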
Text Fields
Depending on how many items you require in the list, there are 3 techniques available
Choose
For small lists of up to about 6 to 10 items you can use a simple Choose function (Example 5.)
=Choose(Randbetween(1,6),"Item 1", "Item 2", "Item 3", "Item 4", "Item 5", "Item 6")
VLookup
Using VLookup (Example 6.)
=Vlookup(Randbetween(1,List Length), List, 2)
Index
Using Index (Example 7.)
=Index(List, Randbetween(1, Counta(List) ))
Numbers
Small Random List of Numbers
Random from a small list of numbers (Example 8.)
=Choose(Randbetween(1,6), Numb 1, Numb 2, Numb 3, Numb 4, Numb 5, Numb 6 )
Note that the numbers:
- Don’t have to be in any order,
- Can be integers, negatives or contain decimals
- Can be repeated
eg: =Choose(Randbetween(1,6), 18, 21, -19, 36.4, 18, 24)
Random Integers
Return Integers between Start and Finish (Example 9.)
=Randbetween(Start, Finish)
=Randbetween(50, 100)
Will return an Integer between 50 and 100
Random Numbers
=Rand()
Will return a random number between 0 and 1
=Round(Rand()*100, 2)
Will Return Numbers between 0 and 100 with 2 Decimal places (Example 10.)
Random Numbers Based on a Distribution
=Norminv(Rand(), Mean, SD)
Will return a random number drawn from a Normal distribution with Average = Mean and Standard Deviation = SD. The Rand() value between 0 and 1 is the cumulative probability that Norminv converts into that number.
=Norminv(Rand(), 50, 17)
Will return a random number that falls roughly between 0 and 100 (about 99.7% of the time), based on a distribution of Average = 50 and Standard Deviation = 17 (Example 11.)
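The same inverse-distribution idea can be sketched in Python with the standard library's NormalDist.inv_cdf. The helper name norminv_rand, the seed and the sample size are assumptions made for illustration.

```python
import random
from statistics import NormalDist, fmean

def norminv_rand(mean: float, sd: float, rng: random.Random) -> float:
    """Mimic Norminv(Rand(), mean, sd): feed a uniform value into the inverse CDF."""
    p = rng.random()
    while p == 0.0:  # inv_cdf requires 0 < p < 1
        p = rng.random()
    return NormalDist(mean, sd).inv_cdf(p)

rng = random.Random(1)  # fixed seed so the sketch is repeatable
values = [norminv_rand(50, 17, rng) for _ in range(50_000)]

# Count results outside 0..100 to show the range is approximate, not a hard limit
outside = sum(1 for v in values if v < 0 or v > 100)
print(f"mean: {fmean(values):.2f}, values outside 0..100: {outside}")
```

A small number of results land below 0 or above 100, so clamp or re-draw them if the dummy data must stay inside a hard range.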
Random Numbers Fitting a Trend
If your data has to match a trend, add a Random component to the trend's equation (Example 12.)
Y=mX+c
= rand() * X + rand()*5
= rand() * A2 + rand()*5
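A variation on this idea keeps the slope and intercept fixed and adds only a bounded random term to Y=mX+c. The Python sketch below is that variation, not the formula used in Example 12; the values of m, c and the noise width are assumptions chosen for illustration.

```python
import random

def noisy_trend(x: float, m: float, c: float, noise: float, rng: random.Random) -> float:
    """Return m*x + c plus a uniform random term in [-noise, +noise]."""
    return m * x + c + rng.uniform(-noise, noise)

rng = random.Random(3)  # fixed seed so the sketch is repeatable
xs = list(range(1, 11))
ys = [noisy_trend(x, 2.0, 5.0, 1.5, rng) for x in xs]

for x, y in zip(xs, ys):
    print(x, round(y, 2))
```

A wider noise term makes the underlying trend harder to see, so pick the width according to how noisy the real data is expected to be.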
True/False
Choose
Use Choose and Randbetween (Example 13.)
=Choose(Randbetween(1,2), True, False)
If
Use If and Rand (Example 14.)
=If(Rand()<0.5, True, False)
Combination Text and Numbers
The above techniques can be combined to make lists of Alpha Numeric Data
Say your business has a fleet of vehicles (TR=Truck, VN=Van, CAR=Car)
=Choose(Randbetween(1,3),"TR","VN","CAR") & Text(Randbetween(1,15),"0#")
Will randomly choose 1 of "TR", "VN", "CAR" and append a random number between 1 and 15, formatted with a leading 0, eg: TR05 (Example 15.)
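As a cross-check of the pattern, here is a minimal Python sketch that builds the same kind of ID. The helper name fleet_id and the seed are assumptions; the two-digit padding mirrors the TR05 example.

```python
import random
import re

def fleet_id(rng: random.Random) -> str:
    """Pick a vehicle prefix and append a two-digit number between 01 and 15."""
    prefix = rng.choice(["TR", "VN", "CAR"])
    number = rng.randint(1, 15)
    return f"{prefix}{number:02d}"

rng = random.Random(7)  # fixed seed so the sketch is repeatable
ids = [fleet_id(rng) for _ in range(10)]

# Every ID should be a known prefix followed by 01..15
pattern = re.compile(r"(TR|VN|CAR)(0[1-9]|1[0-5])")
all_valid = all(pattern.fullmatch(i) for i in ids)
print(ids, all_valid)
```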
Other Sources of Data
Random Data
There are a number of web sites where Random Data is available.
http://www.fakenamegenerator.com/order.php
http://www.generatedata.com/#generator
http://www.melissadata.com/lookups/
Open Source Data
There are a number of web sites where Open Source Data is available.
http://www.readwriteweb.com/archives/where_to_find_open_data_on_the.php
Functions Used:
Rand: Returns a random number between 0 and 1.
Randbetween: Returns a random Integer between lower and upper limits. Prior to Excel 2007, Randbetween was only available through installation of the Analysis ToolPak (Thanx Luke).
Norminv: Returns the inverse of the normal cumulative distribution. That is, it returns the X value from a Normal Distribution with a known Mean and Standard Deviation for a supplied cumulative percentage.
Choose: Chooses an item from a list of up to 254 items.
Vlookup: Looks up the matching value in a list and returns a data item from another column in the same row.
Index: Retrieves an item from a defined location within a range.
Text: Displays a number as Text with a defined format.
Other Uses of Random Functions
Of course the techniques shown here don’t have to be used for setting up Dummy Data.
One area where Random numbers are used is Monte Carlo Simulation. This has been discussed at Chandoo.org in Data Tables and Monte-Carlo Simulations in Excel a Comprehensive Guide
Techniques
The techniques described above are all shown with a worked example in the attached Examples File or the Examples File 2003 ver
Limitations in Pre Excel 2007 versions
Randbetween only became a built-in Excel function in Excel 2007. As such, the examples above will only work as written in Excel 2007/10, or in earlier versions with the Analysis ToolPak installed.
However, a simple alternative is available:
Randbetween(Low, High) = Low + Int(Rand()*(High-Low+1))
Randbetween(90, 100) = 90 + Int(Rand()*11)
Examples using this approach are shown in the 2003 Version of the Examples files above.
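When replacing Randbetween with Int and Rand, the width of the span matters. The Python sketch below steps a stand-in for Rand() across values from 0 up to just under 1 and lists which integers each form can produce. The function names and the grid are assumptions made for illustration.

```python
import math

def randbetween_from_unit(r: float, low: int, high: int) -> int:
    """Map r in [0, 1) to an integer in [low, high], both inclusive."""
    return low + math.floor(r * (high - low + 1))

def narrow_variant(r: float, low: int, high: int) -> int:
    """Span of (high - low) followed by +1: this form can never return low."""
    return low + math.floor(r * (high - low)) + 1

# Stand-ins for Rand() values, evenly spaced in [0, 1)
grid = [k / 1000 for k in range(1000)]

inclusive = sorted({randbetween_from_unit(r, 90, 100) for r in grid})
narrow = sorted({narrow_variant(r, 90, 100) for r in grid})
print(inclusive)
print(narrow)
```

Multiplying by High-Low+1 returns every integer from 90 to 100, while multiplying by High-Low and then adding 1 only returns 91 to 100.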
How have you made Dummy Data or used the Random Functions?
How have you made Dummy Data, and how have you used it?
How have you used Random Numbers in your workbooks?
Let us know in the comments below:
41 Responses to “SQL Queries from Excel”
I use this method very often.
I always use =SUBSTITUTE (ColumnWithText,"'","''")
to be sure that potential apostrophe in text columns are doubled as required in SQL.
Awesome ! I don't use excel very often so the substitute thing is gold to me 🙂 thanks !
@Leonid.. that is a good technique to use substitute to clean up text apostrophes. thanks
Goal:
Generate update statement in excel where the columns that can be updated are dynamic
You want the columns which are not updated to keep the same value
(or not be overwritten with NULL values with the new generated statement)
the statement can be applied to multiple rows in excel for the same column headers
(This is why the '$' exist for the column headers that are being set)
A1 = First_Name
B1 = Last_Name
C1 = Middle_Name
="
UPDATE PERSONS "&CHAR(10)&
" SET 1 = 1 "&CHAR(10)&
IF(LEN(TRIM($A2))=0,"",", "&$A$1&" = '"&$A2&"'"&CHAR(10))&
IF(LEN(TRIM($B2))=0,"",", "&$B$1&" = '"&$B2&"'"&CHAR(10))&
IF(LEN(TRIM($C2))=0,"",", "&$C$1&" = '"&$C2&"'"&CHAR(10))&
" WHERE name = 'staticordynamicvalue' AND gender = 'staticordynamicvalue'
"
Output (if all columns are set):
UPDATE PERSONS SET 1 = 1,
First_Name = 'Joe',
Last_Name = 'ORien',
Middle_Name = 'Richard'
WHERE age = 28 AND gender = 'm'
Output (if only First_Name (A1) is set):
UPDATE PERSONS SET 1 = 1,
First_Name = 'Joe'
WHERE age = 28 AND gender = 'm'
Possibly my post above is confusing without the actual table to look at. I will do the same example with the table used here. Instead of an insert statement I will generate an update statement for the columns, Cust_Name, Phone & E-mail
where we can generate an update statement for any column individually or together. 🙂 I hope this can help.
="
UPDATE table "&CHAR(10)&
" SET 1 = 1 "&CHAR(10)&
IF(LEN(TRIM($A2))=0,"",",Cust_Name = '"&$B3&"'"&CHAR(10))&
IF(LEN(TRIM($B2))=0,"",", Phone = '"&$C3&"'"&CHAR(10))&
IF(LEN(TRIM($C2))=0,"",", E-mail = '"&$D3&"'"&CHAR(10))&
" WHERE Cust_Name = 'Bill Gates'
"
Thanks, it has been very useful !
It saved me at least 30 minutes, and time is the most expensive thing in our world...
Hey Paul,
What if any of A2, B2, or C2 is a date field?
The formula above is taking date as string. Any solution?
Even I faced the same problem. If any of the above columns are date, it is taken as string. Any work around for this?
I've found the string concatenation method works well.
At the risk of sounding spammy, I would mention that
if it's something you are doing regularly it might be worth investigating a tool
that makes it easier, such as QueryCell, an Excel add-in I've developed.
It gives you a right click menu option that will produce and then customize insert statements for the selected region of Excel data.
Cheers
Sam
Hi,
For inserting the excel data to your SQL table, you can create insert statements in excel file according to your columns.
then just execute the statements all at once, it will insert the required data to sql server table.
thanks,
How...?
I tried to generate t-sql insert queries from the above example
="insert into values('" &A2 &"','" & B2& "');"
but it generates on one record instead of all records from excel sheet.
I'm using Excel 2003 and the excel sheet contains 922 records.
Most data bases can generate DDL for any object but not a lot of them allow generation of INSERT statements for the table data.
The workaround is to make use of ETL Tools for transferring data across servers. However, there exists a need to generate INSERT statements from the tables for porting data.
Simplest example is when small or large amount of data needs to be taken out on a removable storage media and copied to a remote location, INSERT..VALUES statements come handy.
There are a number of scripts available to perform this data transformation task. The problem with those scripts is that all of them are database specific and they do not work with text files.
Advanced ETL processor can generate Insert scripts from any data source including text files
http://www.dbsoftlab.com/generating-insert-statements.html
Super Article. Thanks for this post.
I used to deal with the same problem, until I found this awesome and free tool.
http://www.xtrategics.com/shapp/String%20Handler.application
regards,
Hi ,
i need a sql query to update a DB in excel 2010..
i have the query(SQL) for insert in excel as ,
="insert into customers values('" &B3 &"','" & C3 & "','"&D3&"');"
similarly i need q sql query for update in excel
i want clear formulas only for insert,delete,update,select
Hi !
I would like to thank you so much ! This trick saves me a lot of time. Thank you so much. Really appreciate it !
-Ankit
You may like to take advantage of this unique tool 'Excel to Database'.
(free for 60 days) http://leansoftware.net
The Excel-to-Database utility enables you to validate and transfer data from Microsoft Excel or text file to a database table or stored procedure process. Any text data can be pasted into the application, this may be from another Excel sheet or from text files such as CSV format. SQL Server, Access, MySQL, FoxPro ..
Application features
Some unique features of Excel to Database include:
- Easy to use color coded/traffic light data validation
- Data is validated as soon it is typed or pasted into Excel
- Upload Excel data to a table or stored procedure process
- Allow default values
- Mandatory/must have fields can be specified
- Allow user friendly column names
- Allow excel formula / calculated fields
- Multiple database type support: Microsoft SQL Server, Access, MySQL and others (to be tested)
- Supports Custom SQL scripts, with SQL/Excel merge fields
- Database validation checks ensure you comply with any rules defined within the database
- Multiple Task configuration
- For co-operative use, Tasks can be shared across a network
- Task configuration is password protected
Its works fine for single record.
I want to update 1000 records in DB. Can you help me.
Excel database tasks 2.3 (EDT)
you can now load directly from any source into Excel, validate and upload to most SQL database platforms including SQL Server with automatic transaction wrapping.
You can also use EDT as a multi-user application by easily designing your own Edit data tasks and deploying EDT on your users workstations.
Automatically creates UPDATE/INSERT statements based on the primary key. Default SQL can be modified as you require.
Makes the best use of Excel power - formatting, formula, validation, conditional formatting.. without creating any problematic spreadsheets!
Release details on the blog:
http://leansoftware.net/forum/en-us/blog.aspx
Thanks for the interest
Richard
Thanks for the valuable information, it really helped me a lot.
Thanks again.
How do I do this with a field of type date?
= "UPDATE SET business datetime =" & "'" & A2 & "' WHERE ID =" & B2 & ""
The date comes out as 41246 instead of 03/10/2012, even when I put it in quotes ...
Please show how to do it properly with dates as well as when those dates are empty. Thanks!
In a separate column, convert the date to text using the formula below:
=TEXT(C2,"mm/dd/yyyy")
Then refer to this text column in your update statement.
Great post saved me a a load of time on a task i had to complete
thanks for sharing article... helpful!
Thanks 🙂
Hello,
Nice article.
I have also created one tool for create table script using excel http://devssolution.com/create-table-in-sql-using-excel/
Please check it.
Thanks & Regards,
Sandeep Bhadauriya
[…] Excel formula used – http://chandoo.org/wp/2008/09/22/sql-insert-update-statements-from-csv-files/ […]
If any one can help me out with following.
I want to know a SQL query of below excel formula:
=LOOKUP(0,-SEARCH(LEFT(F2,LEN($B$2:$B$100))+0,$B$2:$B$100),$A$2:$A$100)
Excel data is as below;
Name Codes
names1 992
names2 57
names3 856
names4 297
names5 63
if there is a number (29756789) then it should search in sql by taking the prefix of number (297) from (29756789) and return the name field (name4).
Codes can be of two digit or three.
Thanks
Here is a link to an Online automator to convert CSV files to SQL Insert Into statements:
CSV-to-SQL: http://csv-to-sql.herokuapp.com
http://stackoverflow.com/questions/1570387/how-to-insert-data-from-an-excel-sheet-into-a-database-table/37409790#37409790
="INSERT INTO table VALUES (" &A3 &",'" & B3 & "','"&C3&"','" & D3 & "','" & E3 & "'," & F3 & "," & G3 & "," & H3 & ",'" & I3 & "'," & J3 & ");"
B3 has date data that looks like 9/22/17 but with the formula above b3 is coming out as 43000?
how do i fix that?
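The 43000 here is Excel's serial number for the date, which is why wrapping the cell in TEXT(...), as suggested earlier in the thread, fixes it. A minimal Python sketch of the same conversion follows. It assumes Excel's default 1900 date system, and the ISO yyyy-mm-dd literal is an assumption about what the target database accepts.

```python
from datetime import date, timedelta

# Day zero that lines up with Excel's 1900 date system for modern dates
EXCEL_EPOCH = date(1899, 12, 30)

def serial_to_date(serial: int) -> date:
    """Convert an Excel date serial number to a calendar date."""
    return EXCEL_EPOCH + timedelta(days=serial)

def sql_date_literal(serial: int) -> str:
    """Format the date as a quoted yyyy-mm-dd string for use in a SQL statement."""
    return "'" + serial_to_date(serial).strftime("%Y-%m-%d") + "'"

print(sql_date_literal(43000))  # prints '2017-09-22'
```

Inside Excel, the equivalent is to use TEXT(B3,"yyyy-mm-dd") in place of B3 in the concatenation.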
I just want to insert the Excel records in Sql table without Visiting SQL.
basically i m just want to run a command in Excel Only.
Help Me..plz..?
Hi I have a question maybe you guys have an answer for me
="insert into customers values('" &B3 &"','" & C3 & "','"&D3&"');" where B3, C3, D3 refer to above table data.
the above technique works but is there a way to write it so it takes a range instead of individual columns. because I have an extremely wide table
="insert into customers values(B3:D3);" where B3, C3, D3 refer to above table data.
Awsome
Its Great Effort to help everyone who working with excel.
Thanks for the mini-tutorial on SQL from Excel. Didi it several years ago, but couldn't remember the syntax! All the dialogue was really helpful as well!