In mid-May, Anup47 asked a question in the Chandoo.org forums about using a VBA macro to run a number of iterations of a variable against two sets of X values (you can see the post here). It turned out that the iterations amounted to 500 columns of data, each column having 27 values.
On examining the problem, it turned out to be a straightforward matter of setting up the statistical function Linest and then using the Data Table command to run each set of data through that function.
Linest takes the input data and returns the statistics that Anup wanted.
The Data Table command feeds in each set of source data and tabulates the inputs and outputs.
This post walks through a worked example that you can follow along with. Download the Sample File 97/2003 or Sample File 2007/10 version to suit your version of Excel. The sample file contains a worked example of the completed model as well as a Practice Page with the original data.
Please note that the sample file only contains 14 sets of data as opposed to the 500 Anup47 wanted to process.
Setup
There are a few things that need setting up before the work starts:
- Headers
- Linest Area
- Link Area
- Data Table Area
Once these areas are set up, we simply use the Excel Data Table command.
Once the Data Table has run, the results can be processed or analysed as required.
Headers
The original data was just that: a tabulation of raw data. The two sets of X data were in Columns A and B, and each column from D onwards held a set of Y data that was to be processed.
The first thing that was required was some Headers for the Input Data.
This isn’t strictly required but it is good practice and makes it easier to tabulate and analyse results later.
Insert a row above the first row of data.
Put X1 and X2 in A1 and B1 and Y1 in D1, then drag the fill handle (the small black square at the lower right of D1) to the right across the top row, and Excel will autofill the remaining headers Y2, Y3 and so on.
Linest Area
To get the statistics which Anup wanted we will use the Excel Linest function.
Linest is a statistical function that fits a least-squares line to a set of data, in this case a regression of each Y column against the two sets of X values, and returns a set of statistics describing that fit: the coefficients, their standard errors, R², the F statistic and so on.
This post isn't going to explain the intricacies of Linest; see the References section at the end, where you can read more about the Linest function at your leisure.
For our purposes we need to know that Linest is an array formula, so it has to be entered into a block of cells. With two X columns and the stats argument set to TRUE it returns 5 rows x 3 columns, so the 5 Row x 5 Column area B32:F36 used here is more than big enough (the spare cells simply show #N/A). For now we will just array-enter the function =Linest($D$2:$D$28,A2:B28,True,True) into B32:F36.
To do that, select the range B32:F36, press F2, type or paste the formula in, then array-enter it with Ctrl+Shift+Enter.
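To make it concrete what Linest is computing here, the sketch below performs the same kind of least-squares fit of one Y column against two X columns in plain Python and returns the first row in Linest's order (coefficients right to left, then the intercept) together with R². It is a minimal illustration under stated assumptions, not Excel's implementation: the names `linest_two_x`, `solve3` and `_det3` are hypothetical, the fit is done by solving the normal equations with Cramer's rule, and the standard errors, F statistic and sums of squares that Linest also reports are left out.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, v):
    """Solve the 3x3 system a . x = v with Cramer's rule."""
    d = _det3(a)
    if d == 0:
        raise ValueError("X columns are collinear; no unique least-squares fit")
    solution = []
    for col in range(3):
        m = [row[:] for row in a]  # copy a, then replace column `col` with v
        for r in range(3):
            m[r][col] = v[r]
        solution.append(_det3(m) / d)
    return solution


def linest_two_x(y, x1, x2):
    """Least-squares fit of y = m1*x1 + m2*x2 + b.

    Returns (first_row, r_squared). first_row follows LINEST's ordering,
    coefficients right to left and then the intercept: [m2, m1, b].
    """
    n = len(y)
    design = [(p, q, 1.0) for p, q in zip(x1, x2)]  # X1, X2 and a constant column
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(3)]
    m1, m2, b = solve3(xtx, xty)
    fitted = [m1 * p + m2 * q + b for p, q in zip(x1, x2)]
    mean_y = sum(y) / n
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return [m2, m1, b], 1.0 - ss_resid / ss_total


x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [2 * p + 3 * q + 5 for p, q in zip(x1, x2)]  # an exact plane, so R² is 1
first_row, r2 = linest_two_x(y, x1, x2)
print([round(v, 6) for v in first_row], round(r2, 6))  # [3.0, 2.0, 5.0] 1.0
```

Note the reversed order in the first row: the coefficient for X2 comes before the one for X1, which is worth keeping in mind when you link output cells to the Linest block.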
Link Area
To link the Linest formula to a Data Table we need a link cell, which we will put in B30, just above the Linest area.
For now, just enter a 1 in it.
We can now go back to the Linest area and link the Linest formula to the link cell by replacing it with =LINEST(OFFSET($C$2:$C$28,,$B$30),A2:B28,TRUE,TRUE)
Enter it the same way: select the range B32:F36, press F2, type or paste the formula in, then array-enter it with Ctrl+Shift+Enter.
This allows the Linest formula to use any of the columns Y1 to Y500, depending on the value of the link cell B30: OFFSET shifts the range C2:C28 that many columns to the right, so with B30 set to 1 it points at D2:D28, the Y1 data.
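The OFFSET trick can be checked with a little column arithmetic: Y1 sits one column to the right of C, so Yk sits k columns to the right of C. The sketch below uses hypothetical helper names and Python purely for illustration; it converts between Excel column letters and numbers to show which range OFFSET($C$2:$C$28,,$B$30) refers to for a given link value.

```python
def column_number(letters):
    """Excel column letters to a 1-based number (A -> 1, Z -> 26, AA -> 27)."""
    number = 0
    for ch in letters.upper():
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def column_letters(number):
    """1-based column number back to Excel letters (27 -> AA)."""
    letters = ""
    while number > 0:
        number, rem = divmod(number - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def offset_y_range(link_value, base="C", first_row=2, last_row=28):
    """Range that OFFSET($C$2:$C$28,,link_value) refers to."""
    col = column_letters(column_number(base) + link_value)
    return f"${col}${first_row}:${col}${last_row}"


for link in (1, 14, 500):
    print(link, offset_y_range(link))
```

A link value of 1 gives $D$2:$D$28 (Y1), 14 gives $Q$2:$Q$28 (the last set in the sample file), and 500 gives $SI$2:$SI$28, the column Y500 would occupy in the full data set.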
Data Table Area
To set up a Data Table area we need a column of inputs, which will be the run numbers, and a top row of links to the input and output cells.
In the range J33:J46 put the values 1 to 14. These will be the run numbers, i.e. Run No 1, Run No 2, etc. (Green in the example below).
Across the top of the Data Table area we can put a number of links and their associated labels (Yellow and Blue).
In this case there are 4 output links, =B31, =C31, =B34 and =B33, with their labels above them, as well as 2 input formulas and their labels. The input formulas are simple OFFSET functions that retrieve a value from Rows 1 or 2 based on the value of the link cell B30.
These are technically not required but make data analysis and identification of individual results later on a lot simpler.
Run Data Table
We can now run the Data Table by selecting the Data Table area J32:P46 and choosing Data > What-If Analysis > Data Table (Data > Table in Excel 97/2003).
We will only be using a Column input cell, and it links to $B$30, the link cell for the Linest formula; leave the Row input cell blank.
What this does is take each value in J33:J46 in turn and put it into B30; the Linest formula is then recalculated and the results are written into the Data Table area alongside the inputs.
This is repeated automatically for each cell in J33:J46.
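In effect, the Data Table command is a loop that writes each run number into the link cell, recalculates, and records the outputs next to the input. The sketch below emulates that loop in Python under stated assumptions: `run_data_table`, `toy_model` and the three short Y columns are hypothetical stand-ins, with `toy_model` playing the role of the Linest block driven by B30.

```python
from typing import Callable, Dict, List, Sequence


def run_data_table(run_numbers: Sequence[int],
                   model: Callable[[int], Dict[str, float]]) -> List[Dict[str, float]]:
    """Emulate a one-variable Data Table that uses a column input cell.

    Each run number is 'written' into the link cell (passed to `model`),
    the model is recalculated, and its outputs are stored next to the input.
    """
    table = []
    for run in run_numbers:
        outputs = model(run)  # recalculate with link cell = run
        table.append({"Run": run, **outputs})
    return table


# Hypothetical Y columns keyed by run number (Y1, Y2, Y3).
Y_COLUMNS = {
    1: [1.0, 2.0, 3.0],
    2: [2.0, 4.0, 6.0],
    3: [5.0, 5.0, 5.0],
}


def toy_model(link_cell: int) -> Dict[str, float]:
    """Stand-in for the Linest block: summarises the column chosen by the link cell."""
    ys = Y_COLUMNS[link_cell]
    return {"Mean": sum(ys) / len(ys), "Max": max(ys)}


for row in run_data_table(range(1, 4), toy_model):
    print(row)
```

Scaling from 14 runs to 500 only changes the sequence of run numbers; the model itself is untouched, which is why the worksheet approach extends so easily.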
The final Data Table is now populated as below:
By extending the Data Table input column from 14 to 500 run numbers, the full 500 columns of input data can be processed just as easily.
Results
You now have a set of data that can be analysed using normal statistics (Min, Max, Std Deviation, etc.) or fed into a Pivot Table/Chart for further analysis.
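For the summary step, Python's `statistics` module produces the same kind of figures as Excel's MIN, MAX, AVERAGE and STDEV. This is a minimal sketch; the `summarise` helper and the sample R² values are hypothetical. Note that `statistics.stdev` is the sample standard deviation (like STDEV / STDEV.S), whereas `statistics.pstdev` corresponds to STDEVP.

```python
import statistics


def summarise(values):
    """Summary figures comparable to Excel's MIN, MAX, AVERAGE and STDEV."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample SD, like STDEV / STDEV.S
    }


# Hypothetical R² values taken from one output column of the Data Table.
r2_column = [0.91, 0.85, 0.97, 0.88]
print(summarise(r2_column))
```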
References
Linest References
http://chandoo.org/wp/2011/01/26/trendlines-and-forecasting-in-excel-part-2/
http://newtonexcelbach.wordpress.com/2011/01/19/using-linest-for-non-linear-curve-fitting/
Data Table References
http://chandoo.org/wp/2010/05/06/data-tables-monte-carlo-simulations-in-excel-a-comprehensive-guide/
How can the Data Table command help you become a data processing super hero?
Let us know in the comments below:
12 Responses to “Speeding up & Optimizing Excel – Tips for Charting & Formatting [Speedy Spreadsheet Week]”
Usually when I dump data into my files to update values, the formatting sometimes spreads to all rows or columns. So what I typically do is go to the last row (and then the last column), use Ctrl + Shift + End and delete the highlighted cells. This removes all unwanted formats in the worksheet. Also, after you have done this, you won't see the benefit until you save the document; sometimes I even have to close and reopen it. The direct sign that this has worked is the size of the scroll bar and range.
I have some comments on a couple of the points.
1. Camera objects
Tip: I use defined names in conjunction with camera tool objects.
Each camera object gets a name like so:
CameraItem01
Referring to: =IF(PicsOn=1,Sheet1!$C$2:$S$5,"")
By setting the PicsOn name to 1, the camera objects become "live", by setting the PicsOn name to 0, they become static. That improves performance enormously.
4. Conditional formatting
Lots of CF rules can slow down your workbook a lot. And it does not show the calc progress a "normal" recalc does on slow workbooks.
5. Format whole columns/rows
As far as I know, there is no performance problem with formatting entire columns/rows; on the contrary, Excel is more efficient when you format an entire column than when you format a couple of hundred rows of a column.
6. Styles.
Here I wholeheartedly disagree. I say: use styles. And use them religiously.
I mean: if you have applied a (custom) style and you need to change a small piece of formatting to make that one cell look right, force yourself to create a new style just for that cell. It forces you to really think about your spreadsheet design and try and streamline it. It also makes it much, much easier to change your sheet's appearance later on. See http://www.jkp-ads.com/articles/styles00.asp
Very good insights, Jan.
Camera objects: I often use a similar technique to turn off images in my dashboards.
Formats: Thanks for clearing this up. Do you think formatting larger ranges has any impact on macro speed, or does it not matter?
Styles: Thanks for telling us about this. As I mentioned, I am not sure about styles, but I am under the impression that excessive use of styles can bloat the file size.
@Chandoo:
If you stick to formatting entire rows/columns I don't expect macro speed is affected. Better: try it!
If you use styles properly AND as a replacement of ad-hoc cell formatting, I expect you'll see that the file actually is smaller in size.
This is because the cells now only have a reference to a single style instead of a reference to a custom cell formatting style.
Many cell formatting combinations get created if you format your cells in an ad-hoc manner, which was responsible for the dreaded "Too many different cell formats" error in Excel 2003 and older. Excel 2007 and 2010 have a higher limit there, but it does slow down your file with many of them.
Style bloat in my point of view is what you get by copying and pasting a lot from various other files and thus get Normal 1, Normal 1 1, Normal 1 1 1, ... I have seen workbooks with as many as 6000 styles, all caused by copying and pasting from various differently formatted workbooks.
Excel 2007 and 2010 have fixed a number of issues regarding copying of styles, but for workbooks with a long editing history, the trouble is already in the workbooks.
I can't emphasise enough the importance of reducing the amount of formatting in a workbook; this has a surprising impact on workbook size. I've always kept to one font and no more than three colours, and this has worked well for me. Keeping things clean and simple should be the motto when designing any type of report/dashboard that is going to be distributed around the organisation.
You can also save a few MBs by saving as an .xlsb file.
Has anyone else mentioned that only the first item in the "more ..." section is hyperlinked?
Prem, have you confirmed by trial that an XLSB file is smaller than the same file saved as XLSX? Sorry, I just tried it with a small, simple XLSM file and was surprised to see you are correct: the file went from 40kb to 37kb. I thought that the compression used by the new file format would make that file the smaller one.
@Ron
All Excel files have a minimum overhead of around 8KB, just to store a simple number or letter.
So with a small file of 40KB you will not see a huge improvement in file size.
With files greater than 10MB you will see large improvements in size.
The compression gained also depends on what the file contains: plain numbers, text and formulas can be greatly compressed, whereas files that contain a lot of objects, especially pictures, gain very little from using *.xlsb files.
@Ron.. the other articles are yet to be published. All the links will be updated by Tuesday (27th March).
Hi,
I need an x,y scatter chart with around 30 data series,
like this:
http://i65.tinypic.com/jra8lc.jpg
I also have several such charts in one Excel file.
Is there any way to make Excel faster? It is irritatingly slow
(though my PC's configuration is quite decent).
Thanks in advance!!!
@Mil
30 series won't be the issue
It is the number of points in the series
Also remove all fancy modifications, like shadows, fancy fills etc
I'd suggest asking the question in the Chandoo.org Forums http://forum.chandoo.org/
Attach a sample file with an example of what you are after
@Hui
I've already removed all the fancy modifications. The problem is that there are also a lot of data points in each series.
Thanks for the advice!
@Mil
Do you really need every data point?
Where is the chart being presented: on screen or in a report?
On a screen you are unlikely to use more than 800 pixels for the chart area,
so using any more than about 250 points is not adding value.
On an A4 page in landscape, let's say the chart area is 6" long; at 300dpi that is about 2000 pixels.
Once again, using more than 800-1000 points will not add any value.
I have seen charts with 30,000+ points, and when this is explained and a workaround shown, people appreciate the speed-up.
For a workaround, try setting up an area where you select, say, every x'th point using an OFFSET or INDEX function
Then plot that data
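Hui's arithmetic and workaround can be sketched as: decide how many points the chart can usefully show, then keep every k-th point. The Python below is a hypothetical illustration (`every_kth` is not an Excel feature); in the worksheet the same thinning would be done with a helper range built from INDEX or OFFSET, as suggested above.

```python
import math


def every_kth(points, max_points):
    """Thin a series to at most max_points by keeping every k-th point."""
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    step = max(1, math.ceil(len(points) / max_points))  # k
    return points[::step]


data = list(range(30000))        # e.g. a 30,000-point series
thinned = every_kth(data, 1000)  # step works out to 30
print(len(thinned))              # 1000
```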
I'd suggest asking the question in the Chandoo.org Forums http://forum.chandoo.org/
Attach a sample file with an example of what you are after