This series of articles will give you an overview of how to manage spreadsheet risk. These articles are written by Myles Arnott from Excel Audit.
- Part 1: An Introduction to managing spreadsheet risk
- Part 2: How companies can manage their spreadsheet risk
- Part 3: Excel’s auditing functions
- Part 4: Using external software packages to manage your spreadsheet risk
In the first two articles in this series we highlighted the risks that poorly managed spreadsheet solutions can introduce to a business and outlined the steps companies can take to manage this risk. This article works through the application of some of Excel’s built-in auditing functions:
- Error checking (Background and stepping through each error)
- Trace Error
- Circular Reference
- Go To Special
Let’s have a look at an example spreadsheet that is riddled with issues.
The spreadsheet contains four tabs: a simple front page; an Example tab with the report that we wish to audit; a Resolved tab with the corrected report; and a Notes tab which details all of the issues contained within the spreadsheet (if you print the Resolved tab, all of the comments will also be printed for your reference).
If you are up for a challenge you could download the file and work through the report in the Example tab to see how many of the errors you can find yourself.
First off, let’s identify the obvious issues
Circular reference
On opening the file you are presented with this warning message:
Click OK to continue opening the file. Here is how the report looks:
Excel helpfully gives you the location of the first circular reference (Q30) in the bottom left corner of the screen:
An alternative approach to locating circular references is to select Error Checking > Circular References on the Formulas tab of the Ribbon:
By clicking into the formula in cell Q30 you will see that the formula is =AVERAGE(M30:N30,P30:Q30). This AVERAGE formula includes cell Q30 itself in its second range, hence the circular reference.
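The check that exposes this circular reference can be sketched in a few lines of Python. This is only an illustration of the idea, not how Excel itself detects circular references: the helper names (`parse_cell`, `refers_to_itself`) are made up for this example, it only catches a formula that refers directly to its own cell, and its simple pattern would also mistake function names that end in digits for cell references.

```python
import re

# Matches a single A1-style reference, optionally followed by ":<second ref>".
RANGE_RE = re.compile(r"(\$?[A-Z]{1,3}\$?\d+)(?::(\$?[A-Z]{1,3}\$?\d+))?")
CELL_RE = re.compile(r"\$?([A-Z]{1,3})\$?(\d+)")


def parse_cell(ref):
    """Convert a reference such as 'Q30' into (column_number, row_number)."""
    match = CELL_RE.fullmatch(ref)
    if match is None:
        raise ValueError(f"not an A1-style reference: {ref!r}")
    column = 0
    for letter in match.group(1):
        column = column * 26 + (ord(letter) - ord("A") + 1)
    return column, int(match.group(2))


def refers_to_itself(cell, formula):
    """Return True if any reference or range in `formula` covers `cell`."""
    col, row = parse_cell(cell)
    for match in RANGE_RE.finditer(formula):
        c1, r1 = parse_cell(match.group(1))
        # A single cell is treated as a one-cell range.
        c2, r2 = parse_cell(match.group(2) or match.group(1))
        if min(c1, c2) <= col <= max(c1, c2) and min(r1, r2) <= row <= max(r1, r2):
            return True
    return False


# The formula from the example report: the range P30:Q30 covers Q30 itself.
print(refers_to_itself("Q30", "=AVERAGE(M30:N30,P30:Q30)"))  # True
```

Indirect loops (A refers to B, which refers back to A) need a walk through the whole chain of precedents, which this sketch does not attempt.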
[Related: Understanding & Using Excel Circular References]
#REF! error
The next obvious issue is that cells I13, J13, J33, S13, S18 & S33 contain the #REF! error. The #REF! error is a warning that the formula contains an invalid cell reference (this usually happens when the user deletes a cell/row/column/worksheet that is being referenced by a formula).
To trace the cell originating this error select any cell containing the error (I chose S33 as this would appear to be the main report total), and select Error Checking > Trace Error on the Formulas tab of the Ribbon:
This highlights that cell I13 is the source of the error:
Cell I13 contains the formula =3109+#REF!. To remove the error simply remove the +#REF! within the formula.
It is, however, also important to try to understand which cell the formula originally referenced. The best way to do this is to talk to the user or previous user (if they are still there) and to look back through archived versions of the report (if they exist).
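What Trace Error does here can be pictured as walking back through each formula's precedents until you reach a cell whose own formula contains the broken reference. The Python sketch below illustrates that idea only. The formula for I13 is the one quoted above; every other formula in the dictionary is a hypothetical stand-in, since the report's real formulas are not reproduced in this article, and the sketch handles single-cell references only (no ranges).

```python
import re

CELL_RE = re.compile(r"[A-Z]{1,3}\d+")


def precedents(formula):
    """Return the single-cell references found in a formula (ranges not expanded)."""
    return CELL_RE.findall(formula.replace("$", ""))


def trace_error_sources(cell, formulas, seen=None):
    """Walk back through precedents; return cells whose own formula contains #REF!."""
    if seen is None:
        seen = set()
    if cell in seen:  # already visited, also guards against circular chains
        return []
    seen.add(cell)
    formula = formulas.get(cell, "")
    sources = [cell] if "#REF!" in formula else []
    for precedent in precedents(formula):
        sources.extend(trace_error_sources(precedent, formulas, seen))
    return sources


formulas = {
    "I13": "=3109+#REF!",  # quoted in the article
    "J13": "=I13+H13",     # hypothetical
    "S13": "=J13+R13",     # hypothetical
    "S18": "=S13+S17",     # hypothetical
    "S33": "=S18+S32",     # hypothetical
}

print(trace_error_sources("S33", formulas))  # ['I13']
```

The other cells show #REF! only because they inherit it, which is why fixing the single source cell clears all of them.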
Now that the obvious issues have been identified, we are going to employ some of Excel’s other auditing tools to see if there are any hidden errors.
[Related: Understanding & fixing Excel Formula Errors]
Excel’s error checking function
I’m sure that you will have noticed the small green triangles in the top left-hand corner of some of the cells. This is Excel’s background error checking function warning you that these cells break one of the predetermined rules.
First, let’s have a look at the errors that are being checked for. To open the Error Checking options select File > Options > Formulas (2010) or Office button > Excel Options > Formulas (2007).
Below is the default set up:
When reviewing a spreadsheet for errors it is always worth a quick check to ensure that the above is set up as you would like it to be. I also always tick the “Formulas referring to empty cells” rule.
Click OK to return to the spreadsheet.
The most systematic way to walk through all of the issues identified by the error checking function is to run Error Checking on the Formulas tab of the Ribbon:
This launches the Error Checking dialogue box and allows you to review each error in turn:
I will leave you to run through the errors one by one to see what Excel picks up.
Please note that this is not a foolproof check, as it simply tests each cell against the predefined rules. It will not highlight cells that comply with the rules but contain other errors. It can also flag cells as errors when they are not (e.g. P13; in this case click on “Ignore Error”). It is a very useful starting point nonetheless.
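To make the limits of rule-based checking concrete, here is a rough Python sketch of a single rule, “Formulas referring to empty cells”. It is a simplified stand-in written for this article, not Excel's actual implementation, and the small `sheet` dictionary is invented purely for illustration.

```python
import re


def refers_to_empty_cells(formula, sheet):
    """Return the cells referenced by `formula` that have no entry in `sheet`."""
    references = re.findall(r"[A-Z]{1,3}\d+", formula.replace("$", ""))
    return [ref for ref in references if sheet.get(ref) in (None, "")]


# Hypothetical worksheet: cell address -> what was typed into the cell.
sheet = {
    "A1": 10,
    "A2": 20,
    "B1": "=A1+A3",    # A3 is empty, so the rule flags this formula
    "B2": "=A1+A2+5",  # the stray hard-coded 5 breaks no rule
}

print(refers_to_empty_cells(sheet["B1"], sheet))  # ['A3']
print(refers_to_empty_cells(sheet["B2"], sheet))  # []
```

The second formula passes the rule even though the hard-coded 5 may well be a mistake, which is exactly the kind of issue that only a structural review or a detailed investigation will find.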
Reviewing the report structure
A crucial step in ensuring that a spreadsheet is error-free is to understand its structure, and then to ensure that this structure is correct and consistent.
The simplest way to do this is to identify the different types of cells and their relative positions within the worksheet. For this simple example we are looking to identify:
- Input cells (Numbers)
- Input cells (Text)
- Formula cells
- Formula cells returning an error
To achieve this quickly and simply I have built a basic macro, which is included in the spreadsheet and can be run from the “RUN” button in the Example tab.
This colors each cell type as follows:
This very quickly identifies some structural issues in the spreadsheet:
So how does this work?
The macro above uses Excel’s Go To Special function which helps you to quickly select cells of different types.
To launch Go To Special, click on Find and Select > Go To Special on the Home tab of the Ribbon:
(Alternatively press F5 or Ctrl + G to launch the Go To dialogue box and then click on Special…)
For example, selecting Constants and leaving just Numbers ticked will highlight all numbers on the current worksheet:
It is worth playing with the options on Go To Special as there are some great functions that I sadly don’t have time to cover here (the Precedents, Dependents and Row/Column differences options are particularly useful).
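The four cell types identified in the structure review correspond to Go To Special selections: Constants with Numbers ticked, Constants with Text ticked, Formulas, and Formulas with only Errors ticked. The Python sketch below shows the same classification logic outside Excel. It is not the macro from the workbook; the function name, the labels and the sample row are all assumptions made for illustration.

```python
ERROR_VALUES = {"#REF!", "#DIV/0!", "#N/A", "#NAME?", "#NULL!", "#NUM!", "#VALUE!"}


def classify_cell(entry, value=None):
    """Classify a cell from what was typed (`entry`) and what it displays (`value`)."""
    if entry is None or entry == "":
        return "empty"
    if isinstance(entry, str) and entry.startswith("="):
        # A formula counts as an error cell when its result is an Excel error value.
        if isinstance(value, str) and value in ERROR_VALUES:
            return "formula_error"
        return "formula"
    if isinstance(entry, (int, float)) and not isinstance(entry, bool):
        return "input_number"
    return "input_text"


# Hypothetical totals row: cell address -> (entry, displayed value).
totals_row = {
    "B5": ("=B3+B4", 120),
    "C5": (135, 135),             # a hard-coded number sitting in a row of formulas
    "D5": ("=D3+D4", 140),
    "E5": ("=E3+#REF!", "#REF!"),
}

kinds = {cell: classify_cell(entry, value) for cell, (entry, value) in totals_row.items()}
hard_coded = [cell for cell, kind in kinds.items() if kind == "input_number"]

print(kinds["E5"])  # formula_error
print(hard_coded)   # ['C5']
```

A typed number in the middle of a row of formulas, like C5 here, is the sort of structural inconsistency that stands out at once when each cell type has its own color.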
[Related: More uses of Go To Special in Excel]
And Finally…
As valuable as these initial tests are, there are still some issues in the spreadsheet that only a detailed investigation will highlight.
So I’ll leave you to grab a coffee and see if you can find them (they are covered in the Notes and in the Resolved tab).
In the final article of the series we will have a quick look at an example of spreadsheet auditing software.
Also, we are planning to write an article explaining other useful features of the Go To Special dialog.
What about you?
Do you use spreadsheet auditing functions? What is your experience with them? What are your favorite features? Please share in the comments.
Thank you Myles
Many thanks to Myles for writing this series. Your experience in this area is invaluable. If you enjoy this series, drop a note of thanks to Myles through the comments. You can also reach him at Excel Audit or through his LinkedIn profile.
8 Responses to “Pivot Tables from large data-sets – 5 examples”
Do you have links to any sites that can provide free, large test data sets? Large both in diversity and in total number of rows.
Good question Ron. I suggest checking out kaggle.com or data.world, or creating your own with RANDBETWEEN(). You can also get a complex business data set from the Microsoft Power BI website. It is the Contoso retail data.
Hi Chandoo,
I work with large data sets all the time (80-200MB files with 100Ks of rows and 20-40 columns) and I've taken a few steps to reduce the size (to 20-60MB) so they can be shared more easily and work more quickly. These steps include: creating custom calculations in the pivot instead of having additional data columns, deleting the data tab and saving as an xlsb. I've even tried INDEX/MATCH instead of VLOOKUP, although I'm not sure that saved much. Are there any other tricks to further reduce the file size? Thanks, Steve
Hi Steve,
Good tips on how to reduce the file size and / or processing time. Another thing I would definitely try is to use the Data Model to load the data rather than keep it in the file. You would:
1. connect to source data file thru Power Query
2. filter away any columns / rows that are not needed
3. load the data to model
4. make pivots from it
This would reduce the file size while providing all the answers you need.
Give it a try. See this video for some help - https://www.youtube.com/watch?v=5u7bpysO3FQ
Normally when Excel processes data it utilizes all four cores on a processor. Is it true that Excel drops to using only two cores when calculating tables? And likewise, if there were only two cores present, would it drop to one for a table?
I ask because I have personally noticed that when I use tables, working with the data is much slower than if I had simply filtered it. I like tables for obvious reasons when working with datasets. Is this true?
John:
I don't know if it is true that Excel Table processing only uses 2 threads/cores, but it is entirely possible. The program has to be enabled to handle multiple parallel threads. Excel Lists/Tables were added long ago, at a time when 2 processes were a reasonable upper limit. And it could be that there simply is no way to program table processing to use more than 2 threads at a time...
When I've got a large data set, I will set my Excel priority to High thru Task Manager to allow it to use more available processing. Never use RealTime priority or you're completely locked up until Excel finishes.
That is a good tip Jen...