Howdy folks. Jeff here. I recently gave a presentation on Excel efficiency to a bunch of analysts, in which – among other things – I’d pointed out that if you ever find yourself having to switch calculation to Manual, there’s probably something wrong with your spreadsheet. Here’s the slide:
This prompted one of the participants to come to me for advice regarding restructuring a spreadsheet with that very problem. This analyst had a file with only 6000 rows of data in it, but the file size was something like 35MB, and after each and every change she had to wait at least a minute for the file to recalculate before she could do something else.
It turns out there were two problems with her files that were easy to resolve.
The Confused range
First, there was a problem with the Used Range – the area within a worksheet that Excel thinks contains all your workings and data. You can find out what this is for each spreadsheet by pushing [Ctrl] + [End], and seeing what cell this takes you to. Hopefully it will take you to the bottom-most, right-most cell that you’ve actually used in the sheet:
But occasionally, you’ll see that it might take you far, far below that cell. Maybe all the way to the very bottom of the grid:
This is bad. Why? Because when Excel saves a file, it includes information about things such as what type of Cell Formatting is used within the used range. If the used range includes millions of cells that aren’t even used, then the information that Excel saves regarding these cells can really blow out the file size. This is exactly what had happened in the case of the spreadsheet concerned. After we reset the used range, the filesize plummeted from 35MB to around 2MB.
Often you can reset the Used Range simply by selecting all the empty rows under your data, and then deleting them. To do this, select the entire row immediately below your data, then press [Ctrl] + [Shift] + [Down Arrow] to extend the selection right to the bottom of the sheet, then right click and select Delete:
Note that you’ve got to use the Right-Click>DELETE option, NOT the Delete key on the keyboard. Pushing that Delete key does not reset the used range. In fact, this is often why the used range is wrong…it still reflects some data that used to be in the sheet, but that the user subsequently deleted using the keyboard.
When you’ve done this, then push [Ctrl] + [End] again and see where you end up – hopefully at the bottom right corner of your data.
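In a workbook with many sheets, checking [Ctrl] + [End] one sheet at a time gets tedious. A minimal sketch, assuming unprotected sheets and a macro name made up for illustration, that prints the cell [Ctrl] + [End] would land on for every sheet to the VBA Immediate window:

```vba
Sub ListLastCells()
    Dim sht As Worksheet

    For Each sht In ActiveWorkbook.Worksheets
        ' xlCellTypeLastCell is the cell that [Ctrl] + [End] jumps to
        Debug.Print sht.Name, sht.Cells.SpecialCells(xlCellTypeLastCell).Address
    Next sht
End Sub
```

Any sheet whose last cell sits far below or far to the right of its real data is a candidate for the clean-up described here.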
Sometimes this doesn’t fix the problem, and you still find yourself well below your data. In this case, a bit of VBA will usually suffice. I’d suggest putting the below code into your Personal Macro Workbook, for times like this:
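Sub ResetUsedRange()
    Dim sht As Worksheet
    Dim lng As Long
    ' Reading UsedRange prompts Excel to re-evaluate each sheet's used range
    For Each sht In ActiveWorkbook.Worksheets
        lng = sht.UsedRange.Rows.Count
    Next sht
End Sub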
To see what to do with this code, read What would James Bond have in his Personal Macro Workbook.
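Where the used range still will not shrink, or where the excess is in columns rather than rows, one common alternative is to find the last cell that really holds something and delete everything beyond it. This is a sketch under stated assumptions, not a definitive fix: it assumes an unprotected, unfiltered active sheet, the macro name is invented for illustration, and it deletes rows and columns, so try it on a copy first.

```vba
Sub TrimBeyondLastData()
    Dim sht As Worksheet
    Dim lastCell As Range
    Dim lastRow As Long, lastCol As Long

    Set sht = ActiveSheet

    ' Search backwards from A1 so Find wraps round to the last non-empty cell
    Set lastCell = sht.Cells.Find(What:="*", After:=sht.Cells(1, 1), _
        LookIn:=xlFormulas, LookAt:=xlPart, _
        SearchOrder:=xlByRows, SearchDirection:=xlPrevious)
    If lastCell Is Nothing Then Exit Sub   ' sheet holds no data at all
    lastRow = lastCell.Row

    lastCol = sht.Cells.Find(What:="*", After:=sht.Cells(1, 1), _
        LookIn:=xlFormulas, LookAt:=xlPart, _
        SearchOrder:=xlByColumns, SearchDirection:=xlPrevious).Column

    ' Delete (not just clear) everything below and to the right of the data
    If lastRow < sht.Rows.Count Then
        sht.Rows(lastRow + 1 & ":" & sht.Rows.Count).Delete
    End If
    If lastCol < sht.Columns.Count Then
        sht.Range(sht.Cells(1, lastCol + 1), _
                  sht.Cells(1, sht.Columns.Count)).EntireColumn.Delete
    End If

    ' Show what Excel now considers the used range
    Debug.Print sht.Name, sht.UsedRange.Address
End Sub
```

Cells that carry only formatting count as empty here, which is the point: they are what inflates the file. Save the workbook afterwards, then check [Ctrl] + [End] again.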
Too much SUMIF
The second problem is that each file contained something like 60,000 SUMIF formulas in them. And each one of these formulas referenced two entire columns, rather than just the 2500 rows that actually contained data. It’s really easy to see just how big a problem you might have, simply by doing a Find All for the name of the particular function you’re after:
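The same count can be taken per sheet with a little VBA. A minimal sketch, assuming unprotected sheets; the macro name and the FUNC_NAME constant are placeholders to change for whichever function you are hunting:

```vba
Sub CountFormulasUsingFunction()
    ' "SUMIF(" with the bracket matches SUMIF but not SUMIFS
    Const FUNC_NAME As String = "SUMIF("
    Dim sht As Worksheet
    Dim formulaCells As Range
    Dim c As Range
    Dim n As Long

    For Each sht In ActiveWorkbook.Worksheets
        Set formulaCells = Nothing
        ' SpecialCells raises an error when a sheet has no formulas
        On Error Resume Next
        Set formulaCells = sht.UsedRange.SpecialCells(xlCellTypeFormulas)
        On Error GoTo 0

        If Not formulaCells Is Nothing Then
            n = 0
            For Each c In formulaCells.Cells
                If InStr(1, c.Formula, FUNC_NAME, vbTextCompare) > 0 Then n = n + 1
            Next c
            ' Sheet name and number of formulas containing FUNC_NAME
            Debug.Print sht.Name, n
        End If
    Next sht
End Sub
```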
You can throw 60,000 VLOOKUPs or IF statements or other run-of-the-mill functions at Excel and it won’t even blink. But 60,000 resource-intensive number-crunching functions such as SUMIF, SUMPRODUCT, COUNTIF etc. pointed at very large ranges will cause Excel to flinch, if not shut its eyes completely for long periods of time.
That’s because these functions are like Ferraris…very powerful, but very expensive. One SUMIF is going to travel very fast down the highway. A few hundred SUMIFs on the same stretch are still going to whiz by pretty fast. Tens of thousands of them are just going to crash into each other:
(The image above comes from this New York Times article detailing a spectacular traffic pileup in Japan in 2011 that left a highway strewn with the smashed wreckage of eight Ferraris, a Lamborghini and three Mercedes sports cars. No one was seriously hurt, apart from severely injured pride and a marked increase in insurance premiums the following year.)
Often you can use a PivotTable to do the same thing as a whole bunch of functions like SUMIF, COUNTIF, SUMPRODUCT et cetera. PivotTables are natural aggregation and filtering tools. In this case I could use just one PivotTable to replace those 60,000 SUMIFs, and recalculation time dropped from minutes to milliseconds. Now, reporting on this business process is effortless.
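Building such a PivotTable takes a few clicks, but it can also be scripted. A rough sketch only: it assumes the data block starts at A1 of the active sheet with a header row, and the field names "Category" and "Amount", the macro name and the PivotTable name are all hypothetical stand-ins for whatever your own data uses.

```vba
Sub BuildSummaryPivot()
    Dim src As Range
    Dim wsOut As Worksheet
    Dim pc As PivotCache
    Dim pt As PivotTable

    ' Assumption: contiguous data block with headers in row 1
    Set src = ActiveSheet.Range("A1").CurrentRegion
    Set wsOut = ActiveWorkbook.Worksheets.Add

    ' Pass the source as an address string rather than a Range object
    Set pc = ActiveWorkbook.PivotCaches.Create( _
        SourceType:=xlDatabase, _
        SourceData:=src.Address(ReferenceStyle:=xlR1C1, External:=True))

    Set pt = pc.CreatePivotTable( _
        TableDestination:=wsOut.Range("A3"), _
        TableName:="SummaryPivot")

    ' One row per category, summing the amounts: the job the SUMIFs were doing
    pt.PivotFields("Category").Orientation = xlRowField
    pt.AddDataField pt.PivotFields("Amount"), "Total Amount", xlSum
End Sub
```

Bear in mind that a PivotTable does not recalculate with the sheet; it updates only when it is refreshed.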
One spreadsheet, two morals
I’ve got two morals to share regarding this.
The first is to keep your eyes peeled for signs of trouble in your spreadsheets. Think of FileSize and Recalculation Time as the rev-counter of your car…if it’s getting further and further into the red, then pull over, and check under the hood.
The second – and I can’t underscore this enough – is the importance to organizations of educating all users on how to recognize symptoms of inefficiency. They don’t all have to know how to treat it (although that would be good), but just how to diagnose it. Because if it goes undiagnosed, avoidable inefficiency imposes significant, on-going, and very real opportunity cost. A real dollar amount.
Raising awareness of danger signs is possibly the biggest efficiency gain and risk-reducing opportunity that any training initiative can offer, at the least cost. It’s a game-changer.
Two morals, multiple remedies.
Over at the Daily Dose of Excel blog, I recently posted a mock business case centered around corporate investment in an Excel training programme. There’s much more food for thought there, and even more in the comments, so go take a look, and please do leave a comment there with your own thoughts.
While this business case revolves around an internal corporate training programme, another great way of reducing this opportunity cost is through courses such as Chandoo.org’s own Excel School, VBA Classes, and other Chandoo courses.
Not to mention other fantastic courses that you’ll find advertised on the web if you look.
And yet another is through interactions in places like the Chandoo Forum, where you’ll find an army of ninjas with more collective experience than the Borg from Star Trek. The hive mind that is a forum knows no equal.
And of course, you’ll find a wealth of information on this very blog, in articles like I said your spreadsheet is really FAT, not real PHAT!
About the Author.
Jeff Weir – a local of Galactic North up there in Windy Wellington, New Zealand – is more volatile than INDIRECT and more random than RAND. In fact, his state of mind can be pretty much summed up by this:
That’s right, pure #VALUE!
Find out more at http://www.heavydutydecisions.co.nz
46 Responses to “Big trouble in little spreadsheet”
I have faced this self same problem with my own delegates recently ... TWICE with two separate delegates in different places.
Take a look at my own blog post on one of them where I describe my pivot table solution: http://excelmaster.co under the heading of you need pivots
Really interesting thanks, I always thought it was due to my very complicated sheets and a slow computer (AMD + 4GB ram). But after thinking a little, I removed a whole pile of formatting in my tables, and now things are running a little faster!
I do leave my spreadsheets in manual calc mode, but that's because I constantly work with spreadsheets that contain between 20,000 - 100,000 rows, sometimes up to 600,000, with generally around 20-30 columns (before any calculated fields are added). Maybe I'm an exception, due to the amount of data I work with, but I'll throw this out there and see if I'm missing opportunities for efficiency.
I have a workbook that analyzes accounts to determine call priority (to collect past due AR), factoring in their ATB (aged trial balance, how much they owe, how much is current, 1-30 days past due, 31-60 days past due, etc), ADP (average days to pay), percent of credit limit used, whether they've had an NSF (insufficient funds notice, basically their check bounced), if we have unapplied cash on their account (like an unused credit that could balance out a due amount if we call and get their approval to use it for that purpose), their pay terms (how long they have from invoice until payment is due), and factors in all those bits of data to come up with a weighted point total to see which accounts need to be at the top of the call list, which is then distributed amongst the collectors.
There's generally around ~20,000 accounts on the list. The data comes from 5 different SAP reports. I originally had a different workbook for each report and linked to them from the ranking/calc spreadsheet, but I sped it up a bit by putting all the data on tabs (in calculation order) in one workbook from left to right leading up to the calculation sheet at the end.
The things I've found so far that help cut down calculation time:
Clean and sort the raw data before putting it into the calculation workbook. This may seem obvious or blasphemous based on your background. I used to work in digital illustration and it was key to always have the original, untouched raw data locked in at the back with all the adjustments layered above it so you didn't ever lose any of the source, so my initial practice was to always build my sheets to use the raw data exactly as it comes out of SAP. However, the calculation time saved when you do just a little work to make sure the data is arranged in an orderly fashion is significant. Every time you touch the data there's another opportunity for error, though, so it's something to be balanced.
When you need to remove certain rows from a table based on criteria, sort the table by that criteria first and it drastically reduces the time needed when you delete all rows with matching criteria.
If you have multiple columns that reference the same source table, don't use index & match in every one of those columns. Use match in one column to find which row of the source table to look at, then just use index for the other columns and reference the row number returned from the match column. For example, the final piece of my workbook is a sheet that builds a worklist for an individual collector. You put in the collector number at the top and it compares that to the total number of collectors and builds the list. If you have 7 collectors, then this looks at the ranking and lists every 7th account from top to bottom. One column generates the row numbers of those accounts using match, then the rest use index to pull the data over.
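The MATCH-once idea in the last point can be sketched as follows. Everything specific here is an assumption made for illustration: a sheet named "Source" holding the lookup table in rows 2 to 20001, the key sitting in column A of the active sheet, and the macro name. The macro does nothing clever; it only writes the formulas you would otherwise type by hand, so no VBA is needed once the pattern is understood.

```vba
Sub WriteMatchOnceFormulas()
    ' Assumption: a sheet named "Source" exists, with keys in A2:A20001
    With ActiveSheet
        ' One MATCH per row finds the position of the key held in column A
        .Range("B2").Formula = "=MATCH($A2,Source!$A$2:$A$20001,0)"

        ' Every other column reuses that position with a cheap INDEX
        .Range("C2").Formula = "=INDEX(Source!B$2:B$20001,$B2)"
        .Range("D2").Formula = "=INDEX(Source!C$2:C$20001,$B2)"
    End With
End Sub
```

Fill the three formulas down the list. Each row then does one search instead of one search per column.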
I'm about to switch worlds again, this time to accounting, so I'm doing a bit of housecleaning to make sure all my SOPs are in order and the spreadsheets I use are as clean and efficient as possible. If anyone sees a flaw in what I've described, or an opportunity to speed it up (preferably without using VBA, I'm leaving the work to people that will need a lot of coaching just to understand index and match, so VBA is only going to be a deeper mystery to them) I'd be happy to hear it.
I would also consider using Power Pivot, as I believe that can combine data from varying sources
Thank you for the suggestion. Unfortunately it isn't viable in this case. I approached that angle about a year ago, but was shot down. We're still on Excel 2010 and installing the powerpivot addon requires a lot of hoops to be jumped through with IT to get someone to install it (our machines are locked down pretty tight). Since I built this monstrosity of a spreadsheet to simulate what an upcoming SAP module is going to do automatically, my request was shot down. The problem is that the upcoming SAP module, which is nearly finished, hasn't lived up to expectation and will provide only a fraction of the functionality my spreadsheet does. Fun world.
Your situation resonates! IT really is digging itself a hole in many large organizations, and that has a lot to do with why we have a boom in "self-service" (or, "leave me alone!") user-led solutions.
Just one question: are you quite sure you aren't able to install the PowerPivot add-in? It's really very simple, and free.
If IT refuses to help you, AND to install the self-service add-in, then they really are shooting themselves in the foot.
Installing anything, even something as mundane as the powerpivot addon, requires administrator password, which IT holds.
Thank you for the Index/Match comment! My spreadsheet was built over a couple of months, adding new formulas as new requirements arose. This resulted in 86,000 cells with Index and Match. I'm off to clean up some formulas!
Kenneth - instead of drawing data from 5 separate reports, I'd suggest getting IT to create a new report that does as much of the number-crunching within SAP as possible. I often see files where most of the formulas are used to combine data that comes from the same database. In those cases, a simple tweak to the SQL query behind the reports makes tens of thousands of formulas redundant. At the same time, it's pretty simple to have SQL create calculated fields for current, 1-30 days past due, 31-60 days past due columns.
What kinds of formulas do you have in this apart from Index, Match? Anything really resource intensive like SUMIFs etc? Can you post a sample file somewhere, like on the Chandoo forum? I'd like to take a quick look.
Jeff, in theory, that's a brilliant idea. However in my organization this would be impossible. I'm working in a commercial department supporting the division IT of a global company (400k+ employees). I cannot get them to change anything on the system, not taking into consideration that data may come from several different systems (yes all SAP, but at my company alone we have 70+ SAP systems). I really like Hui's idea of using PowerPivot which could really make things much easier, though..
Phil, can't agree more.
In the real world (or in my experience), asking IT to revise something simple is actually difficult. That's why many people prefer Excel to BI tools. And as a result, hours and hours of manpower are wasted.
On the other hand, although every one knows that Power Pivot is great. Installing Power Pivot into a PC in my office is not easy...
Sad but True...
Good suggestion Jeff, but Phil and MF nailed it. Changes to SAP are projects, and those are a big deal. There's an upcoming SAP module that I actually built this spreadsheet to mimic, but it turns out my mimic is significantly better than the module, which is in the final stages of testing and looks like a half-hearted attempt to replicate part of my spreadsheet.
I feel your pain 🙂
Another thing you can do to minimise recalc time is to replace the formulas in your data sheet - except those in the last row - with values. That way,
1. Excel doesn’t try to recalculate these formulas (which it does whenever you open or close the file).
2. When you put new data in the table, it’s very easy to copy the formula down from the last row, and then once again replace every formula except those in the last row with values.
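The two steps above can be sketched in VBA. This is a minimal sketch, assuming the data block starts at A1 of the active sheet with a single header row; the macro name is made up. It overwrites formulas permanently, so run it on a copy until you trust it.

```vba
Sub FreezeAllButLastRow()
    Dim body As Range

    ' Assumption: contiguous data block with one header row, starting at A1
    Set body = ActiveSheet.Range("A1").CurrentRegion

    ' Header plus a single data row: nothing to convert
    If body.Rows.Count < 3 Then Exit Sub

    ' Skip the header row and the last row, then overwrite each formula
    ' with its current value
    With body.Offset(1, 0).Resize(body.Rows.Count - 2)
        .Value = .Value
    End With
End Sub
```

The last row keeps its live formulas, ready to be copied down when new data arrives.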
Thanks for this Jeff. It's interesting to go to many of the forums that attempt to answer this exact question, yet they don't go into nearly as much detail.
Example, I often hear to limit the use of SUMPRODUCT. But have never read the reasons that you lay out, nor these preferred solutions.
Also, good to learn that the delete key on the keyboard behaves differently than delete within the program. Wasn't aware of that. Good tips.
Thanks for the kind words, Rick.
Caveats are sadly lacking from most blogs. "OFFSET is cool, but it is volatile, which means...". "SUMPRODUCT is incredible, but it is resource intensive, which means...". "PivotTables are awesome, but they store a copy of your data in a PivotCache, which means..."
We educate people on one weapon at a time, but don't give them any basic firearms safety training, nor talk about how to consider which weapon from the armory - or which combination of weapons - best suits a particular set of battle conditions.
Thank you for a great post. I am a culprit of Manual calculations on my latest spreadsheet and would appreciate any advice. It falls into the "confused range" discussion. I have approx 10 worksheets (customer) which are formatted identically with rows of sales data, then a main sheet which summarises the 10 worksheets. The 10 customer worksheets all vary in the number of rows so a lot of my formulas are sum(k7:k9999) to allow for the varying no of rows so I don't accidentally exclude rows. How can I apply your discussion of confused ranges to formulas where the row numbers vary? Many thanks
One thought, Cathy, is to change the data ranges into tables. That way you can use table references (structured references) instead of cell ranges. Table references are dynamic.
Please take a look at one of Chandoo's prior posts on the subject:
Yep, great suggestion, Chris. Whenever you have Excel 2007 or greater, and whenever you have data sitting in a block, and whenever you point formulas, pivots, charts at that data, then you want to change the data range into an Excel Table with the Ctrl + T shortcut.
It's a great habit to get into. As soon as you build a new sheet - or are reviewing an old one - then as soon as you input new data into a block - or come across some existing data in a block, then push Ctrl + T.
Including VLOOKUP tables.
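For anyone who would rather script the [Ctrl] + [T] step, here is a minimal sketch. It assumes the active cell sits inside a contiguous block of data with a header row, and that the block is not already a table; the table name "SalesData", the column name "Amount" and the macro name are hypothetical.

```vba
Sub ConvertBlockToTable()
    Dim tbl As ListObject

    ' Assumption: the active cell is inside a data block with headers
    Set tbl = ActiveSheet.ListObjects.Add( _
        SourceType:=xlSrcRange, _
        Source:=ActiveCell.CurrentRegion, _
        XlListObjectHasHeaders:=xlYes)
    tbl.Name = "SalesData"

    ' A formula such as =SUM(SalesData[Amount]) now follows the table as
    ' rows are added, so there is no need for a padded range like K7:K9999
    Debug.Print tbl.Name, tbl.Range.Address
End Sub
```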
Thanks Chris and Jeff. So the table will automatically adjust in size? Thanks for pointing me in the direction of tables, it's opening a whole new world.
Yes they do, which is why they are so handy when it comes to pointing PivotTables at them...you never have to click the 'Change Data Source' button again.
Not only ... Columns and rows are added as you wish.
If you link a graph or graphs to a table they update automatically, too.
Pivot tables are updated as your table expands.
These tables are so good. I managed a small college's student achievement database in an excel table: 1,700 students, four semesters, six subjects per semester, four assessments per subject per semester. Name, class, father's name, province ... Formulas for pass and fail per semester, attendance percentage ...
It worked really well.
I know, should have been in a database but you need to know where we were and the resources we had!
I'm reading Phil's description and I give it at least a 15% chance he and I work for the same company......
Great tips. I've read that after the delete action, you might need to go to cell A1 and save the document. Not sure if this helps, but thought I would mention it 🙂
What about the event where it is columns and not rows that are taking up unnecessary space in the used range? I tried modifying the code to correct that, but it didn't do so.
The fix for the extra rows worked great! But now I'm wondering the same about the extra columns. I tried modifying the code as well (just changed the word "rows" to "columns") and it didn't work.
Anyone know a fix for extra unneeded columns?
One other thing that will cause havoc - dynamic functions. In an industrial environment, there are several versions of historians (which collect process data real-time), which usually have excel plug-ins. The problem comes in if you have =TODAY() or =NOW() in those spreadsheets. All of a sudden, you have 20,000 cells recalculating every time you make a change. Better to use VBA to repopulate a cell with the current time and date when you open the spreadsheet.
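The suggestion of stamping the date with VBA rather than using =NOW() can be sketched like this. It is a minimal sketch, assuming a sheet named "Control" with the timestamp kept in cell B1; both are placeholders. The code belongs in the ThisWorkbook module, not a standard module.

```vba
' Goes in the ThisWorkbook module
Private Sub Workbook_Open()
    ' Assumption: a sheet named "Control" holds the timestamp in B1
    ' Writes a fixed value once, when the workbook is opened
    Me.Worksheets("Control").Range("B1").Value = Now
End Sub
```

Formulas then point at that one cell. Because it holds a constant rather than a volatile function, it does not force dependent cells to recalculate on every change.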
Absolutely, Stephen. In fact I make that point at http://chandoo.org/wp/2013/09/29/i-said-your-spreadsheet-is-really-fat-not-real-phat/
under the heading Handle sweaty Dynamite and Volatile Functions with extreme care….
I plan to elaborate on this in a future post.
Thanks for the insightful comment.
@Cathy: My 2 cents of advice would be to merge all the different sheets with Customer data into one single sheet (that should be possible as you state that all sheets have the same format).
Then add a new column (in column A for instance) calling it "Customer", so you have a field to differentiate all 10 imported data blocks with sales data from 10 different clients.
Next thing is to convert the data range into a table.
Then create a pivot table based on the created table to substitute the main sheet.
And "voilà" 🙂 You will have only two sheets (one containing data, other containing pivot with summary), easily maintainable (you can order sales data in data sheet by date or by customer, whatever suits you best).
Hope these ideas provide you with solutions to improve your spreadsheet design!
Extremely well written. Just wanted to say that your 'used range' tip (ctrl + end) helped me reduce my workbook size from 70mb to just 6mb! Thanks a bunch!
Thanks. I formatted a column and it entered the formula all the way to the bottom of the possible XL page. Plus add 4 pages wide onto that and i now had a large doc. I wasn't able to click on a square without " not responding" showing up.
Can I in the first place just limit the formula to let's say 500 lines? So I have room to add data without falling off the formula but not blowing out the Size and speed.
Thanks, the general info you supplied was on the mark. The following replies are just over the top. For me that is.
The [CTRL] and [down arrow] didn't work for me. I just selected the data area and cut and paste into a new worksheet. I got rid of my extra 1,048,000 of lines.
Purely rhetorical, at this point, but... why would MS avail capability in software that, when used, makes said software crash or work so slowly that it is next to useless in terms of productivity and efficiency? Array type formulas are useful tools, however, when used tend to gum up the works wrt data analysis and normalisation.
For "The Confused range" solution, it is important to SAVE immediately after you delete the extra rows, then close and re-open the spreadsheet. I got this tip from a comment on http://xsformatcleaner.codeplex.com/releases/view/98007#ReviewsAnchor, and it is also mentioned in the Microsoft support article on this issue (https://support.microsoft.com/en-us/help/244435/how-to-reset-the-last-cell-in-excel)
I know it's an old post, but thank you. Your "confused range" tip just reduced one of my spreadsheets from 21.5Mb to 1.5MB. It obviously is still an issue with Excel 2016.
In my case, there were thousands of columns with fancy formatting but no data: someone had just copied the column formats across the whole worksheet. All I had to do was delete all the empty columns.
Awesome...glad you liked it.
I have a large dataset which I need to lookup values from. Is it more efficient to have one large named range (identifier code + 134 columns) or to repeat the identifier code multiple times to break it down into smaller named ranges (e.g. identifier code + 50, identifier code + 50, identifier code + 34)?
Hi Katherine. It depends what the formulas that reference the named ranges do.
If they are VLOOKUP or MATCH formula, then if your current file is running slowly and you want them to run like greased lightning, sort your lookup items and use the "Binary Search" version of the lookup. Read my other guest post at the following link to see just how much faster this is:
If they are SUMPRODUCT formulas, replace them with SUMIFS or COUNTIFS if you can, or even better, use a PivotTable to crunch the numbers.
Give this article of mine a read, too:
Wow, I just squeezed an excel sheet down from 88MB to 250KB using the confUSED range solution you propose. I had to do both, deleting the range first AND run the macro.
Thanks for sharing your knowledge. No formulas in my spreadsheet - yet it had bloated to over 145,000 KB. Using the CTL-End determined one of the workbooks had excessive empty rows. Copied the only 27 rows of data from that workbook to a new workbook then deleted the bloated workbook and now down to a slender 64 KB.
Now how can I do the same when I feel bloated?
massive Chandoo!!! you saved me !!
I really appreciate your post!
My company uses Google Sheets online to maintain spreadsheets with multiple contributors and I'm constantly (almost daily) having to apply what you call the confUSED range fix. I'm so glad it works, but of course no one else will take the time to learn the fix.
Can you help me with understanding how the bloating happens to begin with? Is it always someone applying formatting to the entire sheet? People in my company act like I'm speaking another language when I try to explain even the simplest of solutions... any help would be appreciated 🙂
I'm constantly running into Excel bloating. I'll turn a file into binary and that will fix much of it, but sometimes it doesn't. I just created a new binary file and copied 9 worksheets from the old binary workbook (45mb) into the new workbook. By copied I mean right click on the worksheet and copy to a different workbook. Once I saved the new workbook with the same worksheets, my new binary file went to 3mb. No pivot tables, roughly 85000 rows x 710 columns (with 662 columns blank). Any idea what could cause such bloating?