Handle Volatile Functions like they are dynamite
If you’re building large models, then you may want to use volatile functions – including OFFSET(), INDIRECT(), and TODAY() – with caution, because unless you know what you are doing, they *might* slow Excel down to the point that data entry is sluggish, if not downright tedious.
In fact, you *might* want to consider getting out of the habit of using these functions at all if there are alternatives, and you might want to replace volatile functions in your existing models with non-volatile alternatives…I have reduced recalculation time in large models from minutes to milliseconds by doing just that!
So what the heck does volatile actually mean? And why should you care? Let’s find out, shall we?
How does Excel update all those cells?
Let’s take a look at how Excel ensures that each cell has the right number in it when you make a change somewhere. But first, a disclaimer: Note that this is an introductory article, and so is necessarily simplistic. If you want to know more about the specifics of this complicated subject, check out the links to Excel MVP Charles Williams’ excellent site at the bottom of this article. Okay, disclaimer ends…
A large Excel model might have several hundred thousand cells with formulas in it. Maybe even several million. Most of these formulas will reference other cells, and many of those cells will have formulas in them that reference other cells in turn, and so on. If a formula in a Cell A2 refers directly to Cell A1, then A2 said to be directly dependent on A1. Obviously if A1 changes, we need those changes to flow through to A2. And when recalculating the entire workbook, we need A2 to be recalculated AFTER A1 has been recalculated. That’s called a dependency chain.
Large models can have a number of very long dependency chains comprising of hundreds of thousands of cells that run across worksheets or even between workbooks. To keep track of how all these cells interrelate – and to ensure that a change in any specific cell’s value correctly flows through to any other cells that may depend on it – Excel builds and maintains what is known as a ‘dependency tree’. Think of this as a big flow-chart or circuit diagram showing how all the cells in one of these giant formulas interconnect. Excel maintains this dependency tree every time you make a change to a formula in a cell, by looking at the argument list of each separate function within that formula. And this dependency tree is saved along with the file itself.
Thanks to this dependency tree, when you change the value in one cell, Excel can work out what other cells might be affected. And so Excel can smartly recalculate just those particular cells. Meaning it doesn’t have to blindly recalculate the whole workbook just because one fairly insignificant part of it might have changed.
So let’s say you change the value of a cell somewhere that has only one other cell pointing at it (and no further cells depend on that other cell). Thanks to smart recalculation, Excel only recalculates the value of the cell you just changed, and the value of that ONE dependent cell. It doesn’t have to recalculate the entire workbook.
Likewise, if you change the value of a cell somewhere that has many, many cells downstream, then Excel of course has to recalculate all of the cells further down that particular chain. But it can safely ignore any cells further up that particular dependency chain. And it can ignore any cells elsewhere that aren’t in this particular dependency chain.
If a long-enough part of a dependency chain gets recalculated, then you might well see the word ‘calculating’ in the status bar while Excel works its way through all the relevant cells in that chain. But usually, this recalculation happens so fast that the word ‘calculating’ flicks on and off so quickly that you don’t notice it.
Not-so-smart recalculation thanks to volatility
Now here’s the important bit: a particular class of formulas called volatile formulas get automatically recalculated any time you enter data anywhere in any open workbook – even if the thing you just changed had nothing to do with those volatile functions. And then this triggers Excel to then recalculate all directly dependent cells downstream from those volatile formulas too. Yikes!
This mean that if you’ve opened a very large spreadsheet model with volatile functions in it – and if those volatile functions have a large number of formulas downstream (or a smaller amount of resource intensive formulas) – then if you are say trying to add items to a shopping list that you’ve started in another workbook it could take minutes for you to add each item to that shopping list, because every time you add an item, it triggers an avalanche of unnecessary and pointless recalculation in the large spreadsheet model.
The fact that each and every cell ‘downstream’ of any volatile formulas get recalculated is an important point to get your head around. Many people think that slow calculation times due to volatility is due to the time it takes to recalculate large amounts of volatile functions in a model. But often most of that delay is in fact due to the recalculation of all the cells ‘downstream’ from those volatile functions. In other words, even just one volatile formula with a very long calculation chain hanging off it could cause you grief. And if that calculation chain gets more and more complex, so does the effect of that one volatile formula.
Here’s how that looks visually:
In fact, it’s not just entering data that will trigger a volatile function to recalculate, but also these things (among others):
- Deleting or inserting a row or column.
- Performing certain Autofilter actions.
- Double-clicking a row or column divider (in Automatic calculation mode).
- Adding, editing, or deleting a defined name.
- Renaming a worksheet.
- Changing the position of a worksheet in relation to other worksheets.
- Hiding or unhiding rows (but not columns)
So almost anything can set off that domino effect. Which reminds me of this:
(And what the heck…if you enjoyed that, then click this link too. But hurry back…this post is getting cold).
So which functions are Volatile?
- RAND() and RANDBETWEEN()
- INFO() (depending on its arguments)
- CELL() (depending on its arguments)
If you’re an intermediate Excel user, then chances are that you already use some of these regularly. For instance:
- OFFSET() is usually the function of choice to anyone who wants to create dynamic ranges
- Many large models make use of the INDIRECT() function to construct cell or range references “on the fly” in response to some choice that a user makes
- Many large models make use of the TODAY() function to check if a date entered by a user occurs in the past, present, or future.
When does this matter?
Most of the spreadsheets you use these functions in are so small that you probably don’t even notice any extra volatility-related recalculation. So no harm done. However, if you’ve ever had that a large spreadsheet that seems particularly sluggish when you’re trying to enter new data – or that seems to impact the performance of other open workbooks – then chances are you know exactly what I mean.
I’ve seen frustrated-looking users waiting for as long as one to two minutes for particularly large models to recalculate after each and every change they make to it, even if those changes are relatively insignificant, such as changing the spelling of a column header.
Often spreadsheets like this get so sluggish that users switch Excel’s calculation setting to Manual, just so they can make changes in a timely fashion, and then switch it on again when they’re done in order to have the model calculate the correct answer. This is dangerous…I’d never set calculation to manual if I could help it. There’s just too much chance that someone someday will use output of such a model without remembering to set calculation to Auto. What’s worse, when you open two workbooks, one saved in manual mode and one saved in automatic mode, they will both have the calculation mode of the first workbook opened. I have seen many cases in my career where analysts have done just that…opened a workbook with calc set to manual, opened a whole bunch of others where calc was set to auto, and then done an entire day’s work without realizing that calculation was subsequently turned off for all of them. Doh!
Previously you might have thought that you had no choice but to switch calculation to Manual, because you might have thought that this sluggishness is an unavoidable consequence of the size and complexity of your spreadsheet. But now you know that it *might* be caused by use of volatile functions, and that volatile functions might not be suitable for some occasions…particularly if you’re building large models that utilize these functions at key points within your model. Replace those Volatile functions with some non-Volatile alternatives, and you’ll likely find that your model stops being a slow dog, and starts being a much faster greyhound. To the point that you can switch calculation back to Automatic again.
What are the alternatives to Volatile functions?
While volatile functions like OFFSET() and INDIRECT() are incredibly useful, you can usually achieve the same thing by using other non-volatile formulas such as INDEX or CHOOSE, as well as through leveraging off the dynamic references that Excel Tables allow.
And instead of the TODAY() function, you can use VBA to populate today’s date as a hard-coded value in big models, as you’ll see in the download file below. Check out the Alternative Functions tab of that file to see some examples of common use of volatile functions, as well as some non-volatile alternatives.
If you’re struggling to find a non-volatile replacement for an existing volatile formula, then you can always post a question on the Chandoo Forum asking for some advice on non-volatile alternatives.
Am I being over-zealous here?
As we’ve seen, too much reliance on volatile functions *might* trigger large parts of a model to be recalculated needlessly. But it’s worth remembering that this is only going to be noticeable in particularly big spreadsheets. So perhaps I’m being a little overzealous here. So if you know what you’re doing, then maybe you don’t want to dismiss volatile functions outright. After all, you can always assess your options on a case by case basis: try them out, test, test, test, test again, and then make a balanced decision.
However, if you know of an alternative formula combination that does exactly the same thing as a volatile formula, then I’d suggest that you get into the habit of using that instead whenever you can. That way you won’t inadvertently have issues when it really matters. And I’d suggest that if you don’t have much experience of functions and performance, then perhaps it’s safest to simply err on the side of caution and steer clear of volatile functions altogether.
So not only do I see little down side to avoiding volatile formulas, but I see a significant upside: I’ve seen plenty of large models built by the likes of the big 4 accounting/consulting firms that make heavy use of volatile functions, and that consequently have recalculation times so long that they are effectively unusable. Stripping out the volatile formulas from these models has resulted in delays from data entry falling from upwards of two minutes to well under a second. Not to mention that users can now work on other files while these models are open, without fear of triggering an avalanche of unnecessary and pointless recalculation. Had these model builders known to avoid volatile functions, they would have saved users a lot of grief.
Excel MVP and Recalculation Expert Charles Williams says:
The better use you make of smart recalculation in Excel, the less processing has to be done every time that Excel recalculates, so avoid volatile functions like INDIRECT and OFFSET where you can, unless they are significantly more efficient than the alternatives. (Well-designed use of OFFSET is often fast.)
In fact, on Charles’ website he goes so far as to say avoid volatile functions wherever possible.
With all that in mind, I’ve made a personal choice to steer clear of volatile functions where I can. Your mileage may differ. Regardless, the subject of volatility is definitely something that intermediate users should be made aware of. What they do with that awareness is up to them. But forewarned is forearmed.
Fancy a demonstration?
Sometimes it’s most helpful to see something with your own eyes. So download this file, open it, and enable macros: Volatility-demo-using-TODAY-20140230
You’ll see it has a dropdown in it, where you can choose to either populate a cell with the volatile TODAY function or with a hard-coded date:
Downstream of that drop-down output cell are 20,000 formulas spread across two columns:
If you choose the Use Volatile TODAY() Function option from the dropdown, then whenever you enter data in that 3rd ‘Completely independent cells’ column then you should notice a significant delay. Change that dropdown to ‘Use Hard-Coded Date’ and you should experience significantly less delay, if any.
You’ll also see a blue button you can click, that will time how long the delay is under each option:
On my system, there’s about a 1 second delay when using the TODAY() option, and almost no delay when using the hard-coded date. (Note that you have to click the blue button twice after you change that dropdown to get the ‘proper’ reading. The first reading will be artificially high.)
Why are some functions volatile?
The reason for some of these functions being volatile this is fairly obvious. For example:
- NOW() should always return the time as at the last calculation, so needs to be refreshed any time new data is put into the workbook, in case one of your formulas does something specific based on the time of day.
- TODAY() similarly must be refreshed to ensure than the day hasn’t changed since the last time something was entered into the workbook (which will be the case, if someone works past midnight, or if they come in in the morning and make a change to a file that they had left open the previous night.)
But the reasons for others being volatile – such as OFFSET and INDIRECT, which are often used by modellers to create dynamic named ranges – are less clear. First, let’s look at what OFFSET and INDIRECT actually do:
- Offset Returns a reference to a cell or a multi-cell range that is a given number of rows and columns from a given reference. So OFFSET($A$1,1,2,5,3) says “Go one cell down from $A$1 (which takes us to $A$2), then two cells across (which takes us to $C$2) and then return a block of cells 5 down from $C$2 and 3 across from $C$2 (which gives us the range $C$2:$D$6)
- Indirect Returns the reference specified by a text string. References are immediately evaluated to display their contents. So Indirect(“$A1”) tells Excel “Go look in cell $A$1, and tell me what’s in it”.
So why would that mean they need to be volatile? Because Excel constructs dependency trees based on cell references.
- INDIRECT() has an argument that is constructed out of text – e.g. INDIRECT( “$A1”). This might look like a cell reference, but it is not. In fact, the argument of an INDIRECT function might equally look something like this: INDIRECT(“$B”&$C$9-2).
- OFFSET() takes numerical arguments, which point to a cell reference, but are still just numbers.
- In order for these to form part of Excel’s dependency tree, the Excel dependency tree algorithm would have to first evaluate text like INDIRECT( “$A1”) or the numerical arguments like OFFSET($A$1,1,2,5,3) in order to determine what the associated cell reference actually is, before adding it to the dependency tree. Maybe the Excel obviously made the call that rather than introduce this extra step where these two functions are concerned, they may as well just make both functions fully volatile.
But given that you can set up INDEX() do much the same thing as OFFSET(), why doesn’t INDEX need to be volatile too? I imagine it’s because INDEX uses range arguments, whereas OFFSET uses numerical arguments. So Excel can extract these range arguments directly from an INDEX() function when building/amending the calculation dependency tree.
Note that INDEX() is what’s called semi-volatile, meaning it gets recalculated when the workbook opens.
And also note that any formulas used in conditional formatting effectively become what Charles Williams calls super-volatile: they are evaluated each time the cell that contains them is repainted on the screen (which happens say if you use the scroll bar to move the ‘view’ up/down or left/right), even in Manual calculation mode. But because no other formulas are ‘downstream’ from conditional formats, then only the conditional format formulas themselves get recalculated. So if you’ve got simple conditional formatting rules, you won’t notice any delay.
I’ll talk about alternatives to using volatile functions in a series of upcoming posts. But meanwhile…if you’re not feeling too sluggish…then check out these great links from Excel MVP Charles Williams.
- Excel 2010 Performance: Improving Calculation Performance
- Smart Recalculation
- Volatile Excel Functions
- Excel Dependencies
- Evaluation Circumstances
- Writing efficient VBA UDFs Part 10 – Volatile Functions and Function Arguments
Pretty much everything I’ve covered in this post came from Charles’ writings, so I’d like to acknowledge the work he has done in explaining this complex subject to countless Excel users over the years. Charles also sells a great add-in called FastExcel for profiling Excel calculation performance and memory useage – so be sure to check that out if you want to get serious about diagnosing volatility issues with your own Excel models.
You may also be interested in Jan Karel Pieterse’s RefTreeAnalyser utility, which among other things allows for easy Auditing of formula dependents and precedents, helps you trace errors, and will let you time your workbook calculation for each worksheet to find bottlenecks as well as check columns for formula inconsistencies. Jan Karel has a free demo version with limited functionality, if you’d like to take it for a spin.
Let me know your thoughts in the comments
This has been a particularly taxing post to write. So if you found this article helpful, please let me know below in the comments. If you’re not following something I said, or can think of a better way to say it, then let me know that too.
About the Author.
Jeff Weir – a local of Galactic North up there in Windy Wellington, New Zealand – is more volatile than INDIRECT and more random than RAND. In fact, his state of mind can be pretty much summed up by this:
That’s right, pure #VALUE!
Find out more at http://www.heavydutydecisions.co.nz
Introducing our Online Power BI Class:
Would you like to join me on a date with Power BI? In this comprehensive online class, learn all about Power BI so you can create beautiful, insightful & interactive reports. Join me and rest of the play mates for our first ever Power BI Play Date.Click here to know more and join us.
Leave a Reply
|Can you extract first name & last name from email address? [Formula Challenge]||Chandoo.org Podcast: Launching on March 6th|