Handle Volatile Functions like they are dynamite

Posted on March 3rd, 2014 in Learn Excel, Posts by Jeff - 39 comments

Volatile functions in Excel are like dynamite. Handle them with care!

If you’re building large models, then you may want to use volatile functions – including OFFSET(), INDIRECT(), and TODAY() – with caution, because unless you know what you are doing, they *might* slow Excel down to the point that data entry is sluggish, if not downright tedious.

In fact, you *might* want to consider getting out of the habit of using these functions at all where there are alternatives, and replacing volatile functions in your existing models with non-volatile alternatives…I have reduced recalculation time in large models from minutes to milliseconds by doing just that!

So what the heck does volatile actually mean? And why should you care? Let’s find out, shall we?

How does Excel update all those cells?

Let’s take a look at how Excel ensures that each cell has the right number in it when you make a change somewhere. But first, a disclaimer: Note that this is an introductory article, and so is necessarily simplistic. If you want to know more about the specifics of this complicated subject, check out the links to Excel MVP Charles Williams’ excellent site at the bottom of this article. Okay, disclaimer ends…

A large Excel model might have several hundred thousand cells with formulas in them. Maybe even several million. Most of these formulas will reference other cells, and many of those cells will have formulas in them that reference other cells in turn, and so on. If a formula in cell A2 refers directly to cell A1, then A2 is said to be directly dependent on A1. Obviously if A1 changes, we need those changes to flow through to A2. And when recalculating the entire workbook, we need A2 to be recalculated AFTER A1 has been recalculated. That’s called a dependency chain.

Large models can have a number of very long dependency chains comprising hundreds of thousands of cells that run across worksheets or even between workbooks. To keep track of how all these cells interrelate – and to ensure that a change in any specific cell’s value correctly flows through to any other cells that may depend on it – Excel builds and maintains what is known as a ‘dependency tree’. Think of this as a big flow-chart or circuit diagram showing how all the cells in these giant chains interconnect. Excel updates this dependency tree every time you make a change to a formula in a cell, by looking at the cell references in the argument list of each separate function within that formula. And this dependency tree is saved along with the file itself.
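To make the idea of a dependency tree a little more concrete, here is a minimal Python sketch of a toy workbook. The cell names, the formulas in the comments and the helper functions are all hypothetical, and this is not how Excel stores its tree internally; it only shows the two questions such a tree lets you answer: which cells depend on a given cell, and in what order cells can be calculated so that each one comes after the cells it refers to. It uses `graphlib`, which needs Python 3.9 or later.

```python
from collections import defaultdict
from graphlib import TopologicalSorter

# Hypothetical toy workbook: each cell maps to the cells its formula references.
precedents = {
    "A1": [],            # a constant
    "A2": ["A1"],        # =A1*2
    "A3": ["A2"],        # =A2+1
    "B1": ["A1", "A3"],  # =A1+A3
}

def build_dependents(precedents):
    """Invert the precedent map: for each cell, which cells refer to it directly."""
    dependents = defaultdict(set)
    for cell, refs in precedents.items():
        for ref in refs:
            dependents[ref].add(cell)
    return dependents

def calculation_order(precedents):
    """An order in which every cell comes after all of its precedents."""
    return list(TopologicalSorter(precedents).static_order())

print(calculation_order(precedents))  # ['A1', 'A2', 'A3', 'B1']
```

In this tiny example there is only one valid order, because B1 needs both A1 and A3, and A3 needs A2, which needs A1.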

Smart Recalculation

Thanks to this dependency tree, when you change the value in one cell, Excel can work out what other cells might be affected. And so Excel can smartly recalculate just those particular cells. Meaning it doesn’t have to blindly recalculate the whole workbook just because one fairly insignificant part of it might have changed.

So let’s say you change the value of a cell somewhere that has only one other cell pointing at it (and no further cells depend on that other cell). Thanks to smart recalculation, Excel only recalculates the value of the cell you just changed, and the value of that ONE dependent cell. It doesn’t have to recalculate the entire workbook.

Likewise, if you change the value of a cell somewhere that has many, many cells downstream, then Excel of course has to recalculate all of the cells further down that particular chain. But it can safely ignore any cells further up that particular dependency chain. And it can ignore any cells elsewhere that aren’t in this particular dependency chain.
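Smart recalculation can be sketched as a walk downstream from the changed cell. The Python below is a simplified illustration under assumed names (a hypothetical `precedents` map and a `cells_to_recalculate` helper), not Excel’s actual algorithm: starting from the edited cell, it collects every cell that depends on it, directly or indirectly, and leaves everything upstream or in unrelated chains alone.

```python
from collections import defaultdict, deque

# Hypothetical toy workbook: cell -> cells its formula refers to.
precedents = {
    "A1": [], "A2": ["A1"], "A3": ["A2"],  # one chain: A1 -> A2 -> A3
    "C1": [], "C2": ["C1"],                # an unrelated chain: C1 -> C2
}

def dependents_of(precedents):
    """For each cell, the cells that refer to it directly."""
    deps = defaultdict(set)
    for cell, refs in precedents.items():
        for ref in refs:
            deps[ref].add(cell)
    return deps

def cells_to_recalculate(changed, precedents):
    """The changed cell plus everything downstream of it (breadth-first walk)."""
    deps = dependents_of(precedents)
    dirty, queue = {changed}, deque([changed])
    while queue:
        for d in deps[queue.popleft()]:
            if d not in dirty:
                dirty.add(d)
                queue.append(d)
    return dirty

print(sorted(cells_to_recalculate("A2", precedents)))  # ['A2', 'A3']
```

Changing A2 leaves A1 (further up the chain) and the whole C chain untouched.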

If a long-enough part of a dependency chain gets recalculated, then you might well see the word ‘calculating’ in the status bar while Excel works its way through all the relevant cells in that chain. But usually, this recalculation happens so fast that the word ‘calculating’ flicks on and off so quickly that you don’t notice it.

Not-so-smart recalculation thanks to volatility

Now here’s the important bit: formulas that contain volatile functions get automatically recalculated any time you enter data anywhere in any open workbook – even if the thing you just changed had nothing to do with those volatile functions. And this in turn triggers Excel to recalculate all the cells downstream from those volatile formulas too. Yikes!

This means that if you’ve opened a very large spreadsheet model with volatile functions in it – and if those volatile functions have a large number of formulas downstream (or a smaller number of resource-intensive formulas) – then if you are, say, trying to add items to a shopping list that you’ve started in another workbook, it could take minutes for you to add each item to that shopping list, because every time you add an item, it triggers an avalanche of unnecessary and pointless recalculation in the large spreadsheet model.

The fact that each and every cell ‘downstream’ of any volatile formulas gets recalculated is an important point to get your head around. Many people think that slow calculation caused by volatility is due to the time it takes to recalculate large numbers of volatile functions in a model. But often most of that delay is in fact due to the recalculation of all the cells ‘downstream’ from those volatile functions. In other words, even just one volatile formula with a very long calculation chain hanging off it could cause you grief. And the longer and more complex that calculation chain gets, the bigger the effect of that one volatile formula.
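The downstream effect can be illustrated with a small Python sketch of a hypothetical model: cell D1 stands in for a single =TODAY() formula, a chain of 1,000 formulas hangs off it, and X1 is a completely unrelated input cell. Treating volatile cells as simply added to every recalculation is a deliberate simplification of Excel’s behaviour, and all names here are made up for illustration.

```python
from collections import defaultdict, deque

# Hypothetical model: D1 holds =TODAY(); F1..F1000 form a chain hanging off it.
precedents = {"D1": [], "X1": []}  # X1 is an unrelated input cell
previous = "D1"
for i in range(1, 1001):
    cell = f"F{i}"
    precedents[cell] = [previous]  # F1 refers to D1, F2 refers to F1, and so on
    previous = cell
volatile = {"D1"}

def downstream(start_cells, precedents):
    """All start cells plus everything that depends on them, directly or indirectly."""
    deps = defaultdict(set)
    for cell, refs in precedents.items():
        for ref in refs:
            deps[ref].add(cell)
    dirty, queue = set(start_cells), deque(start_cells)
    while queue:
        for d in deps[queue.popleft()]:
            if d not in dirty:
                dirty.add(d)
                queue.append(d)
    return dirty

def recalc_after_edit(edited, precedents, volatile):
    """Cells recalculated after an edit: the edited cell's chain plus every volatile cell's chain."""
    return downstream([edited, *volatile], precedents)

print(len(recalc_after_edit("X1", precedents, volatile)))  # 1002
print(len(recalc_after_edit("X1", precedents, set())))     # 1
```

Editing X1 touches nothing in the chain, yet with one volatile cell present all 1,000 downstream formulas are recalculated as well; without it, only X1 itself is.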

Here’s how that looks visually:

In fact, it’s not just entering data that will trigger a volatile function to recalculate, but also these things (among others):

  • Deleting or inserting a row or column.
  • Performing certain AutoFilter actions.
  • Double-clicking a row or column divider (in Automatic calculation mode).
  • Adding, editing, or deleting a defined name.
  • Renaming a worksheet.
  • Changing the position of a worksheet in relation to other worksheets.
  • Hiding or unhiding rows (but not columns).

So almost anything can set off that domino effect. Which reminds me of this:

(And what the heck…if you enjoyed that, then click this link too. But hurry back…this post is getting cold).

So which functions are Volatile?

These ones:

  • NOW()
  • TODAY()
  • RAND() and RANDBETWEEN()
  • OFFSET()
  • INDIRECT()
  • INFO() (depending on its arguments)
  • CELL() (depending on its arguments)

If you’re an intermediate Excel user, then chances are that you already use some of these regularly. For instance:

  • OFFSET() is usually the function of choice for anyone who wants to create dynamic ranges
  • Many large models make use of the INDIRECT() function to construct cell or range references “on the fly” in response to some choice that a user makes
  • Many large models make use of the TODAY() function to check if a date entered by a user occurs in the past, present, or future.

When does this matter?

Most of the spreadsheets you use these functions in are so small that you probably don’t even notice any extra volatility-related recalculation. So no harm done. However, if you’ve ever had a large spreadsheet that seems particularly sluggish when you’re trying to enter new data – or that seems to impact the performance of other open workbooks – then chances are you know exactly what I mean.

I’ve seen frustrated-looking users waiting for as long as one to two minutes for particularly large models to recalculate after each and every change they make to them, even if those changes are relatively insignificant, such as changing the spelling of a column header.

Often spreadsheets like this get so sluggish that users switch Excel’s calculation setting to Manual, just so they can make changes in a timely fashion, and then switch it back to Automatic when they’re done in order to have the model calculate the correct answer. This is dangerous…I’d never set calculation to Manual if I could help it. There’s just too much chance that someone, someday, will use the output of such a model without remembering to set calculation back to Automatic. What’s worse, calculation mode applies to the whole Excel session: when you open two workbooks, one saved in Manual mode and one saved in Automatic mode, they will both have the calculation mode of the first workbook opened. I have seen many cases in my career where analysts have done just that…opened a workbook with calc set to Manual, opened a whole bunch of others where calc was set to Automatic, and then done an entire day’s work without realizing that calculation was turned off for all of them. Doh!

Here’s a slide from my Excel Efficiency presentation that warns users not to do this:
[Slide: Big Trouble in Little Spreadsheet]

Previously you might have thought that you had no choice but to switch calculation to Manual, because this sluggishness seemed an unavoidable consequence of the size and complexity of your spreadsheet. But now you know that it *might* be caused by volatile functions, and that volatile functions might not be suitable on some occasions…particularly if you’re building large models that use these functions at key points. Replace those volatile functions with non-volatile alternatives, and you’ll likely find that your model stops being a slow dog and starts being a much faster greyhound. To the point that you can switch calculation back to Automatic again.

What are the alternatives to Volatile functions?

While volatile functions like OFFSET() and INDIRECT() are incredibly useful, you can usually achieve the same thing by using non-volatile functions such as INDEX() or CHOOSE(), or by leveraging the dynamic references that Excel Tables provide.

And instead of the TODAY() function, you can use VBA to populate today’s date as a hard-coded value in big models, as you’ll see in the download file below. Check out the Alternative Functions tab of that file to see some examples of common use of volatile functions, as well as some non-volatile alternatives.

If you’re struggling to find a non-volatile replacement for an existing volatile formula, then you can always post a question on the Chandoo Forum asking for some advice on non-volatile alternatives.

Am I being over-zealous here?

As we’ve seen, too much reliance on volatile functions *might* trigger large parts of a model to be recalculated needlessly. But it’s worth remembering that this is only going to be noticeable in particularly big spreadsheets. So perhaps I’m being a little overzealous here, and if you know what you’re doing, then maybe you don’t want to dismiss volatile functions outright. After all, you can always assess your options on a case-by-case basis: try them out, test, test, test, test again, and then make a balanced decision.

However, if you know of an alternative formula combination that does exactly the same thing as a volatile formula, then I’d suggest that you get into the habit of using that instead whenever you can. That way you won’t inadvertently have issues when it really matters. And if you don’t have much experience with functions and performance, then perhaps it’s safest to simply err on the side of caution and steer clear of volatile functions altogether.

So not only do I see little downside to avoiding volatile formulas, but I see a significant upside: I’ve seen plenty of large models built by the likes of the Big Four accounting/consulting firms that make heavy use of volatile functions, and that consequently have recalculation times so long that they are effectively unusable. Stripping out the volatile formulas from these models has cut the delay after each data entry from upwards of two minutes to well under a second. Not to mention that users can now work on other files while these models are open, without fear of triggering an avalanche of unnecessary and pointless recalculation. Had these model builders known to avoid volatile functions, they would have saved users a lot of grief.

Excel MVP and Recalculation Expert Charles Williams says:

The better use you make of smart recalculation in Excel, the less processing has to be done every time that Excel recalculates, so avoid volatile functions like INDIRECT and OFFSET where you can, unless they are significantly more efficient than the alternatives. (Well-designed use of OFFSET is often fast.)

In fact, on Charles’ website he goes so far as to say avoid volatile functions wherever possible.

With all that in mind, I’ve made a personal choice to steer clear of volatile functions where I can. Your mileage may vary. Regardless, the subject of volatility is definitely something that intermediate users should be made aware of. What they do with that awareness is up to them. But forewarned is forearmed.

Fancy a demonstration?

Sometimes it’s most helpful to see something with your own eyes. So download this file, open it, and enable macros: Volatility-demo-using-TODAY-20140230
You’ll see it has a dropdown in it, where you can choose to either populate a cell with the volatile TODAY function or with a hard-coded date:
Downstream of that drop-down output cell are 20,000 formulas spread across two columns:

If you choose the Use Volatile TODAY() Function option from the dropdown, then whenever you enter data in that third ‘Completely independent cells’ column, you should notice a significant delay. Change that dropdown to ‘Use Hard-Coded Date’ and you should experience significantly less delay, if any.

You’ll also see a blue button you can click that will time how long the delay is under each option.
On my system, there’s about a 1 second delay when using the TODAY() option, and almost no delay when using the hard-coded date. (Note that you have to click the blue button twice after you change that dropdown to get the ‘proper’ reading. The first reading will be artificially high.)

Why are some functions volatile?

The reason some of these functions are volatile is fairly obvious. For example:

  • NOW() should always return the time as of the last calculation, so it needs to be refreshed any time new data is put into the workbook, in case one of your formulas does something specific based on the time of day.
  • TODAY() similarly must be refreshed to ensure that the day hasn’t changed since the last time something was entered into the workbook (which will be the case if someone works past midnight, or if they come in in the morning and make a change to a file that they had left open the previous night).

But the reasons for others being volatile – such as OFFSET and INDIRECT, which are often used by modellers to create dynamic named ranges – are less clear. First, let’s look at what OFFSET and INDIRECT actually do:

  • OFFSET() returns a reference to a cell or a multi-cell range that is a given number of rows and columns away from a given reference. So OFFSET($A$1,1,2,5,3) says “Go one cell down from $A$1 (which takes us to $A$2), then two cells across (which takes us to $C$2), and then return a block of cells 5 rows high and 3 columns wide starting at $C$2” (which gives us the range $C$2:$E$6).
  • INDIRECT() returns the reference specified by a text string. References are immediately evaluated to display their contents. So INDIRECT(“$A1”) tells Excel “Go look in cell A1, and tell me what’s in it”.
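The arithmetic in that OFFSET example can be checked with a small Python sketch. It is only an illustration under stated assumptions: the starting reference is a single cell and height and width are positive (Excel itself handles more cases than this toy does), and the helper names are hypothetical. The same `parse_cell` helper also shows that “$A1” simply names cell A1.

```python
import re

def col_to_num(col):
    """Column letters to a 1-based number: A -> 1, C -> 3, AA -> 27."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

def num_to_col(n):
    """1-based column number back to letters: 3 -> C, 27 -> AA."""
    letters = ""
    while n:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def parse_cell(ref):
    """Split a single-cell address such as '$A$1', '$A1' or 'A1' into (row, column number)."""
    m = re.fullmatch(r"\$?([A-Z]+)\$?(\d+)", ref)
    if m is None:
        raise ValueError(f"not a single-cell address: {ref}")
    return int(m.group(2)), col_to_num(m.group(1))

def offset_address(ref, rows, cols, height=1, width=1):
    """Address of the block OFFSET(ref, rows, cols, height, width) points at.

    Toy assumptions: ref is a single cell and height/width are positive.
    """
    top, left = parse_cell(ref)
    top, left = top + rows, left + cols
    if top < 1 or left < 1:
        raise ValueError("#REF!: the result would fall off the sheet")
    start = f"{num_to_col(left)}{top}"
    end = f"{num_to_col(left + width - 1)}{top + height - 1}"
    return start if start == end else f"{start}:{end}"

print(offset_address("$A$1", 1, 2, 5, 3))  # C2:E6
```

Going 1 row down and 2 columns across from A1 lands on C2; a block 5 rows high and 3 columns wide from there ends at E6.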

So why would that mean they need to be volatile? Because Excel constructs dependency trees based on cell references.

  • INDIRECT() has an argument that is constructed out of text – e.g. INDIRECT(“$A1”). This might look like a cell reference, but it is not. In fact, the argument of an INDIRECT function might equally look something like this: INDIRECT(“$B”&$C$9-2).
  • OFFSET() takes a starting reference plus numerical arguments that say where the result lies relative to that reference. Those numbers point to a cell or range, but they are still just numbers.
  • In order for these to form part of Excel’s dependency tree, the dependency tree algorithm would first have to evaluate text like INDIRECT(“$A1”) or numerical arguments like those in OFFSET($A$1,1,2,5,3) to determine what the associated cell reference actually is, before adding it to the dependency tree. Maybe the Excel team made the call that, rather than introduce this extra step for these two functions, they may as well just make both functions fully volatile.
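To see why a dependency-tree builder struggles with these two functions, here is a deliberately crude Python sketch that reads cell references straight out of formula text without evaluating anything. The regular expression is a toy assumption (it ignores sheet names, whole-column references and much else) and is not Excel’s parser. It finds A1 in a plain formula, finds nothing usable inside INDIRECT’s text argument, and for OFFSET finds only the starting cell, not the range the function actually returns.

```python
import re

# Toy patterns: a cell address not followed by "(" (so LOG10( is not a cell),
# and a double-quoted string literal.
CELL_REF = re.compile(r"\$?\b([A-Z]{1,3})\$?(\d+)\b(?!\()")
STRING_LITERAL = re.compile(r'"[^"]*"')

def static_references(formula):
    """Cell references visible without evaluation; text inside string literals is ignored."""
    code = STRING_LITERAL.sub('""', formula)
    return {col + row for col, row in CELL_REF.findall(code)}

print(static_references('=A1*2'))                   # {'A1'}
print(static_references('=INDIRECT("$A1")'))        # set()
print(static_references('=INDIRECT("$B"&$C$9-2)'))  # {'C9'}
print(static_references('=OFFSET($A$1,1,2,5,3)'))   # {'A1'}
```

For the third formula the only visible precedent is C9, yet the cell INDIRECT actually reads is somewhere in column B, and which one depends on the value in C9. For OFFSET, the visible precedent is A1, while the result is C2:E6.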

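But given that you can set up INDEX() to do much the same thing as OFFSET(), why doesn’t INDEX need to be volatile too? I imagine it’s because INDEX takes a range argument, whereas OFFSET describes its result with numbers. Whatever row and column numbers you give INDEX, its result can only come from within the range you passed it, so Excel can extract that range directly from the INDEX() function when building/amending the calculation dependency tree. OFFSET’s result, by contrast, can lie entirely outside its reference argument.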
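A Python sketch of that difference, under simplifying assumptions: the area is a plain rectangle such as C2:E6, row and column numbers are 1 or more, and the helper names are hypothetical. Whatever position you ask for, the address INDEX returns is always one of the cells of the area itself, so the whole area is a safe, static set of precedents.

```python
import re

def col_to_num(col):
    """Column letters to a 1-based number: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

def num_to_col(n):
    """1-based column number back to letters: 3 -> C, 27 -> AA."""
    letters = ""
    while n:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def parse_cell(ref):
    """Split a single-cell address like '$C$2' or 'C2' into (row, column number)."""
    m = re.fullmatch(r"\$?([A-Z]+)\$?(\d+)", ref)
    if m is None:
        raise ValueError(f"not a single-cell address: {ref}")
    return int(m.group(2)), col_to_num(m.group(1))

def bounds(area):
    """(top, left, bottom, right) of a rectangular area like 'C2:E6' or 'C2'."""
    start, _, end = area.partition(":")
    top, left = parse_cell(start)
    bottom, right = parse_cell(end or start)
    return top, left, bottom, right

def index_address(area, row, col=1):
    """Address INDEX(area, row, col) would return, assuming row, col >= 1."""
    top, left, bottom, right = bounds(area)
    if not (1 <= row <= bottom - top + 1 and 1 <= col <= right - left + 1):
        raise ValueError("#REF!: position is outside the area")
    return f"{num_to_col(left + col - 1)}{top + row - 1}"

def possible_precedents(area):
    """Every cell INDEX(area, ...) could ever return: just the cells of the area."""
    top, left, bottom, right = bounds(area)
    return {f"{num_to_col(c)}{r}" for r in range(top, bottom + 1) for c in range(left, right + 1)}

print(index_address("C2:E6", 2, 3))  # E3
```

Compare this with OFFSET, whose numeric arguments can send the result anywhere on the sheet, so no such fixed set of cells can be read off the formula.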

Note that INDEX() is what’s called semi-volatile, meaning it gets recalculated when the workbook opens.

And also note that any formulas used in conditional formatting effectively become what Charles Williams calls super-volatile: they are evaluated each time the cell that contains them is repainted on the screen (which happens, for example, if you use the scroll bar to move the ‘view’ up/down or left/right), even in Manual calculation mode. But because no other formulas are ‘downstream’ from conditional formats, only the conditional format formulas themselves get recalculated. So if you’ve got simple conditional formatting rules, you won’t notice any delay.

More info:

I’ll talk about alternatives to using volatile functions in a series of upcoming posts. But meanwhile…if you’re not feeling too sluggish…then check out these great links from Excel MVP Charles Williams.

Pretty much everything I’ve covered in this post came from Charles’ writings, so I’d like to acknowledge the work he has done in explaining this complex subject to countless Excel users over the years. Charles also sells a great add-in called FastExcel for profiling Excel calculation performance and memory usage – so be sure to check that out if you want to get serious about diagnosing volatility issues with your own Excel models.

You may also be interested in Jan Karel Pieterse’s RefTreeAnalyser utility, which among other things allows for easy auditing of formula dependents and precedents, helps you trace errors, and will let you time your workbook calculation for each worksheet to find bottlenecks as well as check columns for formula inconsistencies. Jan Karel has a free demo version with limited functionality, if you’d like to take it for a spin.

Let me know your thoughts in the comments

This has been a particularly taxing post to write. So if you found this article helpful, please let me know below in the comments. If you’re not following something I said, or can think of a better way to say it, then let me know that too.

About the Author.

Jeff Weir – a local of Galactic North up there in Windy Wellington, New Zealand – is more volatile than INDIRECT and more random than RAND. In fact, his state of mind can be pretty much summed up by this:

=NOT(EVEN(PROPER(OR(RIGHT(TODAY())))))

That’s right, pure #VALUE!

Find out more at http://www.heavydutydecisions.co.nz


Written by Jeff Weir

39 Responses to “Handle Volatile Functions like they are dynamite”

  1. Very useful tips, Chandoo tips are always fantastic…

  2. Desire says:

    What can replace DIRECT function

  3. Oz du Soleil says:

    Thanks for this. I’d never heard of the dependency tree and smart recalculation. This helps me understand a lot.

    Also, thanks for the warning against setting calc to manual.

    One request that I have is for an explanation of nonvolatile functions that cause Excel to drag. I’ve never used any of the volatile functions in such volume that they hurt performance. But I’ve seen thousands of SUMIFS and VLOOKUP slow and even crash Excel. How are they different from other functions like OR, INDEX, CHOOSE and IFERROR?

  4. Chris Macro says:

    Great article Jeff! I am a habitual user of OFFSET (I’m trying to overcome my addiction) with named ranges to make them automatically resize. As you continue your series, I’d be interested in hearing how you handle those situations efficiently.

  5. Perri says:

    Thanks for this. I use INDIRECT quite often when I build dashboards. It enables me to look up values on different tabs based on the selection. What will be an alternative to INDIRECT?

    Thanks,
    Perri

    • Jeff Weir says:

      I’ll be covering this in a future (as yet unwritten) article.

      If the different ranges are on the same sheet, you could use the ‘reference’ version of INDEX to say dynamically sum a particular named range based on a number stored in A2, like this:
      =SUM(INDEX((Range1,Range2,Range3),,,A2) )
      …where Range1 etc are named ranges (consisting of one or more cells in each range) and in A2 is a number telling Excel which of those three range names you want the INDEX function to return

      For instance, say we have three named ranges: Sales, Forecast, and Variance. And say we have a picklist in cell A1 where a user can choose either ‘Sales’ or ‘Forecast’ or ‘Variance’. And in A2 we have an IF or VLOOKUP or even CHOOSE function that returns 1 if the user selects Sales, 2 for Forecast, and 3 for Variance.

      Then we have a formula like this:
      =SUM(INDEX((Sales, Forecast,Variance),,,A2))

      …which dynamically returns the sum of the range that the user selects with the picklist.

      If the ranges are on different sheets, CHOOSE can do the same thing with this formula:
      =SUM(CHOOSE(A2,Sales, Forecast,Variance)) as the CHOOSE function also accepts ranges.

      So the CHOOSE function can be better at this than INDEX, because it doesn’t care if your data ranges are on different sheets or even in the same workbook. Whereas if you try to point INDEX to multiple ranges on different sheets, you will get an error.
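      As a rough Python analogy of the CHOOSE approach (the figures below are made up, and the lists merely stand in for the Sales, Forecast and Variance named ranges): every candidate range is named explicitly in the formula, which is what lets Excel add all of them to the dependency tree, and the index number in A2 only picks which one gets summed.

```python
# Made-up data standing in for the Sales, Forecast and Variance named ranges.
sales = [100, 200, 300]
forecast = [110, 190, 320]
variance = [s - f for s, f in zip(sales, forecast)]  # [-10, 10, -20]

def choose(index_num, *options):
    """Mimic CHOOSE: return the index_num-th option (1-based)."""
    if not 1 <= index_num <= len(options):
        raise ValueError("#VALUE!: index_num is out of range")
    return options[index_num - 1]

a2 = 2  # what A2 would hold when the user picks 'Forecast'
print(sum(choose(a2, sales, forecast, variance)))  # 620
```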

  6. Ron says:

    Thanks for this interesting intro to optimizing. You give us a list of functions to not use, or use carefully. It would help if in this article you would also include alternatives to the functions. I saw a couple, like hard code dates instead of =Today(), but I would appreciate it if you provided more emphasis on these alternatives. I understand that you can’t go into detail about everything, but at least knowing immediately what the alternative is allows us to do our own research until you can write the rest of the articles in this series.

    Thanks

    • Jeff Weir says:

      Ron: as per my article above, if you’re struggling to find a non-volatile replacement for an existing volatile formula, then you can always post a question on the Chandoo Forum asking for some advice on non-volatile alternatives, and you can check out the Alternative Functions tab of the example download file to see some examples of common use of volatile functions, as well as some non-volatile alternatives.

      And you can always google “Non-volatile alternative to [some function]”.

      Your line “I understand that you can’t go into detail about everything, but at least knowing immediately what the alternative is allows us to do our own research until you can write the rest of the articles in this series.” made me laugh out loud…you’re asking me to let you know now about what’s in the other articles. The other articles aren’t written yet. So I can’t let you know now, because my time machine broke.

      • Darren Chapman says:

        Need help fixing your time machine Jeff!!! I can’t stop laughing out loud in the office!!! that’s hilarious!!

        • that’s funny… he mentioned a couple of volatile functions and the non-volatile alternatives…. and he did detail them quite a bit…
          What I would like to eventually see, is a list with all volatile functions…
          This is a great awesome article about volatility of functions …

          Luv’d it, shar’d it, memoriz’d it.

  7. Henk Huiting says:

    Hello,
    I didn’t know this, so thank you for the explanation.
    You wrote: “And this dependency tree is saved along with the file itself”. Is there a way with VBA to make this tree visible in a sheet? I know I could use the “blue arrows” to see the dependencies, but is there a way to show it as a tree or in a table?
    Thnx!

  8. Tim says:

    “CHOOSE….Whereas try to point index to multiple ranges on different sheets and you will get an error.”

    Oh, this tangent may open up a whole new world of possibilities.

    Thanks Jeff!

  9. Steffan says:

    A *lot* of good stuff on this thread.
    I am always using an INDIRECT nested in an INDEX, with the indirect pointing to a named range to summarize elements from several larger tables. (Like INDEX(INDIRECT(“Region_”&A1&”_Office_”&B1),2,1)) I imagine I can sub out CHOOSE for the INDIRECT, (I can even write a macro to do it for me,) but the resulting formula will be very large (at a minimum, 30 named ranges are being used.) Is there any way to get the speed of the CHOOSE with the dynamic features of INDIRECT?

    • Jeff Weir says:

      Steffan – nope. That’s the down side of using CHOOSE…you’ve got to specifically identify all the ranges. But that’s what makes it non-Volatile…because Excel can then extract those cell references and use them to amend the calculation dependency tree.

      Or you can use PowerPivot if installed to bring all those separate tables into one PivotTable, and then users can filter that pivottable on the region and office of choice.

      But then you might as well just go with CHOOSE…it will handle up to 254 ranges. If you’re worried about the length of the individual range reference, then you could always assign a short name to each range like “P_1” and reference that instead of the longer Sheetx!A2:Z50

      • Steffan says:

        I went the macro route. My workbook has a dashboard driven by OFFSET, which relied upon a large table, which in turn summarized the results of 30 calculators using named ranges. Each calculator used INDIRECT() to construct the name of a named range which holds the calculator’s drivers. (About 20 drivers are used in their own tables, which come from SQL in one big lump, and often there would be the need to add or multiply drivers together, so there would be multiple INDIRECT()s in the calculators.) I wrote a macro to fish out the INDIRECT() clause from the formula, and then used ActiveSheet.Evaluate() to convert that clause to its static named range. I ran that against the calculators and got so much speed back that I found that I could leave the dashboard’s OFFSET formulas in place.

        • Jeff Weir says:

          Awesome! Another approach is that you could always use a bit of SQL to mash things up, including ranges inside the workbook itself and parameters from the workbook. Although it requires quite tricky VBA, unless you have PowerPivot installed, in which case you can do this natively.

          • Steffan says:

            OK, that IS pretty awesome. The macro works to convert the workbooks that I already have in place, but the idea of using SQL and a “remote” connection to the same workbook to do my aggregation is pretty dang awesome and I’ll try it when building new models. Thanks for putting me on that trail!

            http://support.microsoft.com/kb/257819

  10. Matheus says:

    Great post!!

    One question: are formulas inside arrays and vectors also volatile? I mean, if I change one cell inside an array, are the others recalculated?

    Thanks!

    Matheus

  11. Mark duchesne says:

    Thanks for the heads up Jeff, this is all new information to me. I often use indirect and offset to create dynamic references/ranges based on user selection. Your article has certainly opened up my mind to explore the alternatives and how I might leverage index/choose a little differently to achieve the same result.

  12. […] folk. Jeff Weir here. You might remember me from shows such as Handle volatile functions like they are dynamite, Did Jeff just Chart, and Robust Dynamic (Cascading) Dropdowns Without […]

  13. Brian says:

    Very little detailed explanation out there on the subject of how Excel goes about calculation via dependency trees etc.

    Thanks very much Jeff for your clear explanation.

  14. Lee says:

    Thanks, it is a very useful post. I’ve learnt a lot from your website.

  15. Roger says:

    Very useful info on volatile functions, thanks.
    I frequently use offset() for dynamic named ranges in a table. This then allows me to selectively sum one column, depending on the values in another column.
    Eg.
    _col1 = dynamic range of values in column1 (using offset())
    _col2 = dynamic range of values in column2 (using offset())
    sum = sumproduct(_col1*(_col2=”some value”))

    I don’t want to use DSUM, since the selection criteria can’t be encoded into a single cell.

    Is there any better way of achieving this, without the volatile offset()?

    Many thanks

  16. Charlie says:

    Hi Jeff,

    Thanks for this. Really useful. I’m not really at the level where I usually worry about these things but am getting there. I’m just dipping my toe into the VBA world. What is the text that I would have to copy for the VBA to hard code the date into cell A1? Tried to grab it from the download but too amateur to be able to work it out.

    Really appreciate your article, explained a lot!

    Thanks.
