A few weeks ago Gewilson asked on the Chandoo.org Forums, “Can I simplify my formula? “
=176 – (SUMIF($B10:$AF10,”PD”,$B11:$AF11) +SUMIF($B10:$AF10,”FA”,$B11:$AF11) +SUMIF($B10:$AF10,”PS”,$B11:$AF11) +SUMIF($B10:$AF10,”PN”,$B11:$AF11) +SUMIF($B10:$AF10,”F1″,$B11:$AF11) +SUMIF($B10:$AF10,”P1″,$B11:$AF11) +SUMIF($B10:$AF10,”F7″,$B11:$AF11))
SirJB7 responded with a nice Sumproduct solution:
=176 – SUMPRODUCT(((B$10:AF$10)=({“PD”;”FA”;”PS”;”PN”;”F1″;”F7″})) *(B$11:AF$11))
So Today we will pull this apart to see what inside, I think what we find may surprise you.
SirJB7’s Formula
=176 – SUMPRODUCT(((B$10:AF$10)=({“PD”;”FA”;”PS”;”PN”;”F1″;”F7″})) *(B$11:AF$11))
To Simplify things I am going to use a Truncated set of data and adjust the formula accordingly
We will examine:
=176 – SUMPRODUCT(((B$10:I$10)=({“PD”;”FA”;”PS”;”PN”})) *(B$11:I$11))
This problem has a smaller Range B10:I10 instead of B10:AF10
as well as 2 less possible solutions {“PD”;”FA”;”PS”;”PN”} instead of {“PD”;”FA”;”PS”;”PN”;”F1″;”F7″}
The reason for this will soon become evident.
As usual you can download a Sample File to follow along with. Download Here.
Lets go:
=176 – SUMPRODUCT(((B$10:I$10)=({“PD”;”FA”;”PS”;”PN”})) *(B$11:I$11))
We can see above that the formula is subtracting the result of a Sumproduct from a Fixed Number 176. So we really only need to focus on the Sumproduct part of the formula.
As we saw In Formula Forensics 007 – Sumproduct, Sumproduct adds up the products of the constituent arrays.
In this case
SUMPRODUCT(((B$10:I$10)=({“PD”;”FA”;”PS”;”PN”;})) *(B$11:I$11))
Has only 1 constituent array. The array does consist of 2 components
SUMPRODUCT(((B$10:I$10)=({“PD”;”FA”;”PS”;”PN”})) *(B$11:I$11))
These is a Logic component ((B$10:AF$10)=({“PD”;”FA”;”PS”;”PN”}))
and a Numerical Component (B$11:I$11)
Which are then multiplied together.
Looking at the Logical Component first
((B$10:I$10)=({“PD”;”FA”;”PS”;”PN”}))
The formula is checking the Range B10:I10 against an Array of possible solutions {“PD”;”FA”;”PS”;”PN”}
That is, it is checking each value in our list {“PD”;”FA”;”PS”;”PN”}, against each cell in the range B10:I10.
If we type the above equation=((B$10:I$10)=({“PD”;”FA”;”PS”;”PN”})) into a spare cell C14, and press F9
Excel returns ={FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE;FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE}
What the …
If we look closely at the above array we will see that it contains a lot of True/Falses separated by ,’s and a few ;’s
={FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE;FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE}
Specifically there are 4 blocks of 8 True/Falses separated by ,’s, each block is separated by a ;
In Total 4 x 8 = 32 Values
What this is, is an array representing the multiplication of the 8 cells in the range B10:I10 with each element of the possible solution array
Each row of the Array is separated from the next by a ;
Each element in each row is separated by a ,
This is best displayed like:
You can see why I simplified the size of the original problem.
So we have an Array of True/Falses ={FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE;FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE}
Which is now multiplied by the next component of the Sumproduct
({FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE;FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE}) *(B$11:I$11)
In a spare cell, say C23 enter
=({FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE;FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE}) *(B$11:I$11) and press F9
Excel returns
={0,0,0,0,10,0,0,0;0,10,0,0,0,0,0,0;0,0,0,0,0,0,0,0;0,0,0,0,0,0,0,0}
Note that this array has the same 8 column x 4 Row layout as above, except that all the True have been replaced by the values in the Score cells B11:I11
Sumproduct now kicks in and adds these up
=Sumproduct({0,0,0,0,10,0,0,0;0,10,0,0,0,0,0,0;0,0,0,0,0,0,0,0;0,0,0,0,0,0,0,0})
To get 20
Which is subtracted from our original number
=176 – SUMPRODUCT(((B$10:I$10)=({“PD”;”FA”;”PS”;”PN”})) *(B$11:I$11))
= 176 – Sumproduct({0,0,0,0,10,0,0,0;0,10,0,0,0,0,0,0;0,0,0,0,0,0,0,0;0,0,0,0,0,0,0,0})
= 176 – 20
= 156
Download
You can download a copy of the above file and follow along, Download Here.
Other Posts In This Series
You can learn more about how to pull Excel Formulas apart in the following posts
We Need Your Help
I have received a few more ideas since last week and these will feature in coming weeks.
I do need more ideas though and so I need your help.
If you have a neat formula that you would like to share and explain, try putting pen to paper and draft up a Post like above or;
If you have a formula that you would like explained but don’t want to write a post also send it in to Chandoo or Hui.
21 Responses to “Formula Forensics No. 010 Count How Many Times a List of Values Occurs in a Range”
I thought that as we only had to sum one array we could have used SUM instead of SUMPRODUCT, but I have had to enter array formula to get the same result. Why is it?
Well, if you only had one value to check for, you would use a SUMIF instead of SUM, but the issue is that the problem requires a check for 6 values. What the formula is doing is taking the first array (Your list of existing values) and creating a new array of TRUE/FALSE for each test value. Each of those new arrays is separated by the semicolons.
What I would truly like to know, is why SUMIFS would not work for this solution instead? I can't see the original forum thread, so perhaps it must be done in 2003.
Linky at the top wasn't working for me, I think the "replies=6" portion was messing it up. If anyone else is interested:
http://chandoo.org/forums/topic/sumif-then-subtract#post-16283
Here's an ever so slightly shorter formula (in terms of number of characters) - going back to the use of the sumif():
=176-SUM(SUMIF(B$10:I$10,{"PD";"FA";"PS";"PN";"F1";"F7"},B$11:I$11))
Hi, Nick Burns!
You're right!
I wouldn't bet that that with SUMIF would work when it didn't with SUMIFS.
Very nice approach!
Regards!
@Luke M: Thanks for adding the correct link, it didn't worked for me neither.
@Cameron:
Hi!
I firstly tried to solve it using SUMIFS instead of SUMPRODUCT, but it didn't worked, it only did for the first element of the array.
Give a look at this to check out: http://www.2shared.com/file/cHZEttV0/sumif_then_subtract__for_gewil.html
SUMIFS hasn't the ability to perform matrix-based operations, so... welcome back to the SUMPRODUCT, and it did the job.
Hope I helped you.
Regards!
@Cameron: Sorry, 2shared still with problems. Here's in skydrive.
https://skydrive.live.com/?cid=3a8bdc8cdf4d772c#!/view.aspx?cid=3A8BDC8CDF4D772C&resid=3A8BDC8CDF4D772C!269
It seems sumifs does not quite work as I expected!
I should say that SUMPRODUCT does quite well the work, more than I expected, to be honest.
@All, SirJB7, Cameron
There are slight differences between how Sumproduct and SumIf and SumIfs work
If you look at the Excel help for each function, specifically the Syntax section
SUMPRODUCT(array1, [array2], [array3], ...)
SUMIF(range, criteria, [sum_range])
SUMIFS(sum_range, criteria_range1, criteria1, [criteria_range2, criteria2], ...)
You can see that Sumproduct requires Arrays as Inputs
SumIf/s require ranges
This is telling us that an Array can be a Range, but that a Range can't be an Array
It also explains why you don't need to use Ctrl Shift Enter with Sumproduct as it is accepting the input as an Array and is treated as such by default.
It is nice to know that both solutions (using sumif or sumproduct) are still backwards compatible with earlier versions of Excel.
I think this is cool and it's good to understand the array notation in the formula.
For my money, I prefer to expose the list {"PD”;”FA”;”PS”;”PN”} plainly in the worksheet, and reference the named range instead of embedding the list in formulae. This ensures consistency, which is especially important if the list is used in multiple places and might ever need to be changed. Implementation is simple:
Somewhere in the workbook type out this column of values:
Transactions_To_Include
PD
FA
PS
PN
Highlight the range and hit Ctrl+Shift+F3 to create a named range with the default "name in top row" option.
Use new formula
=176 - SUMPRODUCT(((B$10:I$10)=(Transactions_To_Include)) *(B$11:I$11))
I will post another solution that has even better transparency from an audit standpoint.
Assuming helper rows are allowed, I would set up the named range as I previously posted. Add a helper formula in B12:
=--IFERROR(MATCH(B$10,Transactions_To_Include,0)>0,0)
and extend to the right.
Then use this new formula to calculate the conditional totals:
=176-SUMPRODUCT(B$11:I$11,B$12:I$12)
or
=176-SUMIF(B$12:I$12,1,B$11:I$11)
The reason I recommend this approach is it obvious which columns are being summed in the total.
It is good to keep in mind that others may need to parse your workbooks, or that you may need to understand your own logic months or years later. Using fancy formulae to do everything in one shot can be fun, but it can also add to the risk that mistakes will be made and makes it more difficult to understand the logic and flow. Breaking down the logic in steps is like "showing your work" -- just as important now as it was back in school.
Final missive... Just to be clear, I'm not trying to bash SirJB7's approach -- like I said, I do think it's cool, and it presented a useful technique that is far superior to the OP's.
My comments are driven by experience. Over the years I have learned (usually the hard way) that keeping things simple and transparent helps ensure the solution will be accurate and sustainable.
Using tables to drive business rules is a major point I'm trying to make here. Any programmer confronted with the option to "hard code" a list of values vs. look up from a table knows what I mean.
@Andy - thanks for that. Your approach makes it easier to update if parameters change. I tried using the table with my SUMIF() approach, but unfortunately it didn't work. So, the SUMPRODUCZ() in conjunction with a Lookup Table is more ideal.
FYI
In my second post that offered a SUMIF approach, this formula was not rendered correctly by the blog engine:
=[minus][minus]IFERROR(MATCH(B$10,Transactions_To_Include,0)>0,0)
where [minus][minus] means type the - symbol twice.
Hi,
Another way to simplify the original formula is as follows:
Setup the comparison values ("PD”;”FA”;”PS”;”PN”;”F1?;”F7?) somewhere. Say, A1:A6.
Then, you can use the following formula:
=176 - SUMPRODUCT((COUNTIF(A1:A6,B10:AF10)=1)*(B11:AF11))
One advantage of this approach is that you can edit the values in A1:A6 as needed. (You could also use a Named Range, ensuring that your formula would not need to be modified even if the comparison values grow or shrink.)
Cheers,
Sajan.
Will a time come when some of these masterpieces need to be revisited with modern Office 365 in mind?
For example, the array formula
= 176 - SUM( SUMIFS( values, codes, {"PD","FA","PS","PN"} ) )
works fine and CSE has largely gone.
@Peter
You are of course correct, that a number of new functions will make older solutions obsolete.
The only problem is that I know a lot of people still use Excel XP or 2007, from 2002/2007 era. As such these new functions don't exist.
@Hui
As you say, it will be many years before these solutions truly become obsolete. The challenge is that we now seem to have reached a fork in Excel development in which at least two radically different solutions are available for any given problem, depending on the version of Excel that is in use.
Of course, there is also the question of which is harder to upgrade, the code or the user (but perhaps that is a point better left unspoken!).
@Peter
I think it is much worse than that
I don't believe Microsoft have any intent of updating Excel 2019 etc to keep up with Excel365
XLookup comes to mind
This will create a large divide