Formula Forensics No. 010 Count How Many Times a List of Values Occurs in a Range

Posted on February 1st, 2012 in Excel Howtos , Formula Forensics , Huis , Posts by Hui - 17 comments

A few weeks ago Gewilson asked on the Chandoo.org Forums, “Can I simplify my formula?

=176 – (SUMIF($B10:$AF10,”PD”,$B11:$AF11) +SUMIF($B10:$AF10,”FA”,$B11:$AF11) +SUMIF($B10:$AF10,”PS”,$B11:$AF11) +SUMIF($B10:$AF10,”PN”,$B11:$AF11) +SUMIF($B10:$AF10,”F1″,$B11:$AF11) +SUMIF($B10:$AF10,”P1″,$B11:$AF11) +SUMIF($B10:$AF10,”F7″,$B11:$AF11))

SirJB7 responded with a nice Sumproduct solution:

=176 – SUMPRODUCT(((B$10:AF$10)=({“PD”;”FA”;”PS”;”PN”;”F1″;”F7″})) *(B$11:AF$11))

So Today we will pull this apart to see what inside, I think what we find may surprise you.

 

SirJB7’s Formula

=176 – SUMPRODUCT(((B$10:AF$10)=({“PD”;”FA”;”PS”;”PN”;”F1″;”F7″})) *(B$11:AF$11))

To Simplify things I am going to use a Truncated set of data and adjust the formula accordingly

We will examine:

=176 – SUMPRODUCT(((B$10:I$10)=({“PD”;”FA”;”PS”;”PN”})) *(B$11:I$11))

This problem has a smaller Range B10:I10 instead of B10:AF10

as well as 2 less possible solutions {“PD”;”FA”;”PS”;”PN”} instead of {“PD”;”FA”;”PS”;”PN”;”F1″;”F7″}

The reason for this will soon become evident.

As usual you can download a Sample File to follow along with. Download Here.

Lets go:

=176 – SUMPRODUCT(((B$10:I$10)=({“PD”;”FA”;”PS”;”PN”})) *(B$11:I$11))

We can see above that the formula is subtracting the result of a Sumproduct from a Fixed Number 176. So we really only need to focus on the Sumproduct part of the formula.

As we saw In Formula Forensics 007 – Sumproduct, Sumproduct adds up the products of the constituent arrays.

In this case

SUMPRODUCT(((B$10:I$10)=({“PD”;”FA”;”PS”;”PN”;})) *(B$11:I$11))

Has only 1 constituent array. The array does consist of 2 components

SUMPRODUCT(((B$10:I$10)=({“PD”;”FA”;”PS”;”PN”})) *(B$11:I$11))

These is a Logic component ((B$10:AF$10)=({“PD”;”FA”;”PS”;”PN”}))

and a Numerical Component (B$11:I$11)

Which are then multiplied together.

 

Looking at the Logical Component first

((B$10:I$10)=({“PD”;”FA”;”PS”;”PN”}))

The formula is checking the Range B10:I10 against an Array of possible solutions {“PD”;”FA”;”PS”;”PN”}

That is, it is checking each value in our list {“PD”;”FA”;”PS”;”PN”}, against each cell in the range B10:I10.

If we type the above equation=((B$10:I$10)=({“PD”;”FA”;”PS”;”PN”})) into a spare cell C14, and press F9

Excel returns ={FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE;FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE}

What the …

If we look closely at the above array we will see that it contains a lot of True/Falses separated by ,’s and a few ;’s

={FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE;FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE}

Specifically there are 4 blocks of 8 True/Falses separated by ,’s, each block is separated by a ;

In Total 4 x 8 = 32 Values

What this is, is an array representing the multiplication of the 8 cells in the range B10:I10 with each element of the possible solution array

Each row of the Array is separated from the next by a ;

Each element in each row is separated by a ,

This is best displayed like:

You can see why I simplified the size of the original problem.

So we have an Array of True/Falses ={FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE;FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE}

Which is now multiplied by the next component of the Sumproduct

({FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE;FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE}) *(B$11:I$11)

In a spare cell, say C23 enter

=({FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE;FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE}) *(B$11:I$11) and press F9

Excel returns

={0,0,0,0,10,0,0,0;0,10,0,0,0,0,0,0;0,0,0,0,0,0,0,0;0,0,0,0,0,0,0,0}

Note that this array has the same 8 column x 4 Row layout as above, except that all the True have been replaced by the values in the Score cells B11:I11

Sumproduct now kicks in and adds these up

=Sumproduct({0,0,0,0,10,0,0,0;0,10,0,0,0,0,0,0;0,0,0,0,0,0,0,0;0,0,0,0,0,0,0,0})

To get 20

Which is subtracted from our original number

=176 – SUMPRODUCT(((B$10:I$10)=({“PD”;”FA”;”PS”;”PN”})) *(B$11:I$11))

= 176 – Sumproduct({0,0,0,0,10,0,0,0;0,10,0,0,0,0,0,0;0,0,0,0,0,0,0,0;0,0,0,0,0,0,0,0})

= 176 – 20

= 156

 

Download

You can download a copy of the above file and follow along, Download Here.

 

Other Posts In This Series

You can learn more about how to pull Excel Formulas apart in the following posts

Formula Forensic Series:

 

We Need Your Help

I have received a few more ideas since last week and these will feature in coming weeks.

I do need more ideas though and so I need your help.

If you have a neat formula that you would like to share and explain, try putting pen to paper and draft up a Post like above or;

If you have a formula that you would like explained but don’t want to write a post also send it in to Chandoo or Hui.

 

Your email address is safe with us. Our policies

Written by Hui...
Tags: , , , , , ,
Home: Chandoo.org Main Page
? Doubt: Ask an Excel Question

17 Responses to “Formula Forensics No. 010 Count How Many Times a List of Values Occurs in a Range”

  1. Unai says:

    I thought that as we only had to sum one array we could have used SUM instead of SUMPRODUCT, but I have had to enter array formula to get the same result. Why is it?

    • Cameron says:

      Well, if you only had one value to check for, you would use a SUMIF instead of SUM, but the issue is that the problem requires a check for 6 values. What the formula is doing is taking the first array (Your list of existing values) and creating a new array of TRUE/FALSE for each test value. Each of those new arrays is separated by the semicolons.

      What I would truly like to know, is why SUMIFS would not work for this solution instead? I can’t see the original forum thread, so perhaps it must be done in 2003.

  2. Luke M says:

    Linky at the top wasn’t working for me, I think the “replies=6″ portion was messing it up. If anyone else is interested:
    http://chandoo.org/forums/topic/sumif-then-subtract#post-16283

  3. Nick Burns says:

    Here’s an ever so slightly shorter formula (in terms of number of characters) – going back to the use of the sumif():

    =176-SUM(SUMIF(B$10:I$10,{“PD”;”FA”;”PS”;”PN”;”F1″;”F7″},B$11:I$11))

    • SirJB7 says:

      Hi, Nick Burns!
      You’re right!
      I wouldn’t bet that that with SUMIF would work when it didn’t with SUMIFS.
      Very nice approach!
      Regards!

  4. SirJB7 says:

    @Luke M: Thanks for adding the correct link, it didn’t worked for me neither.

    @Cameron:
    Hi!
    I firstly tried to solve it using SUMIFS instead of SUMPRODUCT, but it didn’t worked, it only did for the first element of the array.
    Give a look at this to check out: http://www.2shared.com/file/cHZEttV0/sumif_then_subtract__for_gewil.html
    SUMIFS hasn’t the ability to perform matrix-based operations, so… welcome back to the SUMPRODUCT, and it did the job.
    Hope I helped you.
    Regards!

  5. Hui... says:

    @All, SirJB7, Cameron
    There are slight differences between how Sumproduct and SumIf and SumIfs work
    If you look at the Excel help for each function, specifically the Syntax section

    SUMPRODUCT(array1, [array2], [array3], …)
    SUMIF(range, criteria, [sum_range])
    SUMIFS(sum_range, criteria_range1, criteria1, [criteria_range2, criteria2], …)

    You can see that Sumproduct requires Arrays as Inputs
    SumIf/s require ranges

    This is telling us that an Array can be a Range, but that a Range can’t be an Array
    It also explains why you don’t need to use Ctrl Shift Enter with Sumproduct as it is accepting the input as an Array and is treated as such by default.

  6. Nick Burns says:

    It is nice to know that both solutions (using sumif or sumproduct) are still backwards compatible with earlier versions of Excel.

  7. andy holaday says:

    I think this is cool and it’s good to understand the array notation in the formula.
    For my money, I prefer to expose the list {“PD”;”FA”;”PS”;”PN”} plainly in the worksheet, and reference the named range instead of embedding the list in formulae. This ensures consistency, which is especially important if the list is used in multiple places and might ever need to be changed. Implementation is simple:
    Somewhere in the workbook type out this column of values:
    Transactions_To_Include
    PD
    FA
    PS
    PN

    Highlight the range and hit Ctrl+Shift+F3 to create a named range with the default “name in top row” option.

    Use new formula
    =176 – SUMPRODUCT(((B$10:I$10)=(Transactions_To_Include)) *(B$11:I$11))

    I will post another solution that has even better transparency from an audit standpoint.

  8. andy holaday says:

    Assuming helper rows are allowed, I would set up the named range as I previously posted. Add a helper formula in B12:
    =–IFERROR(MATCH(B$10,Transactions_To_Include,0)>0,0)
    and extend to the right.

    Then use this new formula to calculate the conditional totals:
    =176-SUMPRODUCT(B$11:I$11,B$12:I$12)
    or
    =176-SUMIF(B$12:I$12,1,B$11:I$11)

    The reason I recommend this approach is it obvious which columns are being summed in the total.

    It is good to keep in mind that others may need to parse your workbooks, or that you may need to understand your own logic months or years later. Using fancy formulae to do everything in one shot can be fun, but it can also add to the risk that mistakes will be made and makes it more difficult to understand the logic and flow. Breaking down the logic in steps is like “showing your work” — just as important now as it was back in school.

  9. andy holaday says:

    Final missive… Just to be clear, I’m not trying to bash SirJB7’s approach — like I said, I do think it’s cool, and it presented a useful technique that is far superior to the OP’s.
    My comments are driven by experience. Over the years I have learned (usually the hard way) that keeping things simple and transparent helps ensure the solution will be accurate and sustainable.
    Using tables to drive business rules is a major point I’m trying to make here. Any programmer confronted with the option to “hard code” a list of values vs. look up from a table knows what I mean.

  10. Nick Burns says:

    @Andy – thanks for that. Your approach makes it easier to update if parameters change. I tried using the table with my SUMIF() approach, but unfortunately it didn’t work. So, the SUMPRODUCZ() in conjunction with a Lookup Table is more ideal.

  11. andy holaday says:

    FYI

    In my second post that offered a SUMIF approach, this formula was not rendered correctly by the blog engine:

    =[minus][minus]IFERROR(MATCH(B$10,Transactions_To_Include,0)>0,0)

    where [minus][minus] means type the – symbol twice.

  12. Sajan says:

    Hi,
    Another way to simplify the original formula is as follows:

    Setup the comparison values (“PD”;”FA”;”PS”;”PN”;”F1?;”F7?) somewhere.  Say, A1:A6.

    Then, you can use the following formula:
    =176 – SUMPRODUCT((COUNTIF(A1:A6,B10:AF10)=1)*(B11:AF11))

    One advantage of this approach is that you can edit the values in A1:A6 as needed.  (You could also use a Named Range, ensuring that your formula would not need to be modified even if the comparison values grow or shrink.)

    Cheers,
    Sajan.

Leave a Reply