Formula Forensics 025. Count Unique Values in a Range

Posted on July 26th, 2012 in Formula Forensics , Huis , Posts by Hui , wonder why - 22 comments

This week at the Chandoo.org Forums, Ajinka asked a question about counting unique values in a range.

Faseeh answered with a neat Sumproduct() based formula and quoting a post that Chandoo had written at Chandoo.org answering the question in 2009.

A few people asked how it worked and Luke M gave a good response which I will be plagiarising in part here.

Faseeh’s formula was =SUMPRODUCT(1/COUNTIF(B2:B8,B2:B8))

As always at Formula Forensics you can follow along using a Worked Example which you can download here: Excel 97-2013.

 

Count Unique Values

Faseeh’s formula was =SUMPRODUCT(1/COUNTIF(B2:B8,B2:B8))

So lets look at how that works

=SUMPRODUCT(1/COUNTIF(B2:B8,B2:B8))

The formula is a Sumproduct() based formula which tells us that the Sumproduct() function is being used to multiply and addup the component arrays. As there is only 1 array component in our formula, Sumproduct simply adds up the values. You can learn more about the Excel Sumproduct function here: Formula Forensics 007

The components of the Sumproduct() function are:

1/COUNTIF(B2:B8,B2:B8)

Lets start with the COUNTIF(B2:B8,B2:B8) part

In a blank cell F12 put =COUNTIF(B2:B8,B2:B8), press F9 instead of Enter
Excel will respond with ={3;1;2;3;3;2;1}

What is Countif() doing ?

The Syntax of Countif() is:

In our example COUNTIF(B2:B8,B2:B8) the Range and the Criteria are the same Range B2:B8
So Countif will Look at the Range (B2:B8) and see what matches the criteria in each cell in the Criteria Range (B2:B8), 1 cell at a time.

Lets look at the first few cells in the Criteria and work through them.

The first cell in the Criteria is B2 which contains “ABC”
We can see that the Range contains the first value in the criteria “ABC”, 3 times
This is the first 3 in the Array shown above ={3;1;2;3;3;2;1}

 

The second cell in the Criteria is B3 which contains “XYZ”
We can see that the Range contains the second value in the criteria “XYZ”, 1 times
This is the second element in the Array shown above ={3;1;2;3;3;2;1}

 

The third cell in the Criteria is B4 which contains “HML”
We can see that the Range contains the third value in the criteria “HML”, 2 times
This is the third element in the Array shown above ={3;1;2;3;3;2;1}

 

Stepping through the range and comparing each value in the criteria results in: ={3;1;2;3;3;2;1}

 

Reciprocal

The next part of the formula is the
1/COUNTIF(B2:B8,B2:B8)

This takes the reciprocal of our Array {3;1;2;3;3;2;1}

In a Blank cell F14 enter =1/COUNTIF(B2:B8,B2:B8) press F9 not Enter
Excel returns: ={0.333;1;0.5;0.333;0.333;0.5;1}  (I have truncated the 0.33333333333 values to save space)
Which is the same as {1/3; 1/1; 1/2; 1/3; 1/3; 1/2; 1/1}

The Sumproduct() function now steps in and adds up the values of the array returning the answer 4.

Summary

So generically if a value occurs T times in the range, it will occur T times in the criteria.

This will return the value T, T times. The smart bit here is taking the reciprocal of the Count.
So this means it will return the value T, 1/T times.

So ultimately T x (1/T) = 1.

You can see from the above it doesn’t matter how many times a value occurs, every unique value will be seen as 1 and then added up by Sumproduct

 

Download

You can download a copy of the above file and follow along, Download Here Excel 97-2013.

Formula Forensics “The Series”

This is the 25th or Silver Anniversary Post in the Formula Forensics series and was the first Formula Forensics completely developed in the new Office 2013.
You can learn more about how to pull Excel Formulas apart in the following posts
Formula Forensic Series

Formula Forensics Needs Your Help

I need more ideas for future Formula Forensics posts and so I need your help.
If you have a neat formula that you would like to share as Jong has done above, try putting pen to paper and draft up a Post like above or;
If you have a formula that you would like explained, but don’t want to write a post, send it to Hui or Chandoo.

Written by Hui...
Tags: ,
Home: Chandoo.org Main Page
? Doubt: Ask an Excel Question

22 Responses to “Formula Forensics 025. Count Unique Values in a Range”

  1. Dohsan says:

    I saw this in the original thread and thought it was very cheeky little formula! One thing I did notice though was that it will return #DIV/0! error if any blanks are present in the list.

    I played around and managed to get the following to work

    {=SUMPRODUCT(IF(A1:A10<>"",1/COUNTIF(A1:A10,A1:A10),0)) }

    Only issue is that it then becomes an array formula.

    I then stumbled across this on another forum, which doesn't need to be entered as an array

    =SUMPRODUCT((A1:A10<>"")/COUNTIF(A1:A10,A1:A10&""))

    Interesting behaviour of COUNTIF, where a blank will return 1 if you were to use COUNTIF(A1,A1&"")

    Original thread was here, explains it far better than I could attempt:

    http://www.mrexcel.com/forum/showthread.php?70835-Count-distinct-function  

    • Oscar says:

      The formula =SUMPRODUCT((A1:A10<>”")/COUNTIF(A1:A10,A1:A10&”")) is really interesting!

      You can use the same technique to extract unique distinct values if blanks are present in the list!
      Array formula in cell B2:
      =INDEX($A$1:$A$10, MATCH(0, COUNTIF($B$1:B1,  $A$1:$A$10&""),0))
      Copy cell B2 and paste down as far as needed.

       

  2. Jon says:

    That just blew my mind!!!!!

  3. Kevin says:

    Why do people hate the DCOUNT function.

    • Luke M says:

      What makes you say that Kevin? DCOUNT is for use with a database (which requires headers) and is also limited to just counting numbers (not text). This article is about counting unique values. 

      • Kevin says:

        DCOUNTA counts text. Isn't Cells A1. C8 a database.

        • Jeff Weir says:

          There's more than one way to skin a cat, as the saying goes. Each with advantages and disadvandages.
          The 'D' functions are definately underused, but because they need to reference a criteria range stored in the spreadsheet somewhere, you can't easily pop in a D function somewhere and then fill down in the same way as you can with other functions. The reason is because each D function occupies one row, but the criteria they reference must occupy at least 2 rows. Hard to visualise, harder still to explain. And while there are workarounds to this, they require quite a bit of trickery.
           
           So D functions tend to be more helpful if you're only interested in answering one question based on just one criteria as in this case. If you need lots of formulas to answer questions based on a whole heap of criteria, they are not so helpful.
           
          I'm going to quote from my clever pal Sam:
          The "D" Functions (DSUM, DCOUNT etc) in Excel are often underused.
          A common complaint among users is that you cant drag it down.... Well you can ...but it's tricky. 
           
          D function advantages
          1.Speed - Faster than SP (but slower than SUMIF/s and CountIF/s)

          2.Can handle both AND and OR like SP

          3.Can handle Wild cards *,? directly
          4. Can be dragged down like SP when used along with the SUM function as DSUM-SUM
          5. Can handle multiple OR criteria in a much better way than SP
           
          Disadvantages
          1. The default operator is String* and not Equals (Like in Advance Filter)
          2. If you want to use the Equals operator - it has to be explicitly specified like '=String
          3. Cant handle row and column criteria
          4. Not as Flexible as SP
          5. Cant Replicate COUNTIF(s)(Rng,Rng)

  4. David K says:

    =IF(SUMPRODUCT(($A$2:$A2=A2)*($C$2:$C2=C2))>1,0,1)
    Drag down to expand the Range
    I use this formula for 2 variables uniqueness.
    Col A has is a Company and Col C is who is assigned to company.
    This allows me to quickly Sum up the number of Unique Reps assigned to a Company.
     
    *I forgot the source where I got this from but I'll attach it later if I find it.
     

  5. shrivallabha says:

    You can also use:
    =SUM((FREQUENCY(B2:B8,B2:B8)>0)+0)
    for numerical data
    and
    =SUM((FREQUENCY(MATCH(B2:B8,B2:B8,0),MATCH(B2:B8,B2:B8,0))>0)+0)
    for non-numerical data!
    MATCH needs to be added to non-numerical cases due to the fact that FREQUENCY works with numbers only.

  6. Noah Helenihi says:

    Awsesome, as usual! But, what if you want to then list the values? Say, you want to identify all of the unique values so you can display counts by unique values or percentage of transactions to unique values . . . Is there a simple formula you can use to pull these unique values out of the range and then list them on the side (likely an array formula)?

    • Dohsan says:

      Hi Noah,

      If you want a distinct list (i.e. show all the values in a list once)

      then you can use the following:

      =IFERROR(INDEX(List,MATCH(0,IF(ISBLANK(List),"",COUNTIF($B$1:B1,List)),0)),"")

      Assuming your data is in column A setup a named ranged called List and then enter this in B2, enter as an array and drag down. 

      If you want a unique list (Only list the values that appear once in the list) then you can use:

      =IFERROR(INDEX(List, MATCH(0, COUNTIF($C$1:C1, List)+(COUNTIF(List, List)<>1), 0)),"")

      Put that in C2 as an array and drag down 

  7. Noah Helenihi says:

    One more question . . . what are the pros/cons of using the "Countif" formulation versus a "Sum/Frequency" formulation to do the same thing? And, why don't they just create a "CountUnique" formula?

  8. Jeff Weir says:

    Great formula, great explanation. Well done, Hui.

  9. sam says:

    @Jeff.

    6. The D Functions handle Formulas in the Criteria (just like Adv Filter)
    The Rules for Formula criteria remain the same as Advanced Filters.

    1. The Headings of the Criteria should be different from the Data
    2. Some where in the formula you must refer to the first cell in the Column you want to SUM/COUNT etc. That reference should be relative. All other range references should be absolute.

     

  10. ahamed says:

    First I would like to take this opportunity to thank Chandoo.org for bringing up this post. 
    Another issue that I have always been facing for long time is to find a simple formula to extract those unique items (parties) into different cells. 
    Now we know according to the given formula we have 4 unique parties, how can we extract these unique parties into different cells from the list. (Ex: ABC,XYZ,HML,NKY)
    During the course of developing a dash boards I frequently face these kinds of issues where I have a huge set of data of which certain items repeat.
    Now I want to find and list out in separate cells these unique items and their totals, from which the "items and values" are again taken into the dash board on rank basis (ex- top 5 etc)
    Pls advice me

  11. David Hager says:

    For those who might be interested in what was written on this subject 13 years ago see:

      http://www.j-walk.com/ss/excel/eee/eee004.txt

  12. Jordan says:

    Hi, I have been able to make this formula work, but! i have an added twist that i'm having difficulty with. In my case, i would like to find the unique value whether a certain criteria is found in 2 different columns. I think of it instead of look at column 1 and 2 to see if there are unique volumes in column 3, i would like it two say column 1 or 2 meets a criteria make sure column 3 is unique.

    I hope this explains properly.

    One more try, Im looking up unique values for 2 whether found in column B or C. Therefore the count should come back only 1:

    Jan  2  3
    Jan 3   2
    Jan  2  3

    Thanks!
     

  13. Alfredo says:

    I have to process a list every day, it does not have blank values but instead its length changes every time; the staff wants to see that list on top of the report and then the summary so I have a fixed amount of rows at top and then just hide what any blank rows after pasting the data from an external DB.

    I am using another formula to count unique values (I will use range A1:A200) as an example:

    {=SUM(IF(FREQUENCY(IF(LEN(A2:A200)>0,MATCH(A2:A200,A2:A200,0),""),IF(LEN(A2:A200)>0,MATCH(A2:A200,A2:A200,0),""))>0,1))}

    I like the formula posted here as it's simpler but it will not take the blank rows at the bottom without throwing an error, so what I did is to use the dynamic range approach (that I also learned here at chandoo.org) and it looks like this:

    =SUMPRODUCT(1/COUNTIF(INDIRECT("A2:A"&COUNTA(A2:A200)),INDIRECT("A2:A"&COUNTA(A2:A200))))

    Next I will like filter the list to count (unique values that meet certain criteria in other columns) I will appreciate any ideas 🙂

Leave a Reply