Using Array Formulas in Excel – Find if a list has duplicate items

Posted on March 25th, 2009 in Excel Howtos , Learn Excel - 39 comments

Find a list is unique or not using array formulas in excelToday I have learned this very cool way to find if a list has duplicate items or not.

This technique uses array formulas (do not shudder, believe me they are not as difficult as you may think)

First the formula

Assuming your list is in the range, C3: C9, the array formula to find if a list has duplicate items or not is,

=IF(MAX(COUNTIF(C$3:C$9;C3:C9))>1,"list is not unique","list is unique")

Now the explanation

How do you know if a list has no duplicates? Simple, we find the number of times each item has appeared in the list and see if any of those counts are more than 1.

Now, take a look at the formula. It says find the maximum of individual item counts using countif (learn excel countif function) and if the maximum is more than 1, then the list has duplicates, otherwise it is unique.

But…

Yes, entering the formula will not work by itself. You have to make it array formula.

How do you do that?

Oh, that is simple, you just take the excel spreadsheet and whack it until it turns blue.

well, not really. all you need to do is enter the formula and press CTRL + SHIFT ENTER instead of just pressing enter.

that way excel converts your formula to array formula and the COUNTIF(C$3:C$9,C3:C9) will return an array of counts instead of one value. Now you can also guess why we have absolute reference for one parameter of countif () and relative reference for another. Learn more about Absolute and Relative References in excel formulas.

More on Finding and Removing Duplicate Items

> Using pivot tables to get unique items in excel
> Getting unique items using data filter and formulas
> Use advanced data filters to find unique items
> Eliminate Duplicate Entries in a List using Formulas
> Get Unique items using Excel 2007 built in features

This post is part of our Spreadcheats series, a 30 day online excel training program for office goers and spreadsheet users. Join today.

Your email address is safe with us. Our policies

Written by Chandoo
Tags: , , , , , , , , , , ,
Home: Chandoo.org Main Page
? Doubt: Ask an Excel Question

39 Responses to “Using Array Formulas in Excel – Find if a list has duplicate items”

  1. Denise says:

    Hello Mr Chandoo!
    Could you double check the formula you have provided. I am getting an error – Excel will not accept the formula =IF(MAX(COUNTIF(C$3:C$9;C3:C9))>1;”list is not unique”;”list is unique”) – It seems to be having a problem with C$9 part of the formula.
    Many thanks PHD. Appreciate the time you give to this website.
    denise

  2. Robert says:

    Denise,

    I guess Chandoo is using an European installation of Excel at the moment. If you are using an English version, you have to replace the “;” by “,”.

  3. Denise says:

    Ah – Thank you Robert. That’s very kind of you to reply so promptly. And you are right – it now works with “,”
    Cheers Robert. All the best to you.
    Denise

    • Chandoo says:

      @Denise, as Robert pointed, I have been using European version of excel which uses ; to seperate formula arguments instead of comma. So when I copy pasted the formula from my test workbook, I forgot to change the semicolon back to comma. I have corrected the post now.

      @Robert: Wonderful guess and thanks :)

      • Cindy says:

        Chandoo,

        Thanks for a great blog. I have gone through all the posts on duplicate but the issue I have is that the duplicates are not named in the same way. For example:

        ADT Security 
        A D T Security

        are both the same but was entered differently (one with space and one without space). How do I go about this? I have 8000 rows of data I need to identify duplicates that are mostly nonidentical.
        Thanks for your help!
         

  4. JP says:

    Here’s how I do it:

    =IF(COUNTA(A1:C100)=SUM(1/COUNTIF(A1:C100,A1:C100)),”No duplicates”,”Some duplicates”)

    Also, you can change your formula separator (in WinXP) by going to Control Panel, Regional and Language Optionsand click “Customize” near the top of the dialog box. Then you could change your list separator to ; and use Chandoo’s formula without alteration. But I wouldn’t recommend it ;)

  5. AndyH says:

    @ JP, true, there is often more than one way to get at a solution.

    Arrays can be very a powerful though, and can often overcome shortcomings in other methods. Array formulae can do some truly amazing things and deserve much more press.

    For example, suppose you have the following and you want the median Num of “A” Codes:

    Code Num
    A 6
    B 10
    C 12
    A 16
    B 32
    C 15

    An array takes care of this easily, and provides a solution where other methods do not exist:

    =MEDIAN(IF(A2:A7=”A”,B2:B7))

    The formula must be array-entered (Ctrl+Shift+Enter) to work properly, of course.

    If we wanted Average we could use pivot tables. If we wanted Sum we could use a pivot or SUMIF. But for Median, there is no other convenient way. Moreover, to get Average or Sum, simply substitute those functions for MEDIAN and it is done.

    • AshP says:

      HI Chandoo,
      I have seen most of your blogs and has helped me in resolving most of excel formulas issues.
      However, I have one issue which is bothering me from quiet a long time and I’m not getting a solution.
      I’m counting on you and your blog members to help me resolving this.

      What I Have : I have 2 columns. First with unique names and second column have numbers associated with the names (I’m taking example as above).
      What I want : I need to highlight the Max Value (Second Column) against each Individual names.

      Code Num
      A 6
      B 10
      C 12
      A 16
      B 32
      C 15

      Output Should be :
      A 16
      B 32
      C 15

  6. azmat says:

    well being an infant in excel knowledge i like to keep things simple so here is my simple formula

    =IF(COUNTIF($A$1:$A$10,A1)>1,”duplicate”,”no duplicate”)

  7. Ash says:

    Can also Copy the column onto a new sheet, Pivot Table the column with the column on left side (row) and the “Count” of the column in middle (Data). Any duplicate row will be over 1 (can sort decreasing to see them in 1st)

  8. Rob says:

    I am trying to apply the principles of arrays included in this post to analyze the following real-world problem, but I have had no success. I’m wondering if someone can provide some guidance:

    The goal is to count the number of unique (non duplicate) customers in any one month. The data looks something like this:

    Customer Month Sale ($)
    John apr 20
    Brad apr 25
    Ellen may 15
    Ron jun 20
    Toby jun 10
    Toby jun 10
    Juan jun 25
    Brad jul 30

    So for Apr the count should be 2 (Jon & Brad), for Jun the count should be 3 (Ron, Toby & Juan), and for Jul the count should be 1 (Brad).

    I’ve been able to count unique customers (non-duplicates) in the entire list, but not unique customers by month.

    Any help would be greatly appreciated.

    -Rob

  9. Chandoo says:

    @Rob: Welcome to PHD and thanks for asking a question.

    I like this problem very much, so when I got home from work I have spent sometime researching array formulas (and ended up learning a ton of cool stuff).

    To count the month-wise uniques, you can use offset() formula. Ofcourse, this only works if the list is arranged month-wise. But probably that is not such a difficult task.

    I have made one solution that works with offset and array formulas and uploaded it here:

    http://chandoo.org/img/n/array-formulas-query.xls

    Let me know if this doesnt work for you…

  10. Andy Holaday says:

    This is a good problem and I happen to have a solution in my bag-o-tricks called “count unique items with constraint”.

    Assume your data in A2:C9, and A12=”apr”, A13=”may”, etc.

    This formula in B12 will return result for “apr”. It is an array formula that can be filled down:
    =SUM(N(FREQUENCY(IF($B$2:$B$9=
    A12,MATCH($A$2:$A$9,$A$2:$A$9,0)),MATCH($A$2:$A$9,$A$2:$A$9,0))>0))

    The IF specifies the constraint. If you remove it you will get unique customers without regard to date.
    =SUM(N(FREQUENCY(MATCH($A$2:$A$9,$A$2:$A$9,0),
    MATCH($A$2:$A$9,$A$2:$A$9,0))>0))

    These were highly instructional to me. If you use the “Evaluate Formula” tool and follow the evaluation steps very carefully you can learn lots about how array formulas work.

  11. Rob says:

    Hi Andy –

    Thanks for the reply. I’d like to explore your solution further but I believe some of the formula was cut-off when your reply was posted (possibly a text wrap issue). is there any chance you can re-post the formula?

    -Rob

  12. Andy Holaday says:

    Hi Rob, the rendering by the forum is unfortunate, but if you select my entire reply using the mouse and copy / paste to another viewer (e.g., notepad) the whole answer should be there.

    Aside question to PHD… is there a better way to post long lines of code?

    • Chandoo says:

      @Andy.. good formula.. I havent tested it yet…
      my formula looked something like this: =SUM(1/COUNTIF(OFFSET($B$3,MATCH(F3,$C$3:$C$10,0)-1,0,COUNTIF($C$3:$C$10,F3)),
      OFFSET($B$3,MATCH(F3,$C$3:$C$10,0)-1,0,COUNTIF($C$3:$C$10,F3))))

      Also, to get unique count without any other conditions you can try the sum(1/countif(range, range))

      here F3 has the month APR, C3:C10 has the months and B3 onwards the people list

      Btw, you can enter long formulas by manually inserting breaks. This theme doesnt support super long text nor it does insert line breaks automatically. If it doesnt chop then the layout is screwed.

  13. Kelly says:

    @Chandoo, thank you very much for your explanation about array formulas.

    I downloaded the array-formulas-query.xls spreadsheet and I’m trying to understand how you got the single cell reference to increment in the array formula. I must be missing something simple.

    Whenever I try to type in an array formula and use the Ctrl+Shift+Enter to enter it, it puts the formula in as an array formula, but it repeats the same value for the cell reference in every position of the array.

    In other words if I try to enter just a portion of your formula above as =COUNTIF($C$3:$C$10,F3), it puts the exact same formula in every position of the array. Thus in your sample spreadsheet I get the count for “apr” in every row of the array.

  14. Chandoo says:

    @Kelly.. array formulas expect that you use a regular formula but with arrays instead of one value. the countif syntax is countif(range, filter-criteria). So when you enter something like this: =COUNTIF($C$3:$C$10,F3), it still works like a normal formula even though you enter it as an array formula.

    where as this formula: =COUNTIF($C$3:$C$10,F3:F10) works like an array formula as excel calculates 8 different values (=COUNTIF($C$3:$C$10,F3),=COUNTIF($C$3:$C$10,F4), …, =COUNTIF($C$3:$C$10,F10)) when you press ctrl+shift+enter

    I hope this is a bit clear now.. I suggest you to see more array formula examples to understand how they work.
    http://chandoo.org/wp/tag/array-formulas/

  15. Kelly says:

    Thank you for the clarification. Your explanations made sense so I took another look at your array-formulas-query.xls spreadsheet. I then realized that you put in four individual array formulas under the Unique customers column. I initially thought that they were a single array.

    Thanks again for your excellent examples an explanations in the whole PHD site. I have learned quite a few things from your articles.

  16. cybpsych says:

    hi. i got to resurrect this cuz it’s driving me crazy :D

    assuming data is in A2:A10. target unique list is on B2.

    i manage to get the unique list using this array formula: {=IFERROR(INDEX(A2:A10,MATCH(0,COUNTIF($B$2:B2,A2:A10),0)),””)} … but the list isn’t sorted.

    how can i get the unique list from A2:A10 -> B2, and sorted (ascending)?

    thanks!

  17. Nigel Lovell says:

    Chandoo
    Think I found a minor error on the page:-
    http://chandoo.org/wp/2009/03/25/using-array-formulas-example1/

    you used ; in the middle of
    =IF(MAX(COUNTIF(C$3:C$9;C3:C9))>1,”list is not unique”,”list is unique”)
    where my Excel was only happy with ,
    Then it worked.
    Keep up the good work. GREAT Site.
    Nigel

  18. Chandoo says:

    @Nigel: I have a european excel version installed on my comp. It uses ; as the separator in formulas. I often forget to replace the ;s in the formulas with ,s when pasting them here. :D

    Thanks for your compliments.

    @Cybpsych: Are you able to solve the problem. Let me know.

  19. [...] It is no wonder that I have written extensively about it (here: 1, 2, 3, 4, 5, 6, 7, 8). [...]

  20. yashwant says:

    i needs formula for duplicate in excel.

  21. Girish says:

    Let’s assume that we have data on two columns ie. B and C. The data may repeat on the same column but should not on both columns. Our data range for this example is “B7:C19″. The formula to use in excel 2007 is:
    =IF(COUNTIFS($B$7:$B$19,B7,$C$7:$C$19,C7)>1,”Duplicate”,””). Use this formula in D7 and copy to the row range in column D.

    The formula to use in versions before 2007 is:
    =IF(MIN(COUNTIF($B$7:$B$19,B7),COUNTIF($C$7:$C$19,C7))>1,”Duplicate”,””). Use this formula in D7 and copy to the row range in column D.

    This is not an array formula and CSE is not required.

  22. Girish says:

    Duplicates with multiple criteria

    The formula to use in versions before 2007 is:
    =IF(MIN(COUNTIF($B$7:$B$19,B7),COUNTIF($C$7:$C$19,C7))>1,”Duplicate”,””) has some limitations. It does not work under some conditions.

    We can instead use the following array formula:

    Let’s assume that we have data on two columns ie. B and C. The data may repeat on the same column but should not on both columns. Our data range for this example is “B7:C19?

    =IF(COUNTA($B$7:$B$19)-COUNT(IF($B$7:$B$19&$C$7:$C$19=B7&C7,B7&C7,0))>1,”Duplicate”,””)

  23. Suheb says:

    Need Help with excel formula:

    i have 2 columns, one column states the job status(such as planned, Unplanned, EMERGENCY etc) and other column states the date.

    My question is, what would be the formal if i need to count Individual status for a particular months, so that i can tabulate it in below format..
    PLANNED UNPLANNED EXTRA HIRE CALL OUT EMERGENCY SHUTDOWN EMERGENCY
    January
    February
    March
    April
    May
    June
    July
    August
    September
    October
    November
    December

  24. Hui... says:

    @Suheb
    I assume your data has two named ranges: Date and Status
    I have a column January-December in Column: D3:D14
    I have the status’s listed in Row: E2:K2
    .
    If your dates are in a single year then you can use something like:
    =SUMPRODUCT((TEXT(Date,”mmmm”)=$D3)*(Status=E$2))
    .
    If you have multiple years of data, try the following
    =SUMPRODUCT((YEAR(Date)=Year)*(TEXT(Date,"mmmm")=$D3)*(Status=E$2))
    .
    Where I have a Named Formula or cell called “Year” with 2011 in it
    .
    You will have to retype all the ” characters if you copy/paste these formula

  25. TANVI says:

    I have a non-unique list of book names. Can I get the count of the uniqie book names with an ecel formulae?

  26. andy holaday says:

    @TANVI

    Try this. If your list is a range called “Myrange”,
    [code]
    =SUM(N(FREQUENCY(MATCH(Myrange,Myrange,0),MATCH(Myrange,Myrange,0))>0))
    [/code]
    Note this is not an array formula.

  27. dennis says:

    I have a problem with vlookup and duplicates. if column A is product code and column B is quantity…column C to J is etc…. where product code can appear many times in the column with different quantities and etc…, how can I create a report on a separate sheet using vlookup or any other formula and get all the different rows with the same product code?

  28. [...] while ago (well more than 3 years ago), I wrote about an array formula based technique to check if a list of values have any duplicates in [...]

  29. Pam says:

    I have an excel spreadsheet of information where there are multiple columns.  There are instances of duplicate rows based on the First and Last name columns, the information in the rest of the columns may be different; there is a unique iD column, the highest ID= the newest record for that name.  I need to create a list of unduplicated records showing all the information from the list where the record pulled is the maximum Unique ID for the name.  Can anyone help me with the formula? 

  30. RON MALTASE says:

    I’m always looking for a formula that will find the duplicates in one column (like a customer number) and then look at a second column to find a unique value (like a specific part number), and then mark all lines of data for those customers, even the lines that don’t have the specific part number. That way I can see each line for that customer who bought the specific part, and still see the other parts they bought too.

Leave a Reply