• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Sumproduct to find duplicates across all columns?

Raesu

Member
Hello


I have been trying to make a SUMPRODUCT formula to check if one line of data is exactly the same as another in my list. I have tried many variations, read thru the forensics post, but this formula is still giving me some difficulty to fully wrap my head around :(


Here is one of my attempts where I think i'm getting close (data is in rows 7,8,9 for this example):


=SUMPRODUCT(--(C7:C9=C7)*--(D7:D9=D7)*--(E7:E9=E7)*--(F7:F9=F7)*--(G7:G9=G7)*--(H7:H9=H7))


I expect this to return a 0 if any of the columns don't match, and a 1 if the line is essentially duplicated.


Any help appreciated, thanks.
 
You need to use either the * symbol, or double minus, not both.

=SUMPRODUCT(--(C7:C9=C7),--(D7:D9=D7),--(E7:E9=E7),--(F7:F9=F7),--(G7:G9=G7),--(H7:H9=H7))


If you're dealing with a large amount of data, another approach that's not quite so calculation intensive is to use a helper column (I'll assume AA) with this formula:

=C7&D7&E7&F7&G7&H7


and then another helper column to check for duplicated with this simple formula:

=COUNTIF(AA:AA,AA7)
 
Good suggestion.


Some of the countif formulas evaluate to "#VALUE" citing a value used is the wrong data type. Any ideas? Most of the formulas have 1, or 2 if the line is duplicated. There are a couple date columns but I don't know why sometimes the data type is off...


I selected the whole table and made it all "General" format, no change.
 
Not sure what's causing the #VALUE error. Is there somethingi different/odd about the values being concatenated on those lines?

To fixe the formula so that the format error doesn't happen, and to limit the results to just 2 values, we could do:

=COUNTIF(AA:AA,AA7)>1

Now, everything that says true has a duplicate value. If you only want to find the duplicate, and ignore the first instance, formula becomes:

=COUNTIF(AA$7:AA7,AA7)>1
 
I figured out that the formula does not like semi-colons (;) in any of the fields.


I like the 2nd formula, thanks. Now to figure out what to do with all these records with semicolons.


If I remove duplicates using excel functionality, and select all columns as possibly containing dupes, will that do the same process I am doing now? I wanted to investigate a bit at which ones are duped, but using excel functionality would get around this #VALUE error.
 
I've not used that feature of XL myself, so I'm afraid I don't know the answer. If it's just the semicolons causing problems, could change the formula in AA to be:

=SUBSTITUTE(C7&D7&E7&F7&G7&H7,";","")


which remove all the semicolons.
 
It doesn't seem to work, still give me a value error. I only get a true/false if I delete the cell containing X;Y;Z etc...
 
Hmm, I'm afriad I'm not able to duplicate your error. I was able to use several cells with semicolons in them

[pre]
Code:
x;y;z	a	x;y;za	10  <---No error
[/pre]
 
Hi, Raesu!

After reading the problem with the semicolons I thought it might be a problem with a field separator used in the regional configuration. As I'm running an Excel 2010 spanish version where the semicolon is the default list separator, I tried Luke M's formula and worked OK. Then I replaced the semicolon by comma (english list separator) and OK too. Finally I changed commas by dots, and it worked fine too.

The cell values from C to H are this (without the minus signs):

a;b --- a --- ;b --- b; --- ; --- ;;

Regards!
 
It looks like my issue was that there was too much text in those cells...after deleting portions of the cells I found where 1 additional character, in any cell of that row, will cause the formula to return #VALUE instead of true/false. Thanks for the help!
 
Hi, Raesu!

Can you upload a sample file with the original cells with its full extension? Thanks. And please tell us what Excel version are you using.

Regards!
 
Back
Top