• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Identifying duplicates using data from multiple columns

mwmo_user

New Member
I'm working with a small organization to create a central contact database that consolidates about 15 staff members' lists of contacts. I've got all the data in one excel file (much of it exported from Outlook, but a lot of it has also been maintained manually in excel over long periods of time). Ultimately, once the list is cleaned up, I will import it back into Outlook, but it's going to require a lot of editing and consolidating.


So, here's my question: Is there a way to IDENTIFY duplicates from multiple columns?


Here's an example:


Column A / Column B / Column C

1. Smith / Daniel / 123 Main St. /

2. Smith / Daniel / 123 Main St. /

3. Smith / Daniel / 456 South St. /

4. Smith / Pamela / 987 West St. /

5. Thomson / Daniel / 576 Pacific St.


So, is there a way for Excel to identify that Columns A and B match in Rows 1, 2 and 3? It does me no good to have duplicates identified only by column A (because Pamela Smith is not the same as Daniel Smith) or Column B (Daniel Smith and Daniel Thomson are not the same). Also, I don't want Excel automatically deleting duplicates because I will need to make individual judgments about whether Daniel Smith's address on Main Street or South street is the one to keep. The only TRUE duplicates in this list are rows 1 and 2, and it would be fine if those were eliminated, but perhaps that is too much to ask!


Any tips? I'm hoping there's some kind of conditional formatting formula I can use that would highlight duplicate names, but all that's a bit over my head and I'd love some help. I hope this hasn't been too confusing...I'm happy to clarify any questions.


Thanks!
 
Simplest trick is to concatenate the columns of interest, then compare.


E.g.

In col D:

=A2&B2&C2


In col E:

=COUNTIF(D$2:D2,D2)


Copy both formulas down as needed. Anywhere you have a value > 1 in col E, you have a duplicate and can probably delete that row. Adjust which columns you want to include to concatenate as desired (so, maybe all 3 to get the easy corrections, then only A & B to sort out the manual comparisons?)
 
Hi Luke M,


Having a very similar problem as above mentioned by mwmo_user. Only that I need the result to be customized as per the nos of criterion being duplicate.


A example of what I am doing is attached. Basically we are setting up a labour management system to make insurances for them. We dont want to make two insurances for the same person. So a centralized database is being created , but the file should warn when we are entering a duplicate.


The problem is people have similar names. SO we need a small warning when name matches, a bigger warning when both first name and last name matches, a alert when names and age also matches, and call it a duplicate when names, age and blood group matches.


Your solution works beautifully when all condition are matching, and customising your solution to my problem would be a lot of if then formulas, which I think would be unnecessary. Any help would be appreciated.


File: https://www.dropbox.com/s/he88rv5p0wpbhwn/labour%20data.xls


Thanks in advance


Prashanth
 
HI, pr4peace!


If you haven't done yet, I'd suggest to read the three green sticky posts at this forums main page, specially the 3rd one, in its 14th paragraph:

"Start a new post every time you ask a question, even if the theme is similar. The original author may continue asking questions whilst the post is in progress and of course if you are answering questions you may need to ask questions of the initial poster."


Regards!
 
Back
Top