• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Duplication Check For Specific Unique Data Types

Hi
I am totally stuck.
I have a file with 10000s of rows and I need to check for duplicates between employee and vendors. Please note I need to exclude employee to employee duplicates.
I only need to check employee to the supplier duplicates , and I do not know how to do exclude employee to employee duplicates from the analysis - so getting 1000s of false positives.

Let me give you a sample file and show the outcome required.

It is really key to be able to know which duplicates go together - so we need to be able when we filter to distinguish each group of duplicates.

Thank you

We need to run on the House field

Type Ref Number House Expected Result
Employee 1A 23 Positive -A
Employee 1B 23 Positive -A
Vendor 1C 23 Positive -A
Vendor 1C 23 Positive -A
Employee 1D 24
Employee 1E 23
Employee 1D 24
Employee 1D 24
Employee 1D 30
Employee 1D 54 Positive -B
Vendor 1D 40 Positive -C
Employee 1D 40 Positive -C
Employee 1D 30
Employee 1D 100
Employee 1D 656
Employee 1D 65
Employee 1D 656
Employee 1D 65
Supplier 1D 54 Positive -B
 

Attachments

  • Duplicates by Type.xlsx
    9.6 KB · Views: 11
Looking at your sample. It isn't clear why Row 2 to 5 are all identified as Positive-A, but not Row 7.

Can you clearly demonstrate/explain logic used?
 
Hi

My mistake - row 7 should be - good catch :). This means that we are on the same page i hope.

The logic

1)
if we have 'Type' from column A which is employee only and we have duplicates in column 'C' - this should be ignored.

rows 8 and 9 in excel are duplicates but only employees so this should be ignored.


2) if we have 'Type' from column A which is employee and Vendor and we have duplicates in column 'C' - this should be highlighted as an a potential issue.

Therefore in excel:
rows 2 to 5 - have duplicates in column for employee and vendor - these should be highlighted

row 11 and row 20 are duplicates and these should be shown.

row 12 and 13 are duplicates and this should be shown.

upload_2018-5-15_20-9-23.png
 

Attachments

  • upload_2018-5-15_20-8-7.png
    upload_2018-5-15_20-8-7.png
    34.4 KB · Views: 4
Hmm, not sure about marking A, B, C etc (without helper column).

But can you use this?
=IF(COUNTIFS($C$2:$C$20,C2,$A$2:$A$20,"<>"&A2)>0,"Positive - "&C2,"")

upload_2018-5-15_13-24-30.png
 
I have no clue how it works and how it excludes same duplicate in employees - but it looks like it work - amazing
if you care to explain that will be great
 
COUNTIF is used to count any item in list that has same house but different type ("<>" & value). If count is greater than 0, then it means there is duplicate with different type. Then IF is used to return concatenated text "Positive -" & house value.
 
Hi
If i wanted to reduce false positives more
can we add another column to that will help me.
I should have asked in the beginning - did not want to overwhelm


upload_2018-5-15_21-31-38.png
 

Attachments

  • Duplicates by Type - 2 columns.xlsx
    9.8 KB · Views: 11
Back
Top