• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

How to remove duplicates, but not all?

milansk

New Member
Hi, is it possible in Excel to remove the duplicated, but not all?

For example, I have the following companies in column A:

Nike
Apple
Coca Cola
Nike
Nike
Nike
Nike
Apple

So, I want the duplicates to be removed, but not all. I want 3 duplicates to stay per company.

So, there are 5 "Nike" companies and I want 2 of them randomly to be removed and 3 to stay.
Also, there are 2 "Apple" companies. In this case, I want both "Apple" companies to stay because I am removing records with above 3 duplicates.
 
Hi, just use a COUNTIF formula in a helper column then filter this column for >3 then delete the filtered rows …​
 
Sorry, I tried to implement this solution but I couldn't because I am a beginner in Excel.
Is it possible to implement this in the attached list?

Please note that I would like to do this with even larger lists of hundreds of contacts.
 

Attachments

  • list with duplicates.csv
    13.9 KB · Views: 9
Power Query can do this for you. In the attached, there's your table with max 3 duplicates in column C. Note that the kept rows are the higher ones in the list. It grabs the data directly from your csv file, but it needs to know where that file is on your system and that information is in cell A1 of the sheet, which is highlighted in yellow and is a named range called FilePath. All you need to ensure is that that filepath and filenmame is correct by editing that cell directly. Then you can refresh the table by right-clicking somewhere in it and choosing Refresh. It will get the data from the file the remove the duplicates.
 

Attachments

  • Chandoo52819.xlsx
    24.3 KB · Views: 8
Sorry, I tried to implement this solution but I couldn't because I am a beginner in Excel.
Is it possible to implement this in the attached list?

Please note that I would like to do this with even larger lists of hundreds of contacts.
Why didn't you say so from the beginning? This might be a free forum, but does not mean I have to waste my time unnecessarily... :mad:
 
Power Query can do this for you. In the attached, there's your table with max 3 duplicates in column C. Note that the kept rows are the higher ones in the list. It grabs the data directly from your csv file, but it needs to know where that file is on your system and that information is in cell A1 of the sheet, which is highlighted in yellow and is a named range called FilePath. All you need to ensure is that that filepath and filenmame is correct by editing that cell directly. Then you can refresh the table by right-clicking somewhere in it and choosing Refresh. It will get the data from the file the remove the duplicates.

I followed your instructions, but it is NOT working for me.

I recorded what I did. You can download the video from HERE

I am getting an error when I am opening your file. It says:
Removed Part: /xl/queryTables/queryTable1.xml part. (External data range)
 
I am getting an error when I am opening your file. It says:
Removed Part: /xl/queryTables/queryTable1.xml part. (External data range)
Me too with that file; it must have got corrupted. Try again with the attached.
 

Attachments

  • Chandoo52819.xlsx
    24 KB · Views: 8
Could you let me know how to increase to 4 duplicates or decrease to 2 duplicates instead of the current setting of 3 duplicates?
In the 5th step of the query (Grouped Rows) change the 3 to a 4:
= Table.Group(#"Added Index", {"Owner"}, {{"grp", each Table.FirstN(_,3), type table}})
becomes:
= Table.Group(#"Added Index", {"Owner"}, {{"grp", each Table.FirstN(_,4), type table}})
 
…or with a tweak to the query, in the attached, put the number of duplicates in cell B1 and refresh the query.
 

Attachments

  • Chandoo52819b.xlsx
    23.8 KB · Views: 2
May I know how we can link the data source to the path in one of the cell in current workbook as shown in the first step "FilePath" below?
As I said:
and that information is in cell A1 of the sheet, which is highlighted in yellow and is a named range called FilePath.
You can create a single-celled named range by selecting that cell, then writing the name in the name box (where you normally see the address of the cell you're at) It will only do that if the name doesn't already exist:

84034

result in Name Manager (where you can delete a name if you want to move it):

84035
 
Hi @p45cal

I know how to create name under the name manager. What I wish to learn is how to add in the step of 'FilePath' in the power query?

I try to explore it from Get Data but I cannot find any option which allow me to link the data source the the FilePath created under the Name Manger.
 
how to add in the step
After clicking the fx symbol in the formula bar, you type it in. There is no user-interface icon/button/dropdown selection to do this.
You can also add the whole line in Advanced editor.
 
Last edited:
Back
Top