• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Removing Partial Duplicates - One column

Shir

New Member
Hi There!

I have a list of vendor names in a column and I will like to delete duplicates. I need a way for Excel to identify partial duplicates for example:
ABC Company
ABC Company Inc
ABC Company.
ABC
ABC Company Corp.

I need Excel to tell me that these are potentially the same vendor.


Since I only have one column and we are talking about one data set I cant use Fuzzy Lookup. Please let me know if you have a solution for this!

THANK YOU :)
 

vletm

Excel Ninja
Shir
Are those which should leave or delete or mixed?
If there would also be Shop ABC or A B C or abc department or RABC INC - what to do?
 

Shir

New Member
Shir
Are those which should leave or delete or mixed?
If there would also be Shop ABC or A B C or abc department or RABC INC - what to do?
I was hoping Excel can just identify and I can decide what to delete? Like I see your point RABC INC might be a different vendor so I might not want to have Excel delete everything.
 

Shir

New Member
I agree with Vletm
But here is something that might help
=if(lower(cell-with-text)<>SUBSTITUTE(lower(cell-with-text),lower(word-to-check),"",2),"Yes","No")
https://chandoo.org/wp/excel-formula-help-finding-repeated-words-cell/
Hey, not sure this formula is what I am looking for. The issue is I don't know what words are repeating. So I am looking for a way Excel can take a look at a whole column of data and tell me ...for example.... 5 line items could potentially be duplicates and I can then go and delete the ones I don't want.
 

vletm

Excel Ninja
Shir
Your: I was hoping Excel can just identify and I can decide what to delete?
Hmm? hmm?
Excel can do those, if someone would 'teach' what to do! ... do You know term 'rule'?
 

Harry0

Member
A formula can help to a degree but you need to double check it. You can break it down and take it form their which the code above might help but other examples are these.

If your list is on A2 and after then put this on the the next column
B This will just put the first word in =LEFT(A2,FIND(" ",A2)) note it wont find single worlds since excel kind of sucks in doing things simple which needs a more technical formula as usual. Or use =iferror(=LEFT(A2,FIND(" ",A2)),A2) and copy past the value since it can not read single wold value which would give an error.
C will count how many are duplicates of B of 100 names =COUNTIF(B$2:B2,B2)
and maybe copy paste value the results and erase with caution

excel is great but the people that created it in my opinion are no so great which is why so many companies have their own bad variations and updates all the time. one has to be creative to make it work the way you want it to.
 

GraH - Guido

Well-Known Member
You might want to look for "Damerau-Levenshtein distance algorithm". Some free macros are available on the web.
Basically those functions can give you:
  1. the number of step-mutations required to convert a string A to a string B
  2. the match % between both strings
This would require you make a cartesian product like table to enable matching all values of column A with all other values. Then fill the grid with the functions.

Some Python code also seems available. I believe there are add-ons available to extend Excel with Python power. I think it was Chihiro who mentioned it in the vault a while ago.
 

navic

Active Member
Hi @Shir
Perhaps
If we consider that the data in the range A1:A100.
The condition is part of the company's name.
Using the Conditional Formatting and the formula below, we will highlight each cell containing the condition. The formula is case sensitive.
Code:
=ISNUMBER(FIND("ABC",A1))=TRUE
or, this formula below, is not case sensitive.
Code:
=ISNUMBER(SEARCH("ABC",A1))=TRUE
Of course, the condition can be set up in a cell and quickly change it.
Once there it occurred that the cells are highlighted, you can decide what you will remove.
Also, if you do not want to use Conditional Formatting, then in the cell 'B1' set formula and copy down to the last row.
Later, filter by "TRUE" and remove what you want.
 
Top