• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Duplicate in Cells

Adnan Halai

New Member
Hello Experts,

I am a beginner in Excel, working on various projects. Recently I am working on Standardization of material data we have.

Is there any way we can locate duplicate shown in Attached File.

Regards
A
 

Attachments

  • 4Expert.xlsx
    8.8 KB · Views: 4
... and do You always compare just two pair of cells as in file?
... or ...
You could/would create new file
with more samples
which would have more realistic situation.
 
I have around thousands of cells where I need to locate two code which are identical..however there are many material which has minor changes as in "s" in description, Lb and Lbs, 25 kg and 25 X 1 Kg etc. Everything is same except such characters...

With vlookup, two cell value has to be same but with extra S or 1, I am not able to find duplicate.
 
As @vletm suggested more examples with details would help.

If you're trying to identify descriptors that are *approximately* alike, for example to catch badly input data, try sorting them, but it's not a quick fix; I'm not sure there is one.

How many total records do you have? What percentage do you thing is bad? That will give you an idea of the challenge.
 
Attached is the Sorted Data..I have 7899 Such Row. I have added few and not sure how many such cases are there...thats why I am doing exercise.

I have remove Space, Hifen, Trim my data so that I can able to compare. Attached is one more example file.
 

Attachments

  • C.xlsx
    16.2 KB · Views: 3
This had me scratching my head and searching for a solution - nothing too glamorous.
I created a Table of your data, did an =LEN(of each cell in material description) and then compared the adjacent cells - anything within =+/- 2 I flagged with conditional formatting. My second shot was to use =SUBSTITUTE(Contents of Cell,Look in next row of that column for a match, and replace it with "" nothing)
Again low counts of the answer are a clue ....

The former method catches those cells that may have slight differences anywhere within the text, although it has lots of false positives. The latter finds those adjacent cells with extra characters on the end of strings.

Filtering the table columns by color will help you narrow down your candidates for inspection.

There are examples of possible VBA solutions out there in Google land, but they tend to be along the lines of step through the values in Col A looking for something similar in Col B, which you could certainly make work - howeveer they rely on some pretty arcane concepts (well they are for me) in the area of string diffing. Depends how automated you need to make this process - if its a one time deal, I don't think it would be worth the effort.

If you want to satisfy yourself of the complexity of a VBA approach you can take a look http://dailydoseofexcel.com/archives/2012/09/13/string-diffing/
 

Attachments

  • C - DME.xlsx
    24 KB · Views: 1
Hi ,

I would advise against doing anything with this data other than examine it painstakingly.

Which of us can say whether the following two items of data are the same or different ?

30%DEZinTolueneinB2

30%DEZinTolueneinB5

The difference is just one character , but for all we know , there can be a significant difference between them.

If the data had been normal English text , it would have been another story , but with technical text , especially names of chemicals / medicines , it would be difficult to apply an automated solution which eliminates the need for human scrutiny.

Narayan
 
Back
Top