• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

How to remove duplicate from a integer,integer+txt values

mrzoogle

Member
Hi Excel Guru,


Today I was trying to remove duplicates for analysis purpose and I came across a situation where I couldn't remove any duplicate using built-in duplicate remove.

I believe this issue is caused by inclusion of integer and integer+txt data in the list, excel is getting confused ?

I have attached the list for your reference. Please have a look at it and if you could point me how to overcome this issue that would be great.

http://www.mediafire.com/view/?cc36ekd8y2l7e0g


Regards,

Z.
 
Hi ,


Strange. I faced the same problem.


But when I sorted the list , and then removed duplicates , all the duplicates were removed !


Narayan
 
Z


Interesting, weird


If you use Excel 2010, Data, Remove Duplicates it says it removes 12 records and leaves 483 remaining


If I use Advanced Filter it leaves 421 Unique records


Sorry I don't have time to look further tonight
 
@Narayank991 how strange it that. I tried sorting and then removing it and thus it removed all the duplicates.

@Hui Advanced Filter also leaves it with Unique records. Not a problem Hui if you don't have time today. Any time will be fine.


Thank you @Narayank991 & Hui for looking into this. If you could find a solution on why it won't remove duplicates without modifying its original structure please let me know.


Regards,

Z.
 
This is very interesting, probably you have discovered a bug in Excel. While it will be very difficult to assess the way Excel 'internally' handles remove duplicates, it can be guessed [so accuracy of guess will depend on the solidity of observations]. I should confess that I have not checked a lot.


Following is what seems to happen.


1. Excel loads up data into some collection object [to get rid of duplicates].


2. As soon as it comes across change in data type it creates a collection specifically for that data type. In your case, data types could be number [integer or long] and string.


3. If it just went by those two rules then it would still be able to remove all duplicates as it will have as many collections as the data types.


4. The glitch is that Excel "seems" to close a collection if does not encounter the data type for some "arbitrary" amount of data and probably, creates new collection when it again encounters the data.


5. Finally, collections are joined together. Previous data erased and final data is posted.


I guess, this explains why NarayananK's sorting method works as it brings data of similar type together.


point 1 to 5 are complete work of [fiction which I am good at and] observations[well, do I need to say in that regard]!!!


Kinda ready for brickbats:)


Edit: I have reference-posted this topic on MrExcel as well as I am not sure all guys from there visit Chandoo.

http://www.mrexcel.com/forum/showthread.php?644732-lt-Excel-2007-gt-Remove-Duplicates-functionality-seems-to-have-a-bug!
 
strange...

after remove duplicate again tried to remove again, it again says 11 duplicate found 484 unique value.. and again.. and again.. every time 11 duplicate found.. 484 unique value.. HOW? .. infact, if you uncheck 'My data has header' it says 19 duplicate found, 477 unique records..


after lots of R&D Final Conclusion..

Advance Filter or Pivot Table working on LinkedList but RemoveDuplicate is working in BSearch method. If we first sort the list and then remove duplicate it works perfect. (421 unique record - same as Advance Filter or Pivot Table) that means, it first pick the 1st array then search it with the remaing bottom arrays, then pick 2nd array with a assumption that top ones are already checked. Just like Binary search..


dont hesitate to shout on me if I am wrong.. :)


If there was any possibility to set 'Option Compare Binary' as default then it may works..
 
Hi, mrzoogle!


Experienced same issue. Actually strange. Googled and nothing found, so searched in MS issues by application and only found this:

http://office.microsoft.com/en-us/excel-help/filter-for-unique-values-or-remove-duplicate-values-HP010342518.aspx?CTT=1


It refers to removing duplicates when outlined or subtotals, for Redmond guys it seems to don't exist another problem. They surely are in the cloud(s) :)


Regards!
 
Thank you everyone for helping me understand this issue and offering solutions.

Definitely looks like an bug/glitch in excel.

Regards,

Z.
 
Back
Top