1. Welcome to Chandoo.org Forums. Short message for you

    Hi Guest,

    Thanks for joining Chandoo.org forums. We are here to make you awesome in Excel. Before you post your first question, please read this short introduction guide. When posting or responding to questions please remember our values at Chandoo.org are: Humility, Passion, Fun, Awesomeness, Simplicity, Sharing Remember that we have people here for whom English is not there first language and we need to allow for this in our dealings.

    Yours,
    Chandoo
  2. Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  3. When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Can't shake off duplicates efficiently using macro

Discussion in 'VBA Macros' started by shahin, Aug 12, 2017.

  1. shahin

    shahin Active Member

    Messages:
    479
    I've written a macro to remove duplicate values by deleting the entire row. The macro is supposed to compare the value of Range A1 to any of the Range underneath within the same column to be sure that they are not the same. If it is then it will delete the entire row in which duplicate value is found. My macro can do that but I need to run the macro twice to get the whole thing done because after first run i can still see some duplicates. However, when i run it twice or thrice, it can shake off all duplicate values. One more thing: it takes almost 3 minutes to shake off 2500 duplicates out of 6000. That means it works slowly as well. What change should i make in my macro to do the whole stuff in a single run and within fewer times possible?

    Here is what I was trying with:

    Code (vb):

        Sub Dup_removal()
            Application.ScreenUpdating = False
            Dim i As Long
            Dim j As Long
            i = 2
            Do Until Cells(i, 1).Value = ""
                j = i + 1
                Do Until Cells(j, 1).Value = ""
                    If Cells(i, 1).Value = Cells(j, 1).Value Then
                        Cells(j, 1).Select
                        Selection.EntireRow.Delete
                    End If
                    j = j + 1
                Loop
                i = i + 1
            Loop
            Application.ScreenUpdating = True
        End Sub
     
  2. NARAYANK991

    NARAYANK991 Excel Ninja

    Messages:
    15,598
    Hi ,

    The fastest way to weed out duplicates would be to use a dictionary to store the keys , in this case the values in column A.

    For each value , use the Exists method to check whether the key already exists in the dictionary ; if it does , then delete that particular row and move on to the next.

    In deleting rows , start from the last row of data and work your way upwards towards the first row of data.

    Another option is to first sort the entire data on the key column ; this way all duplicates will be clustered together , and you can do a wholesale delete for each set of duplicates.

    Narayan
    Last edited: Aug 12, 2017
    shahin likes this.
  3. Hui

    Hui Excel Ninja Staff Member

    Messages:
    10,686
    It is slow because you are working directly off the Worksheet Cells

    Load the whole area into an array in VBA

    Then check for duplicates as Narayan mentioned using a dictionary

    Super fast
    shahin likes this.
  4. NARAYANK991

    NARAYANK991 Excel Ninja

    Messages:
    15,598
    Hi ,

    Try this :
    Code (vb):

    Public Sub RemoveDuplicates()
              Debug.Print Time
              Dim dict As Scripting.Dictionary
              Dim cell As Range
              Dim currval As Variant
           
              Set dict = New Scripting.Dictionary
              lastrow = Me.Range("A" & Rows.Count).End(xlUp).Row
              j = 1
           
              For i = lastrow To 2 Step -1
                  With Me.Cells(i, 1)
                        currval = .Value
               
                        If dict.Exists(currval) Then
                          .EntireRow.Delete
                        Else
                          dict.Add currval, j
                          j = j + 1
                        End If
                    End With
              Next
              Debug.Print Time
    End Sub
     
    Narayan
  5. shahin

    shahin Active Member

    Messages:
    479
    You are awesome, Narayan. It's a great way to accomplish this type of thing. Thanks a lot.
  6. shahin

    shahin Active Member

    Messages:
    479
    @Narayan, It took 3 seconds only to do the whole operation. Btw, why "Me" keyword was putting a barrier? I ran the macro after taking it out, though! Forgive my ignorance.
    Last edited: Aug 12, 2017
  7. NARAYANK991

    NARAYANK991 Excel Ninja

    Messages:
    15,598
    Hi ,

    You might have copied the code into a Module ; the Me keyword works in the Worksheet section , where it refers to the worksheet which has the code ; in a Module , it will generate an error.

    Narayan
  8. shahin

    shahin Active Member

    Messages:
    479
    What if there are several duplicates in several columns and I would like to delete them all. The other day, the below code Narayan provided me to remove duplicates which is indeed a damn efficient one. However, when i try to give it a little twitch to satisfy my need, it neither works nor throws any error. Definitely I'm doing something wrong here but unable to figure it out.
    Code (vb):

    Sub RemoveDuplicates()
          Dim dict As Scripting.Dictionary
          Dim cell As Range
          Dim currval As Variant
     
          Set dict = New Scripting.Dictionary
          lastrow = Range("A" & Rows.Count).End(xlToRight).End(xlUp).Row
          j = 1
          For i = lastrow To 2 Step -1
              With Cells(i, 1) Or Cells(i, 2) Or Cells(i, 3)
                    currval = .Value
         
                    If dict.Exists(currval) Then
                      .EntireRow.Delete
                    Else
                      dict.Add currval, j
                      j = j + 1
                    End If
                End With
          Next
    End Sub
     
    Last edited: Sep 8, 2017
  9. NARAYANK991

    NARAYANK991 Excel Ninja

    Messages:
    15,598
    Hi ,

    This is where the Debug mode and the Immediate window come in handy.

    In the Immediate window , type in the following and see what is displayed :

    ?Range("A" & Rows.Count).Address

    ?Range("A" & Rows.Count).End(xlToRight).Address

    ?Range("A" & Rows.Count).End(xlToRight).End(xlUp).Address

    You will be able to recognize the problem when you see the value of lastrow , and how it will impact the For ... Next loop.

    Narayan
    shahin likes this.
  10. NARAYANK991

    NARAYANK991 Excel Ninja

    Messages:
    15,598
    Hi ,

    However , this will also not work :

    With Cells(i, 1) Or Cells(i, 2) Or Cells(i, 3)

    You will have to rewrite the code as follows :
    Code (vb):

    Sub RemoveDuplicates()
          Dim dict As Scripting.Dictionary
          Dim cell As Range
          Dim currval As Variant
          Dim found As Long

          Application.ScreenUpdating = False
       
          Set dict = New Scripting.Dictionary
          lastrow = Range("A" & Rows.Count).End(xlUp).Row
          j = 1
          For i = lastrow To 1 Step -1
              found = 0
              currval1 = Cells(i, 1).Value

              If dict.Exists(currval1) Then
                found = 1
              End If
           
              currval2 = Cells(i, 2).Value
              If dict.Exists(currval2) Then
                found = found + 2
              End If
                 
              currval3 = Cells(i, 3).Value
              If dict.Exists(currval3) Then
                found = found + 4
              End If
             
              If found Mod 2 = 0 Then
                dict.Add currval1, j
                j = j + 1
              End If
             
              If (found mod 4) <= 1 Then
                dict.Add currval2, j
                j = j + 1
              End If
             
              If found < 4 Then
                dict.Add currval3, j
                j = j + 1
              End If
           
    '        The following check deletes the row if all three cells have duplicates
    '        If you wish to delete a row if even one of the cells has a duplicate then
    '        check for :
    '        If found <> 0 Then

              If found = 7 Then
                Cells(i, 1).EntireRow.Delete
              End If
          Next
       
          Application.ScreenUpdating = True
    End Sub
     
    Narayan
    Last edited: Sep 9, 2017
    Thomas Kuriakose and shahin like this.
  11. shahin

    shahin Active Member

    Messages:
    479
    Thanks a zillion, Narayan. You have always been a great help.
  12. NARAYANK991

    NARAYANK991 Excel Ninja

    Messages:
    15,598
    Hi ,

    I just noticed that the logic for adding items to the dictionary is wrong.

    The second check should be :

    If (found mod 4) <= 1 Then

    instead of :

    If (found mod 4) = 0 Then


    Narayan
    Last edited: Sep 9, 2017

Share This Page