• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Remove like data

Ww01

New Member
Hi all

I have a dataset where the first 8 columns are descriptive and the following 10 Columns are figures. What I want to do is that if all 8 columns are duplicated. All data columns must match exactly. Then I want the first instance and it's figures in another sheet. The data can be anywhere in the dataset and at the moment this is a manual task of sorting, filtering and deleting. Takes a while as thousands of rows.

Cheers

Wendy
 
Wendy

Without seeing the data
I would add a helper Column =COUNTIF(A2:H2,H2)=8
Copy down
Then sort by the helper column
That will put all the Values which have True together

Then do what you must with the data
 
Hi hui

Thanks for the post and your time. I got a false reading for all lines. I am the reason I am sure of it. Was there something else about the helper column, a second column?

Wendy
 
Wendy

I only guessed at what columns and rows you were using

Can you post a sample of data even 10 rows will be useful
 
Hi Wendy,

Withou a sample workbook to work on, I have done the following

Attached is a macro, that does the following

Insert a formula to Concatenate the Data in Col1 to Col8 into Column S
Match the data in Column S
Copy and Paste the data for the Matched cells to Sheet "Transfer"
Clear the Data.

You will note that all the references are Hard-coded in the Macro, but if this is what you need we can clean up the macro.

Note the data on the "Transfer" sheet must be deleted before you start the test.

Hope this helps

kanti
 

Attachments

  • TransDups.xlsm
    18.8 KB · Views: 3
Hui

Work don't let me on these sites. Chat????

So I use my iPad that has no excel. 8 cols text 10 figures. Thousands of rows.

Will have a look at the attached by Kchiba.

Wendy
 
Hi

OK so I put some dummy data into a workbook and put a procedure together which I think gets you over the line. It uses the scripting dictionary. Let me know if you need clarification. I have included a sample file.

Code:
    Dim a, b
    Dim i As Long
    Dim j As Long
    Dim txt As String
    Dim n As Long
    Dim k As Long, lc As Long
    lc = Range("IV1").End(xlToLeft).Column 'Trap your last col
    Sheet2.[a1].CurrentRegion.Offset(1).Clear
    With Cells(1).CurrentRegion 'assign range to var a
        a = .Value
        ReDim b(UBound(a), lc) 'Get holding var ready for a
   
        With CreateObject("Scripting.Dictionary")
            .CompareMode = 1
            For i = 2 To UBound(a, 1)
                For j = 1 To 8
                    txt = txt & Chr(2) & a(i, j)
                Next
                If Not .exists(txt) Then
                    .Item(txt) = Empty
                    n = n + 1
                    For k = 1 To lc 'loop through the unique items and add them to b
                        b(n, k) = a(i, k)
                    Next k
                End If
                txt = "" 'Clear the concatenation.
            Next
        End With
    End With
  Sheet2.Cells(2, 1).Resize(UBound(b), lc).Value = b

I tried to include some helping text within the code. File attached (2000 Rows) to prove workings.

Take care

Smallman
 
Firstly kbica

Thanks for looking at my problem. The procedure seemed to produce more rows than I expected. Probably something I did.

Small man the speed. It is a blink and over. How does it work? How did you do it.

W
 
The procedure is not as difficult to follow as you would think. It uses the same method as Kchiba but with the scripting dictionary. It just takes the size of the columns and rows assigns it to a variable, then assigns the first 8 columns to a text variable. It won't allow duplicates of that variable so once it hits the first instance that is it, it gets added to the variable b. Hope that helps.

Take care

Smallman
 
G'Day Mate,

How about more dope on the Scripting Dictionary?

would like to learn more about it.

thanks

kanti
 
Hi

Can the code be adjusted to cover just columns 1,2,3,5 10,18

I usually cut these after and that would save me more time than you have.

Thanks again
 
Hi Wendy

Sure thing. All you have to do is add a new variable for the columns you want to shift. This will effectively only shift the columns you include in the variable. This Variable;
Code:
ar = [{1,2,3,5,10,18}]

is in line with the Columns you requested. If you want to change the columns in question you do it here. Or set up a List sheet for all these variables.

Here is the updated code.

Code:
Sub MoveMe2()
    Dim a, b
    Dim i As Long
    Dim j As Long
    Dim txt As String
    Dim n As Long
    Dim k As Long
    Dim lc As Long
    Dim ar As Variant
   
    ar = [{1,2,3,5,10,18}]
    lc = Range("IV1").End(xlToLeft).Column 'Trap your last col
    Sheet3.[a1].CurrentRegion.Offset(1).Clear
    With Cells(1).CurrentRegion 'assign range to var a
        a = .Value
        ReDim b(UBound(a), lc) 'Get holding var ready for a
       
        With CreateObject("Scripting.Dictionary")
            .CompareMode = 1
            For i = 2 To UBound(a, 1)
                For j = 1 To 8
                    txt = txt & Chr(2) & a(i, j)
                Next
                If Not .exists(txt) Then
                    .Item(txt) = Empty
                    n = n + 1
                    For k = 1 To UBound(ar) 'loop through the unique items and add them to b
                        b(n, k) = a(i, ar(k))
                    Next k
                End If
                txt = "" 'Clear the concatenation.
            Next
        End With
    End With
  Sheet3.Cells(2, 1).Resize(UBound(b), lc).Value = b
End Sub

The attached file has both procedures in it. It will return all of the data to the Rtn tab and the summarised data to NewSummary tab.

Kanti

As a started iterate through my code and watch what is happening. As I said earlier it does the same thing as you were doing it just does it a whole lot faster and it is very powerful. Here is a starter;

http://msdn.microsoft.com/en-us/library/x4k5wbx4(v=vs.84).aspx

Look at the items on the left hand side.

Here is a MS knowledge base article to go with it.

http://support.microsoft.com/kb/187234

If reading theory is not your thing just email back on Chandoo I will answer any questions happily. :)

In the mean time you have a couple of similar examples.

Take care

Smallman
 

Attachments

  • MoveUnique2.xlsm
    185.3 KB · Views: 5
Hi Guys,

Thanks for the links, it seems that I have some homework to do.

Will probably start a new thread if I need more information.

Cheers

Kanti
 
Another point : you can have help directly in the VBA environment by entering Dictionary
into the search top right text case … (of course if all helps files well installed)​
 
Hi Smallman,

I am not sure if your code is doing exactly what was required, if I read it incorrectly, my apologies to you.

According to Wendy:

"What I want to do is that if all 8 columns are duplicated. All data columns must match exactly. Then I want the first instance and it's figures in another sheet."

So we only transfer out the data were the duplicated criteria is met, what I tested was inserting a row of non-sense data in your data-set, such that it was unique and was not duplicated in the data-set. This row appeared on the target sheets and by definition it should not.

Whereas according to the requirement "if all 8 columns are duplicated" this added row should not appear on the other sheets as it has not been duplicated.

However, it may just be that what you did was exactly what was eventually required.

But that begs the question of how you would only pick the first of the non-unique rows?

Cheers

kanti
 
Hi Kanti

It was my understanding that the requirement was to produce Unique data based on 8 columns. When you hit an instance where all 8 items are matching this is instance one. This is independent of all the other lines in the dataset. That unique item must be added to the list. In other words match all 8 Columns, add a flag, then NOTHING else gets added with the same flag.

This is exactly and precisely what the code does. Believe me I put unique data in the file. If you open the last file at the bottom you will see a pink cell in the very last row. Now bearing in mind that I can’t open the file and add that new information, that Pink cell is representative of a Unique dataset. It was my way of testing my code and this unique pink cell/row appeared in the summary page. That is the way I read the post. After reading your post a second time I still think that way.

Hope that helps.

Smallman
 
To Answer your second question;

But that begs the question of how you would only pick the first of the non-unique rows?

the guts of the procedure is happening here.


Code:
                If Not .exists(txt) Then
 
                    .Item(txt) = Empty
 
                    n = n + 1
 
                    For k = 1 To lc 'loop through the unique items and add them to b
 
                        b(n, k) = a(i, k)
 
                    Next k
 
                End If

So if it is the first instance It basically says if Txt does Not Exist in the Scripting Dictionary then the corresponding item in the variant 'a' is unique in which case add that to the variable 'b'.

On one of the other forums I post to there is a guy who answers posts almost exclusively with the dictionary. It is starting to gain a lot more traction as a result with more and more regulars on that site using it.

Take Care

Smallman
 
Hi Smallman,

Thanks for the response, I understand how and why your code works, my concern is that only data that have duplicates should be copied across and not a row of unique data as per:

"What I want to do is that if all 8 columns are duplicated. All data columns must match exactly. Then I want the first instance and it's figures in another sheet."

So my interpretation is that if a Row has no Duplicate per the definition of duplicate, it should NOT be copied across.

I maybe wrong, and your interpretation and treatment could be correct, my solution was built on the premise that only the first row of those rows with Duplicated will get copied across.

However, if Wendy is OK, there is no need other than curiosity to pursue this any further.

Cheers and thanks

Kanti
 
Smallman,

You are from the Western Suburbs of Brissie, maybe we can catch-up when I am back home in Brissie, I am from Jindalee
 
Back
Top