• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Copying data that doesn't match into a seperate tab

muna

Member
Hi all,

Please, could someone post a robust formula that will allow data that doesn't match to be copied into a separate tab?

Excel file explains.

Thank you in advance.
 

Attachments

  • example.xlsx
    11.5 KB · Views: 10
Code:
Option Explicit

Sub muna()
    Dim s1 As Worksheet, s2 As Worksheet
    Set s1 = Sheets("Data dump")
    Set s2 = Sheets("Results page")
    Dim i As Long, lr As Long, lrR As Long
    Application.ScreenUpdating = False
    lr = s1.Range("A" & Rows.Count).End(xlUp).Row
    For i = 2 To lr
        lrR = s2.Range("A" & Rows.Count).End(xlUp).Row
        With s1
            If .Range("A" & i) <> .Range("B" & i) Or .Range("C" & i) <> .Range("D" & i) Or .Range("E" & i) <> .Range("F" & i) Then
                .Range("A" & i & ":F" & i).Copy s2.Range("A" & lrR + 1)
            End If
        End With
    Next i
    Application.CutCopyMode = False
    Application.ScreenUpdating = True
    MsgBox "Complete"

End Sub

Standard Module
How to install your new code
Copy the Excel VBA code
Select the workbook in which you want to store the Excel VBA code
Press Alt+F11 to open the Visual Basic Editor
Choose Insert > Module
Edit > Paste the macro into the module that appeared
Close the VBEditor
Save your workbook (Excel 2007+ select a macro-enabled file format, like *.xlsm)

To run the Excel VBA code:
Press Alt-F8 to open the macro list
Select a macro in the list
Click the Run button
 
Code:
Option Explicit

Sub muna()
    Dim s1 As Worksheet, s2 As Worksheet
    Set s1 = Sheets("Data dump")
    Set s2 = Sheets("Results page")
    Dim i As Long, lr As Long, lrR As Long
    Application.ScreenUpdating = False
    lr = s1.Range("A" & Rows.Count).End(xlUp).Row
    For i = 2 To lr
        lrR = s2.Range("A" & Rows.Count).End(xlUp).Row
        With s1
            If .Range("A" & i) <> .Range("B" & i) Or .Range("C" & i) <> .Range("D" & i) Or .Range("E" & i) <> .Range("F" & i) Then
                .Range("A" & i & ":F" & i).Copy s2.Range("A" & lrR + 1)
            End If
        End With
    Next i
    Application.CutCopyMode = False
    Application.ScreenUpdating = True
    MsgBox "Complete"

End Sub

Standard Module
How to install your new code
Copy the Excel VBA code
Select the workbook in which you want to store the Excel VBA code
Press Alt+F11 to open the Visual Basic Editor
Choose Insert > Module
Edit > Paste the macro into the module that appeared
Close the VBEditor
Save your workbook (Excel 2007+ select a macro-enabled file format, like *.xlsm)

To run the Excel VBA code:
Press Alt-F8 to open the macro list
Select a macro in the list
Click the Run button

Thank you so much! Worked great.

Out of curiosity, do you think this is possible to do with formulas? No need to publish the formulas. I wouldn't want to unnecessarily take up your time :)

However, I would appreciate if you could publish a formula to calculate the percentage of cases which was classified correctly for each directorate.

Percentages would work out as:
Customer evaluation = 50% correctly classified
customer importance = 100% correctly classified

P.S. Should there be a colon next to F?

.Range("A" & i & ":F" & i)

What function does the colon do?
 
Maybe..........

In A2, copied across to F2 and all copied down :

=IFERROR(INDEX('Data dump'!A$2:A$5,AGGREGATE(15,6,ROW('Data dump'!A$2:A$5)-ROW('Data dump'!A$1)/(('Data dump'!$A$2:$A$5<>'Data dump'!$B$2:$B$5)+('Data dump'!$C$2:$C$5<>'Data dump'!$D$2:$D$5)+('Data dump'!$E$2:$E$5<>'Data dump'!$F$2:$F$5)),ROWS($1:1))),"")

Regards
Bosco
 

Attachments

  • MissMatchData.xlsx
    12.5 KB · Views: 3
P.S. Should there be a colon next to F?

.Range("A" & i & ":F" & i)

What function does the colon do?

The statement says the range A to F in row i. You need the colon in a manner similar to =Sum(A1:C1) which provides a sum of Cells A1, B1 and C1
 
The statement says the range A to F in row i. You need the colon in a manner similar to =Sum(A1:C1) which provides a sum of Cells A1, B1 and C1
Thank you and everyone else for their help.

Do you happen to know the answers to the other questions I asked you earlier i.e. percentage etc...

Or would I be better of creating a new thread?
 
This has a formula-based solution as well.

It may look strange but it works!

Code:
Name    Refers to
Correct    = (Table1[Original name]=Table1[Actual name]) * (Table1[Original team]=Table1[Final team]) * (Table1[Original directorate]=Table1[Final directorate])
Error_count    = SUM( 1 - Correct )
Error_record    = SMALL( IF( NOT(Correct), Record_number ), Record_number )
Record_number    = ROW(Table1) - 1

Worksheet formulas for the percentage error are of the form
Code:
= SUMPRODUCT( N(Table1[Original name]<>Table1[Actual name]) ) / ROWS(Table1)
 

Attachments

  • Antijoin (PB).xlsx
    22.9 KB · Views: 5
Thank you everyone!

This has a formula-based solution as well.

It may look strange but it works!

Code:
Name    Refers to
Correct    = (Table1[Original name]=Table1[Actual name]) * (Table1[Original team]=Table1[Final team]) * (Table1[Original directorate]=Table1[Final directorate])
Error_count    = SUM( 1 - Correct )
Error_record    = SMALL( IF( NOT(Correct), Record_number ), Record_number )
Record_number    = ROW(Table1) - 1

Worksheet formulas for the percentage error are of the form
Code:
= SUMPRODUCT( N(Table1[Original name]<>Table1[Actual name]) ) / ROWS(Table1)

Oh wow you did the percentages too!

I'm so grateful!

Thank you very much :)

Can you tell me how to get the crosses and tickets to automatically show for each new case added?

Can you also tell me how to make it so that the spreadsheet refreshes automatically rather than manually? I know you can use a macro button but I was hoping for a solution without macros.

Additional challenge:
Ignore this challenge if you don't have the time. Would appreciate it if you could still answer the below. I was wondering if the text could automatically change to say what error has occurred i.e. name and team misclassified rather than just name misclassified. It should say all the errors for a row of text?

Can you post a step-to-step guide for all the solutions you have posted within this thread? I know this may be time-consuming but it's the best way for me to learn. Would really appreciate it.
 
Last edited:
I apologize. The workbook was meant to show a formula alternative to the Power Query solution. Unfortunately I moved the data range at the last moment before posting and did not notice that my formula for 'record_number' was
= ROW(Table1) - 1
rather than the more robust
= ROW(Table1) - ROW(Table1[#Headers])

The ticks and crosses are simply conditionally formatted [0, 1] which shows the content of the named array 'correct' which refers to the formula
= (Table1[Original name]=Table1[Actual name])
* (Table1[Original team]=Table1[Final team])
* (Table1[Original directorate]=Table1[Final directorate])

As it stands such an array formula would need to be extended by selecting the larger region and reentering the formula using CSE.

To make the values extend with the table one would require a non-array formula that is part of the table, as shown below. Since the formula for compacting the error rows is an array formula, I have used the new column 'OK?'
=Table1[OK]
to rebuild the formulas for 'Error_count' and 'Error_record'
= SUM( 1 - OK? )
= SMALL( IF( NOT(OK?), Record_number ), Record_number )

If you have a sufficiently up-to-date version of Excel, I believe there are 'refresh' methods that will keep the Power Query solution in step with any changes you make to the data sheet.
 

Attachments

  • Antijoin alt (PB).xlsx
    23.2 KB · Views: 8
I apologize. The workbook was meant to show a formula alternative to the Power Query solution. Unfortunately I moved the data range at the last moment before posting and did not notice that my formula for 'record_number' was
= ROW(Table1) - 1
rather than the more robust
= ROW(Table1) - ROW(Table1[#Headers])

The ticks and crosses are simply conditionally formatted [0, 1] which shows the content of the named array 'correct' which refers to the formula
= (Table1[Original name]=Table1[Actual name])
* (Table1[Original team]=Table1[Final team])
* (Table1[Original directorate]=Table1[Final directorate])

As it stands such an array formula would need to be extended by selecting the larger region and reentering the formula using CSE.

To make the values extend with the table one would require a non-array formula that is part of the table, as shown below. Since the formula for compacting the error rows is an array formula, I have used the new column 'OK?'
=Table1[OK]
to rebuild the formulas for 'Error_count' and 'Error_record'
= SUM( 1 - OK? )
= SMALL( IF( NOT(OK?), Record_number ), Record_number )
Thank you.

Do you happen to have a revised file?
 
I hit 'Post Reply' prematurely. I have now edited the post to include the updated file.
Thank you.

Apologies for not asking earlier...

Do you know how to get percentages of misclassified cases for each team i.e. HR cases were misclassified X amount of percent.

P.S. Can you post a step-to-step guide for all the solutions you have posted within this thread? I know this may be time-consuming but it's the best way for me to learn.

Would really appreciate it.
 
The attached contains a table showing team stats on how many rows were reassigned.
Apologies.

This wasn't quite what I wanted. I should have been clearer.

Can you add percentages of misclassified cases by a team? It should be a percentage like the 'Data dump' tab where it says the percentage for 'original team'. There should be an extra field that says Finance etc...
 
Is this closer? A problem with percentages can be the decision 'percentage of what'. Also there is no need to have both a Power Query solution and a formula solution.

Something else that would have made your data easier to work with would be to avoid the alternating pattern of 'Original x' followed by 'Final x' and instead have two distinct tables, 'Original' and 'Final' that could be compared at a single step rather than by comparing three individual attributes.
 

Attachments

  • Antijoin alt (PB).xlsx
    24.6 KB · Views: 6
Back
Top