• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Compare multiple columns between files

I often use VLOOKUP to compare data in a column of one file to data in a column in another file. But is there a way (there's ALWAYS a way) to compare multiple columns at a time? In essence, I want to see if the entire row in file A is the same as any row in file B. I want to know if there are any new rows in file B.

Example:
Both files have same format:
First Name, Last Name, Date Hired, Position, more columns of data

I get a new file every week.
Assumption: First Name, Last Name, Date Hired combination will always be unique
I need to know:
(a) Are there any new employees (new combination of first three columns)?
(b) Were any employees terminated (combination of first three columns is not on new file)?
(c) Did ANYTHING change (I don't care WHAT changed, just if ANYTHING changed)?
 
Unfortunately, I am not seeing where either of these will help me. These seem to compare single cell values. I need to compare the entire row from one spreadsheet to all the rows in another spreadsheet. I have over 100,000 rows in each spreadsheet, so if I can automate this, it would really help.
 
Ronald

The second link explains how to compare part or whole columns, not just single cells with CF. The first link shows how to compare part/full with a formula that you fill down as far as needed. Alter the sheet reference as needed to suit.

Can you explain why you do not understand.
 
Hi, Ronald Wilson!
An innocent comment about your identification method. You say 100K rows but what if 2 or more John Smith were hired the same day? If I were you I'd include another field, like personal ID document, date of birth, ...
Regards!
 
The second link (in Chandoo.org) discusses using Conditional Formatting. I tried that and could not get the desired results. Is there another link within all the comments that might help more? I'm still not finding out how to do that comparison.

As per SirJB7's comment, I agree that the data needs to be unique. My example has nothing to do with my real data.
 
Hi, Ronald Wilson!

Give a look to the uploaded files: 1 is your actual workbook, 2 is your new workbook. Both are .xlsm files but just the 1st one needs to, but the author is so lazy that simply copied it.

This is the code:
Code:
Option Explicit
 
Sub IAmJohnSmithAndMyFriendIsJamesJones()
    '
    ' constants
    '  files
    Const ksWS1 = "Hoja1"
    Const ksData1 = "DataTable"
    Const ksWB2 = "Compare multiple columns between files - 2 (for Ronald Wilson at chandoo.org).xlsm"
    Const ksWS2 = "Hoja1"
    Const ksData2 = "DataTable"
    Const ksSummary = "Summary"
    Const ksSortKeys = "@,A,B,C"
    '  texts
    Const ksComma = ","
    Const ksColon = ":"
    Const ksSeparator = "_"
    Const ksEOF = "zzz"
    Const ksAdded = "Added"
    Const ksUpdated = "Updated"
    Const ksDeleted = "Deleted"
    '
    ' declarations
    Dim rng(2) As Range
    Dim iSummary(2) As Integer, lIndex(2) As Long, sString(2) As String
    Dim vSortKeys As Variant
    Dim I As Long, J As Integer
    '
    ' start
    '  2nd workbook
    With Workbooks
        For I = 1 To .Count
            If Workbooks(I).Name = ksWB2 Then Exit For
        Next I
        If I > .Count Then
            .Open ActiveWorkbook.Path & Application.PathSeparator & ksWB2
            ThisWorkbook.Activate
        End If
    End With
    '  ranges
    Set rng(1) = ActiveWorkbook.Worksheets(ksWS1).Range(ksData1)
    Set rng(2) = Workbooks(ksWB2).Worksheets(ksWS2).Range(ksData2)
    If rng(1) Is Nothing Or rng(2) Is Nothing Then GoTo IAmJohnSmithAndMyFriendIsJamesJones_Exit
    '  application
    With Application
        .ScreenUpdating = False
        .EnableEvents = False
        .Calculation = xlCalculationManual
    End With
    '  initialize
    For I = 1 To 2
        With rng(I)
            For J = 1 To .Columns.Count
                If .Cells(0, J).Value = ksSummary Then Exit For
            Next J
            If J > .Columns.Count Then
                .Cells(0, J).Value = ksSummary
                Set rng(I) = .Resize(.Rows.Count, .Columns.Count + 1)
            End If
            iSummary(I) = J
            Range(.Cells(1, iSummary(I)), .Cells(.Rows.Count, iSummary(I))).ClearContents
        End With
    Next I
    '  keys
    vSortKeys = Split(ksSortKeys, ksComma)
    '
    ' process
    '  sort, if not
    For I = 1 To 2
        With rng(I)
            With .Parent.Sort
                With .SortFields
                    .Clear
                    For J = 1 To UBound(vSortKeys)
                        .Add Key:=Range(vSortKeys(J) & ksColon & vSortKeys(J)), _
                            SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
                    Next J
                End With
                .SetRange rng(I).Parent.Cells
                .Header = xlYes
                .MatchCase = False
                .Orientation = xlTopToBottom
                .SortMethod = xlPinYin
                .Apply
            End With
        .Cells(1, 1).Select
        End With
        Workbooks(ksWB2).Activate
    Next I
    ThisWorkbook.Activate
    '  cycle
    lIndex(1) = 1
    lIndex(2) = 1
    Do Until lIndex(1) > rng(1).Rows.Count And lIndex(2) > rng(2).Rows.Count
        ' build control strings
        For I = 1 To 2
            With rng(I)
                If lIndex(I) <= .Rows.Count Then
                    sString(I) = .Cells(lIndex(I), 1).Value & ksSeparator & _
                                .Cells(lIndex(I), 2).Value & ksSeparator & _
                                .Cells(lIndex(I), 3).Value & ksSeparator
                Else
                    sString(I) = ksEOF
                End If
            End With
        Next I
        ' compare
        Select Case sString(1)
            Case Is < sString(2)
                rng(1).Cells(lIndex(1), iSummary(1)).Value = ksDeleted
                lIndex(1) = lIndex(1) + 1
            Case Is = sString(2)
                For I = 1 To rng(1).Columns.Count - 1
                    If rng(1).Cells(lIndex(1), I).Value <> rng(2).Cells(lIndex(2), I).Value Then
                        rng(1).Cells(lIndex(1), iSummary(1)).Value = ksUpdated
                        rng(2).Cells(lIndex(2), iSummary(2)).Value = ksUpdated
                        Exit For
                    End If
                Next I
                lIndex(1) = lIndex(1) + 1
                lIndex(2) = lIndex(2) + 1
            Case Is > sString(2)
                rng(2).Cells(lIndex(2), iSummary(2)).Value = ksAdded
                lIndex(2) = lIndex(2) + 1
        End Select
    Loop
    '
    ' end
    '  application
    With Application
        .ScreenUpdating = True
        .EnableEvents = True
        .Calculation = xlCalculationAutomatic
    End With
IAmJohnSmithAndMyFriendIsJamesJones_Exit:
    '  ranges
    Set rng(2) = Nothing
    Set rng(1) = Nothing
    Beep
    '
End Sub

My assumptions, constraints and/or tips are:
a) Both workbooks are in the same folder (if not, fully qualify the path for the 2nd one and change the opening reference).
b) Both workbooks have a worksheet (no matter the name) that contains a named range (dynamically preferably) with the same columns quantity and distribution.
c) Only workbook 1 has VBA code.
d) In the constants / files section of the code you define the location parameters (2nd workbook, worksheets of both, ranges of both) that don't need to be equal
e) The constant ksSortKeys contains the columns to order the data sets in ascending order comma separated (1st entry is discarded -@-, others are column letters)
f) The result is placed in each workbook in a column at the end of the ranges called by default "Summary" (change it in ksSummary constant), and if it doesn't exist it's created.
g) Result texts are placed in 3 constants which I assume you'll manage to find and eventually change.
h) The last but most important: I love the macro name, and hope you too.

Just advise if any issue.

Regards!
 

Attachments

  • Compare multiple columns between files - 1 (for Ronald Wilson at chandoo.org).xlsm
    25.6 KB · Views: 1
  • Compare multiple columns between files - 2 (for Ronald Wilson at chandoo.org).xlsm
    9.4 KB · Views: 1
Hi Ronald,

I would think of 2 possible ways to solve (hopefully!) your problem (without using VBA):
The first solution would be to concatenate First Name, Last Name and Date Hired in both files to create unique entries, and to compare them with a Vlookup. If you get errors in the new file, you know that one of the names or date hired changed, or that there are new entries. Same logic for the old file.

The second formula based solution would be to use a Sumproduct in the following way:
=SUMPRODUCT((FirstNameRange=First Name)*(LastNameRange=LastName)*(DateHiredRange=DateHired)). FirstNameRange, LastName Range and DateHiredRange being the whole columns in the old file, and First Name, Last Name and Date Hired are the look up values in your new file (if you compare the new file to the old file).
The formula returns a 1 if it finds a match and a 0 if there is no match.
It is a kind of Vlookup based on 3 criteria. The logic is the same than above, but without helper column.

If you upload an example, I could test both formulas and send you the result.

Regards
 
@Aurelie
Hi!
Theoretically at least both methods are perfect and suitable for the job. The only warning I see it's that he's talking about 100K rows in each file.
Regards!
 
@Aurelie
Hi!
Of course I'd do it if I were Ronald Wilson, I just chose the VBA way regarding data volume and speed.
Regards!
 
I'll give these a try. I've done the "trick" of concatenating fields before doing a VLOOKUP, but since I have 12 columns of data, I was really trying to avoid that. The SUMPRODUCT seems a bit cumbersome for the same reason. But I never thought there would be a simple answer, so I'll give all of these a try, including the VBA code.

Thanks!
 
Not yet. Unfortunately, this is not as high a priority as much of the rest of my work, so I can't spend as much time as I'd like on it. I'll try to get to it in the next few days or maybe this weekend. Thanks for the time you've spent on it.
 
I've created two subset files and run the VBA code against them. It looks like it worked well. I have a column for the file date, so I had to take that out or every row would be appear updated. I'm also going to "tweak" it some more so the file names and sheet names can be variable. They currently contain the date and that changes every week. I'm also going to save it to my Personal workbook so it will always be active. I don't want to have to add it to me file every week.

Thanks,
Ronnie
 
Hi, Ronald Wilson!
Files and worksheet names have always been variable since the 1st version, just need to change their values in the constants section. And if they can't be parametrically customized, well, replace the constants definition by the proper declaration (Dim) and add a manual input, a retrieval from a parameters range or worksheet, and the rest of the code will still do the job.
Regards!
 
Back
Top