• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Matrix to List (DataBase)

PaulF

Active Member
Hello all...

Please see attached. I needed to shape this large matrix dataset 366X441 to a list style DB for some pivot table work. It takes about 25 seconds and I'm still early in my VBA learning.

I felt that 25 seconds was pretty fast for this, but the manager that needs this did not share my sentiments. Any thoughts ?? I'm open to any comments and/or help.

Respectfully,
Paul F
 

Attachments

  • Matrix2List1.xlsm
    739.6 KB · Views: 12
Hi, PaulF!
In the meanwhile that someone arrives with a better idea, you can set up something that looks like an accident for your boss. Unless you can talk to his boss, convince him to fire your's and promote you as his successor.
Regards!
 
Hi, PaulF!

A couple of doubts. Let us take row 6 as example:

a) Columns D:L of 1st sheet are the same as A:I of 2nd sheet. J column of 2nd is contains a row entry for:
- each column in 1st worksheet from cols M:MN?
- each column in 1st worksheet from cols M:MN for non-blank cells in row? (6 in this case)

b) What about columns NP:OA which appear to be shifted 2 cols right and 1 row down?

Regards!
 
Hi, PaulF!

A couple of doubts. Let us take row 6 as example:

a) Columns D:L of 1st sheet are the same as A:I of 2nd sheet. J column of 2nd is contains a row entry for:
- each column in 1st worksheet from cols M:MN?
- each column in 1st worksheet from cols M:MN for non-blank cells in row? (6 in this case)

b) What about columns NP:OA which appear to be shifted 2 cols right and 1 row down?

Regards!

NP to OA was an error paste... (not supposed to be there and did not effect anything)

I just checked the 1st 130 lines of the output and it all looks correct...
Am I missing something ??
 
Hi, PaulF!
Continuing with row 6, columns H:I,
Start Date End Date
20/02/2017 11/03/2017
but there's only data from dates 20/02/2017 to 09/03/2017, not 11th, besides a few inner blanks. Is that correct or I should not even look at it?
Regards!
 
Hi, PaulF!
Continuing with row 6, columns H:I,
Start Date End Date
20/02/2017 11/03/2017
but there's only data from dates 20/02/2017 to 09/03/2017, not 11th, besides a few inner blanks. Is that correct or I should not even look at it?
Regards!
I understand the confusion now sir :) This data is from a project control group and each of the skill crafts come in from a release form with various line items. Start Date, End Date, days per week, qty (no of resources) and days per week... If it was (start date) 1/1/17 to (end date) 1/14/17 - 4 days per week and you look to the right under the days you would see a qty (resource count) for 4 days per week (mon - thurs) so 1/2, 1/3, 1/4, 1/5 would show qty as would 1/9, 1/10, 1/11, 1/12 for the 2nd week.

That make more sense now ??

Sorry I was not more clear to begin sir...

Respectfully,
PaulF
 
Hi, PaulF!
Thanks for the clarification, don't need to worry about that. If I don't come back to you in a couple of hours I'll do on Monday.
Regards!
 
Sheet 1 is the matrix with Jan 1 to Dec 31 as column headers and skills, vendor, etc on left...

Goto Sheet2 and run macro Matrix2List
 
Hi, PaulF!

Give a look at the uploaded file.

I've added a few dynamic named ranges for easily referencing, a 3rd worksheet for copying/filtering/copying, a column with the no. of dates with qty in 1s sheet, and the number of days of the year to handle leap years.

The main change is that I copy by blocks of data per line and dates with qty and not in a cell by cell basis, there's where you can reduce dramatically the running time.

Your code of Matrix2List run for 1'1" at my machine (heavily loaded), the new code of below MatrixReloaded2List run in 12", so it's a 20% of the original time, in your case it should be 5".

Just advise if any issue or anything that you don't understand.

I added an Option Explicit clause (home rules, sorry), so I added a few declarations that were missing in your code.

Code:
Sub MatrixReloaded2List()
Debug.Print Now(), "start"
    '
    ' constants
    '  ws & ranges
    Const ksWSSource = "Sheet1"
    Const ksWSTarget = "Sheet2"
    Const ksWSWork = "Hoja1"
    Const ksDataBody = "DataBodyTable"
    Const ksDataCount = "DataCountList"
    Const ksDatesQty = "DatesQtyTable"
    Const ksWork = "WorkTable"
    '  filter
    Const kiFilter = 2
    Const ksCriteria = ">0"
    '
    ' declarations
    Dim wsS As Worksheet, wsT As Worksheet, wsW As Worksheet
    Dim rngDB As Range, rngDC As Range, rngDQ As Range, rngW As Range
    Dim rngDBRow As Range, rngDQHeader As Range, rngDQRow As Range
    Dim lReg As Long, iDates As Integer
    Dim I As Long, J As Long
    '
    ' start
    '  environment
    With Application
        .ScreenUpdating = False
        .Calculation = xlCalculationManual
        .EnableEvents = False
    End With
    '  ws & ranges
    With ThisWorkbook
        Set wsS = .Worksheets(ksWSSource)
        Set wsT = .Worksheets(ksWSTarget)
        Set wsW = .Worksheets(ksWSWork)
    End With
    With wsS
        Set rngDB = .Range(ksDataBody)
        Set rngDC = .Range(ksDataCount)
        Set rngDQ = .Range(ksDatesQty)
        Set rngDQHeader = rngDQ.Rows(1).Offset(-1, 0)
    End With
    Set rngW = wsW.Range(ksWork)
    With rngW
        .ClearContents
        rngDQHeader.Copy
        .Cells(1, 1).PasteSpecial Paste:=xlPasteAll, Operation:=xlNone, SkipBlanks:=False, Transpose:=True
    End With
    '  counter
    lReg = 0
    '
    ' process
    With rngDB
        For I = 1 To .Rows.Count
            ' no. of dates
            iDates = rngDC.Cells(I, 1).Value
            If iDates > 0 Then
                ' body range
                Set rngDBRow = .Rows(I).Cells
                ' copy body
                rngDBRow.Copy Range(wsT.Cells(lReg + 2, 1), wsT.Cells(lReg + iDates + 1, .Columns.Count))
                ' dates range
                lReg = lReg + iDates
                Set rngDQRow = rngDQ.Rows(I).Cells
                ' which dates?
                '  work data
                rngW.Columns(2).ClearContents
                rngDQRow.Copy
                rngW.Columns(2).PasteSpecial Paste:=xlPasteAll, Operation:=xlNone, SkipBlanks:=False, Transpose:=True
                '  copy filtered
                CopyingFilteredCells ThisWorkbook, ksWSTarget, ksWSWork, kiFilter, ksCriteria, ksCriteria, 1, .Columns.Count + 1
            End If
        Next I
        Application.CutCopyMode = False
    End With
    '
    ' end
    '  environment
    With Application
        .ScreenUpdating = True
        .Calculation = xlCalculationAutomatic
        .EnableEvents = True
    End With
    '  reposition
    With rngW
        .Parent.Activate
        .ClearContents
        .Cells(1, 1).Offset(-1, .Columns.Count).Select
    End With
    With wsT
        .Activate
        .Cells(1, rngDB.Columns.Count + 1).Select
    End With
    '  beep
    MarcLBeepDemo
    MsgBox CStr(lReg) & " record entries have been created at worksheet " & ksWSTarget, _
        vbOKOnly + vbInformation, "Summary"
    '  ws & ranges
    Set rngDB = Nothing
    Set rngDC = Nothing
    Set rngDQ = Nothing
    Set rngW = Nothing
    Set rngDBRow = Nothing
    Set rngDQHeader = Nothing
    Set rngDQRow = Nothing
    Set wsS = Nothing
    Set wsT = Nothing
    Set wsW = Nothing
    '
Debug.Print Now(), "end"
End Sub

Private Sub CopyingFilteredCells(pwbTgt As Workbook, psTgt As String, psSrc As String, _
                                piFilter As Integer, psCriteria1 As String, psCriteria2 As String, _
                                Optional pvColSrc As Variant, Optional pvColTgt As Variant)
    ' constants
    ' declarations
    Dim lRow As Long, rng As Range
    ' start
    Application.DisplayAlerts = False
    If IsMissing(pvColSrc) Then pvColSrc = 1
    If IsMissing(pvColTgt) Then pvColTgt = 1
    ' process
    Set rng = Worksheets(psTgt).Cells(1, pvColTgt)
    lRow = rng.End(xlDown).End(xlDown).End(xlUp).Row
    With Worksheets(psSrc)
        If piFilter > 0 Then .Cells.AutoFilter Field:=piFilter, Criteria1:=psCriteria1, Operator:=xlAnd, Criteria2:=psCriteria2
        If .[A1].Offset(1, 0).Value <> "" And .[A1].Offset(1, 0).End(xlDown).Value <> "" Then _
            Range(.[A1].Offset(1, 0), .[A1].Offset(1, pvColSrc - 1).End(xlDown)).Copy pwbTgt.Worksheets(psTgt).Cells(lRow + 1, pvColTgt)
        If piFilter > 0 Then .ShowAllData
    End With
    ' end
    Set rng = Nothing
    Application.DisplayAlerts = True
End Sub
Regards!
 

Attachments

  • Matrix2List1.xlsm
    782.8 KB · Views: 13
SirJB7,

Thank you !! With little training on my end and I'm going through your code and love the structure... In trying to learn can you help me understand the naming conventions.

  • in Const ws & ranges - ks << why ks ?
  • in Const filter - ki << why ki ??

I'm going to spend a few hours learning from your code and I might have a question or a few if you don't mind...

Respectfully,
PaulF
 
In trying to learn can you help me understand the naming conventions.

  • in Const ws & ranges - ks << why ks ?
  • in Const filter - ki << why ki ??
Hi, PaulF!

Thanks for your kind words, hope the new times would be fine for your boss, otherwise tell him to give me a call. :DD

The notation stuff. Once upon a time...

All this began when in the early 80's the personal computer come to this world and users wanted to spreadsheet, even if they didn't know exactly what it was. Before that there was only @Hui tapping with Visicalc & Multiplan on CP/M systems.

PC's came come with interpreted Basic (beginner's all-purpose symbolic instruction code) in ROM (i.e, chipped) as only available language. Ok, assembler too but every computer have always had it.

Soon, in '83 Lotus 1-2-3 appeared, it had macros but not a real programming language, then Quattro occupied the throne for a bit and then MS came out with Excel 2.0, '87 I think, the funny thing is that's never been a 1.0. But it was until early 90's that Basic joined worksheets with Excel 5.0 in the form of VBA.

From the earliest times of the interpreted versions Basic language supported suffixes for explicitly defining types of variables:
% real, & long, @ decimal, ! single, # double, $ string
https://docs.microsoft.com/en-us/do.../language-features/data-types/type-characters
There was no suffix for other data types like boolean, date and others.

Programmers didn't much agree with this cryptic suffix identification system and a guy from Xerox came out with the Hungarian Notation:
https://en.wikipedia.org/wiki/Hungarian_notation
There are many variations of this but mainly it dedicated generally 3 letters as prefix to identify variables and objects:
int for integer, lng for long, str for string, dbl for double, dec for decimal, dat for date, bol for boolean and so on.

Then someone said "hey, what to do with control names?", as the idea was that looking only at the code one would find out what was that name/word representing or referencing, and Redmond guys always ready for helping the developer's community suggested this:
https://msdn.microsoft.com/en-us/library/aa263493(v=vs.60).aspx
txt for text box, cbo for combo, lst for list...
So if you happened to find something like:
txtSomething.Text=strAnotherThing
you could easily realize that the string variable representing AnotherThing was being assigned to the text control that held the value of Something.

In my particular case, I joined immediately both Hungarian and Redmondarian suggested conventions as I was tired/bored/displeased when having to modify not only other people code but older mine's too! What would I have intended to mean with this?:
A.Caption=S
WTF were A & S? Maybe the week when I wrote it I could remember that, but surely not next month.

For serious projects (yes, not all projects are serious, come on!) or when there are client requirements I stick to the 3+3 notation conventions and I always assign significant names.

For projects where code is short (Chandoo's posts) I use a personal shrinked variation of them:
a) 1 letter prefix for variables (s string, i integer, l long, d date/double -dt date, db double, if both-, b boolean, v for variant...)
b) 3 letters prefix for objects (txt text box, rng range, ws worksheet, wb workbook...)
c) 1 pre-prefix letter for constants (yeah, my rule, not in both below notations): k
ki for integer, ks for string, kd for date...
d) 1 pre-prefix for global (used at module level instead of procedure level): g
gi for integer, gs for string, gki for global integer constant, gks for global string constants...
e) 1 pre-prefix for parameters (as arguments definitions): p
pi for integer, ps for string, ...

So if I (or you or the person who dares to accept sample codes from me and even more dangerously to use them) find something like:
gsNounDenotingSomething = psNounComingFromWhoKnows
iBottle=gkiBottlesAsDefault
then there is a string parameter being stored in a global string variable in the 1st example, and a global integer constant being assigned to an integer variable.

Hope it helps. Feel free to ask anything else... except about the prehistorical ages of Visicalc/Multiplan... you yet know whom to address to! :p

Regards!
 
For data transformation, PowerQuery is best tool as dan_l pointed out.

However, when you don't have it available (or in rare instances where original data contain too many exceptions for PQ to effectively handle). Fastest method is to use Scripting.Dictionary and Array objects.

Code:
Sub Demo()
Dim srcArr, resArr, Key
Dim i As Long, j As Long
Dim kStr As String
Dim sTime As Single, eTime As Double
sTime = Timer
With Sheet1
    srcArr = .Range(.Cells(5, "D"), .Cells(.UsedRange.Rows.Count, .UsedRange.Columns.Count + 3)).Value
End With

With CreateObject("Scripting.Dictionary")
    For i = 2 To UBound(srcArr)
        For j = 1 To 9
            kStr = IIf(Len(kStr) = 0, srcArr(i, j), kStr & "^" & srcArr(i, j))
        Next
        For j = 10 To UBound(srcArr, 2)
            If srcArr(i, 7) = srcArr(i, j) Then
                .Item(kStr & "^" & srcArr(1, j)) = 1
            End If
        Next
            kStr = ""
    Next
    ReDim resArr(1 To .Count, 1 To 10)
    i = 1
    For Each Key In .Keys
        x = Split(Key, "^")
        For j = 0 To UBound(x)
            resArr(i, j + 1) = x(j)
        Next
        i = i + 1
    Next
End With
Sheet2.Range("A1:I1").Value = Sheet1.Range("D5:L5").Value
Sheet2.Range("J1") = "Date"
Sheet2.Range("A2").Resize(UBound(resArr), 10) = resArr
eTime = Timer
Debug.Print eTime - sTime
End Sub

Basic concept. Put source data range into array to be read from and manipulated. Dictionary is used to hold unique records (i.e. concatenated string of Column D to L, and the date).

Concatenation is done using "^" as delimiter, since that character does not occur in the data.

Create result array based on dictionary count (variable) and result column# of 10 (fixed).
Split each Key of dictionary and put it into result array.

Put it back to range in one shot.

This should take less than 1 sec to run (I'd say 0.2~0.5 sec depending on machine spec).

See attached sample.
 

Attachments

  • Matrix2List1.xlsb
    282.9 KB · Views: 18
Hi, PaulF!
If I were you and I had Excel 2010 in advance, I'd go for Chihiro approach. Nice one, I didn't think on it.
Regards!
 
For data transformation, PowerQuery is best tool as dan_l pointed out.

However, when you don't have it available (or in rare instances where original data contain too many exceptions for PQ to effectively handle). Fastest method is to use Scripting.Dictionary and Array objects.

Basic concept. Put source data range into array to be read from and manipulated. Dictionary is used to hold unique records (i.e. concatenated string of Column D to L, and the date).

Concatenation is done using "^" as delimiter, since that character does not occur in the data.

Create result array based on dictionary count (variable) and result column# of 10 (fixed).
Split each Key of dictionary and put it into result array.

Put it back to range in one shot.

This should take less than 1 sec to run (I'd say 0.2~0.5 sec depending on machine spec).

See attached sample.

HOLY SCHNIKEYS BATMAN !! I have sooooo much to learn... thank you...
 
For data transformation, PowerQuery is best tool as dan_l pointed out.

However, when you don't have it available (or in rare instances where original data contain too many exceptions for PQ to effectively handle). Fastest method is to use Scripting.Dictionary and Array objects.

Basic concept. Put source data range into array to be read from and manipulated. Dictionary is used to hold unique records (i.e. concatenated string of Column D to L, and the date).

Concatenation is done using "^" as delimiter, since that character does not occur in the data.

Create result array based on dictionary count (variable) and result column# of 10 (fixed).
Split each Key of dictionary and put it into result array.

Put it back to range in one shot.

This should take less than 1 sec to run (I'd say 0.2~0.5 sec depending on machine spec).

See attached sample.

Run lightning fast... There are 8,288 entries in the 1/1 to 1/31 matrix, but the results of your script is 8,276

I'm looking at the differences now to see what I determine...
 
From quick analysis following case presents issue.

In your sheet1: Row 175 & 177 is identical. There's absolutely nothing that differentiate them other than position in data.
Row 361 & 362 is identical.

So, with identical information, dictionary isn't able to differentiate between the two and only keeps one set of data.

Are these identical rows intended? If so, it will require slight change in code logic to accommodate it.

EDIT: FYI there are more identical rows.
 
Last edited:
From quick analysis following case presents issue.

In your sheet1: Row 175 & 177 is identical. There's absolutely nothing that differentiate them other than position in data.
Row 361 & 362 is identical.

So, with identical information, dictionary isn't able to differentiate between the two and only keeps one set of data.

Are these identical rows intended? If so, it will require slight change in code logic to accommodate it.

EDIT: FYI there are more identical rows.

Good clean data upfront is the key... I need to add some checking / data cleansing with the project and the client.

Thank you again Chihiro !!
 
Back
Top