• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Help with Macro To Organize Gas Chromatography Instrument Data

NCase

Member
Hello,
I have multiple csv files in multiple folders which contain data from a gas chromatography instrument (see attached csv data files and a version of the data that I processed in an excel)

The raw data is organised in an terrible format where the measured analyte peaks and mass spectral library identification data are not arranged side by side. This makes manual processing of the data in excel very labor intensive
I am an including example csv files and the way I would like the data to be formatted in excel.

Is there a way to create a macro to read the csv files in the folders and organize it into a final format in attached spreadsheet (see Final Data Layout Worksheet)

I hope someone can help as I am flooded with a huge amount of useful data but its in useless format.

Best,

Francis
 

Attachments

Hui

Excel Ninja
Staff member
Francis

Have you spoken to the Gas Chromatography operators and asked is there a more user friendly output format ?

My experience with these says there is
 

NCase

Member
Hui,

It is an old instrument running on windows XP so excel macros are the only way other instrument users recommend and even the instrument company says excel is only option.

Best,

Francis
 

Hui

Excel Ninja
Staff member
Its not an impossible task, but I just don't have time to assist you until middle January '19.

Anybody else want a VBA challenge ?
 

NCase

Member
Thank you Hui. Yes to anybody else that can help with challenge. Willing to provide a reasonable paypal donation to anyone that achieve the goal.
 

Kenneth Hobson

Active Member
It is easier to help if zip is used to compress files as most have it.

Do you need the 2nd sheet's output or just the last sheet? I can look at it tonight. Looks simple enough...
 

NCase

Member
Hi,

The last sheet as an output. I created a template of what I was trying to do manually.

Here is the file in zip format for others interested in solving the problem.
 

Attachments

Deepak

Excel Ninja
Hi,

Pls find...
Feeling sleepy so I will do rest later like as data compile from folders and colors...
Do let me know for if any changes required.

Code:
Option Explicit

Sub DataProcessingV1()
Dim r As Range, iFristBlock As Integer, oFind As Range, oRnageToHide As Range, oRangeBold As Range
Dim iPasteRow As Integer, iCopyRow1 As Range, iCopyRow2 As Range, iRows As Range

Application.ScreenUpdating = False

With ActiveSheet
iFristBlock = .UsedRange.Columns.Count - 1
Set iRows = .UsedRange.SpecialCells(xlCellTypeLastCell) '.Offset(, -iFristBlock + 1)

For Each r In .Columns(1).Cells
If r.Value = "Name=" Then
    iPasteRow = r.Row + 1
        Set oFind = .Columns(1).Cells.Find("[PBM Apex]", r, , , , xlByRows)
            If Not oFind Is Nothing Then Set iCopyRow1 = oFind.Offset(-3) Else Exit Sub
        Set oFind = Nothing
       
        Set oFind = .Columns(1).Cells.Find("[contents]", iCopyRow1, , , , xlByRows)
       
            If Not oFind Is Nothing Then
                If oFind.Address = "$A$1" Then
                    Set iCopyRow2 = iRows
                Else
                    Set iCopyRow2 = oFind.Offset(-2).Offset(, iFristBlock + 2)
                End If
            Else
                Set iCopyRow2 = oFind
            End If
        Set oFind = Nothing
        Set oFind = .Range(iCopyRow1, iCopyRow2).Resize(, iFristBlock + 1)

                oFind.Copy .Cells(iPasteRow, iFristBlock + 2)
                oFind.EntireRow.Delete
       
               
               
                If Not oRangeBold Is Nothing Then
                      Set oRangeBold = Union(oRangeBold, .Rows(iPasteRow + 5), .Rows(iPasteRow + 3))
                Else
                    Set oRangeBold = Union(.Rows(iPasteRow + 5), .Rows(iPasteRow + 3))
                End If
               
               
                If Not oRnageToHide Is Nothing Then
                      Set oRnageToHide = Union(oRnageToHide, .Rows(iPasteRow + 4), .Rows(iPasteRow - 3).Resize(6))
                Else
                    Set oRnageToHide = Union(.Rows(iPasteRow + 4), .Rows(iPasteRow - 3).Resize(6))
                End If
Set iCopyRow1 = Nothing
Set iCopyRow2 = Nothing
       
End If
Next


oRnageToHide.EntireRow.Hidden = True
oRangeBold.Font.Bold = True
.Range("D:H,L:O,Q:Q,T:V").EntireColumn.Hidden = True
.Columns("A:B").ColumnWidth = 24
.Columns("C").ColumnWidth = 6
.Columns("I:K").ColumnWidth = 9
.Columns("P").ColumnWidth = 70
.Columns("R:S").ColumnWidth = 12


End With
Application.ScreenUpdating = True
End Sub
 

NCase

Member
Deepak,

This is great. The trick is now to read from each individual files in there own folders. Some data files may not have library data or no data is acquired but the file includes just empty header files. So the macro can get confused.

I am including a larger example of a typical dataset.

Thank you for your great help so far.

Best,

Francis

Hi,

Pls find...
Feeling sleepy so I will do rest later like as data compile from folders and colors...
Do let me know for if any changes required.

Code:
Option Explicit

Sub DataProcessingV1()
Dim r As Range, iFristBlock As Integer, oFind As Range, oRnageToHide As Range, oRangeBold As Range
Dim iPasteRow As Integer, iCopyRow1 As Range, iCopyRow2 As Range, iRows As Range

Application.ScreenUpdating = False

With ActiveSheet
iFristBlock = .UsedRange.Columns.Count - 1
Set iRows = .UsedRange.SpecialCells(xlCellTypeLastCell) '.Offset(, -iFristBlock + 1)

For Each r In .Columns(1).Cells
If r.Value = "Name=" Then
    iPasteRow = r.Row + 1
        Set oFind = .Columns(1).Cells.Find("[PBM Apex]", r, , , , xlByRows)
            If Not oFind Is Nothing Then Set iCopyRow1 = oFind.Offset(-3) Else Exit Sub
        Set oFind = Nothing
      
        Set oFind = .Columns(1).Cells.Find("[contents]", iCopyRow1, , , , xlByRows)
      
            If Not oFind Is Nothing Then
                If oFind.Address = "$A$1" Then
                    Set iCopyRow2 = iRows
                Else
                    Set iCopyRow2 = oFind.Offset(-2).Offset(, iFristBlock + 2)
                End If
            Else
                Set iCopyRow2 = oFind
            End If
        Set oFind = Nothing
        Set oFind = .Range(iCopyRow1, iCopyRow2).Resize(, iFristBlock + 1)

                oFind.Copy .Cells(iPasteRow, iFristBlock + 2)
                oFind.EntireRow.Delete
      
              
              
                If Not oRangeBold Is Nothing Then
                      Set oRangeBold = Union(oRangeBold, .Rows(iPasteRow + 5), .Rows(iPasteRow + 3))
                Else
                    Set oRangeBold = Union(.Rows(iPasteRow + 5), .Rows(iPasteRow + 3))
                End If
              
              
                If Not oRnageToHide Is Nothing Then
                      Set oRnageToHide = Union(oRnageToHide, .Rows(iPasteRow + 4), .Rows(iPasteRow - 3).Resize(6))
                Else
                    Set oRnageToHide = Union(.Rows(iPasteRow + 4), .Rows(iPasteRow - 3).Resize(6))
                End If
Set iCopyRow1 = Nothing
Set iCopyRow2 = Nothing
      
End If
Next


oRnageToHide.EntireRow.Hidden = True
oRangeBold.Font.Bold = True
.Range("D:H,L:O,Q:Q,T:V").EntireColumn.Hidden = True
.Columns("A:B").ColumnWidth = 24
.Columns("C").ColumnWidth = 6
.Columns("I:K").ColumnWidth = 9
.Columns("P").ColumnWidth = 70
.Columns("R:S").ColumnWidth = 12


End With
Application.ScreenUpdating = True
End Sub
 

Attachments

NCase

Member
ps colors are not essential it was done to highlight areas and how I wanted to rearrange the data.

Best,

Francis

Deepak,

This is great. The trick is now to read from each individual files in there own folders. Some data files may not have library data or no data is acquired but the file includes just empty header files. So the macro can get confused.

I am including a larger example of a typical dataset.

Thank you for your great help so far.

Best,

Francis
 

Kenneth Hobson

Active Member
I am not sure what sheet Deepak was formatting.

There are 2 approaches to get the data. One might use all arrays. Another might be to do a data query and import the files and then format and parse that.

This is a small beginning for the first method. Parts could be used for the 2nd method.

Code:
Sub Main()
  Dim p$, a, v, fso As Object, b, s$
  'Change path p to suit...
  p = "C:\Users\lenovo1\Dropbox\Excel\CSV\ParseFilesToMaster\Results.csv"
  a = Filter(Split(CreateObject("wscript.shell").exec _
    ("cmd /c Dir " & """" & p & """" & " /b /a-d /s").stdout.ReadAll, vbCrLf), ".")
  Set fso = CreateObject("Scripting.FileSystemObject")

  For Each v In a
    b = Split(fso.OpenTextFile(fn, 1, 0).ReadAll, vbLf)
  
    Exit For 'used for testing to just work on 1 file for testing.
  Next v

End Sub
 

NCase

Member
Hi Kenneth,

The raw data is in the first sheet. Sheet 2 and 3 were my step by attempts to show how I wanted to transform the data into a useful format. The color codes were to make it easy to read.

Best,

Francis
 

Chihiro

Excel Ninja
A question, did you want to completely remove those hidden rows and columns? Or keep and and just hide them?

My personal preference is to use FreeFile() and use RegEx to test pattern and add specific rows to dictionary object for later processing.

Sample code for bringing in only rows you are interested in.
Code:
Sub Demo()
Dim iFile As String: iFile = "C:\Test\Downloads\Dataprocessing\2018-11-28_0936\RESULTS.CSV"
Dim intFF As Integer: intFF = FreeFile()
Open iFile For Input As #intFF
Set dic = CreateObject("Scripting.Dictionary")
Do Until EOF(1)
    Line Input #intFF, ReadData
    With CreateObject("VBScript.RegExp")
        .Global = True
        .Pattern = "(\[INT TIC: \d{4}-\d{2}-\d{2}_\d{4}.+?\])$|(Header=,.+)|(\d.?\=\,\s.?\d.+$)"
        If .Test(ReadData) Then
            dic(ReadData) = Split(Replace(ReadData, """", ""), ",")
        End If
    End With
Loop

Close #intFF
For Each Key In dic.keys
    Debug.Print Key
Next

End Sub
Returns.
Code:
[INT TIC: 2018-11-28_0936 BLANK SYSTEM.D\data.ms]
Header=,"Peak","R.T.","First","Max","Last","PK  TY","Height","Area","Pct Max","Pct Total"
1=,  1,  2.058,  81, 163, 188,"BV 2", 176958, 74614385,100.00, 76.695
2=,  2,  2.382,  188, 189, 197,"VV 2", 167496,  6909203,  9.26,  7.102
3=,  3,  2.575,  197, 205, 245,"VB 3", 105000, 13971177, 18.72, 14.361
4=,  4,  3.709,  290, 297, 307,"BV  ",  61950,  1791997,  2.40,  1.842
Header=,"PK","RT","Area Pct","Library/ID","Ref","CAS","Qual"
1=,  1,  2.0629,76.6953,"Carbon dioxide",    81,"000124-38-9", 4
2=,  2,  2.3780, 7.1019,"Carbon dioxide",    81,"000124-38-9", 4
3=,  3,  2.5719,14.3608,"Benzene",  1018,"000071-43-2",95
4=,  4,  3.7140, 1.8420,"Cyclotrisiloxane, hexamethyl-", 79619,"000541-05-9",90
 
Last edited:

NCase

Member
Hello,

It would be nice to bring together raw data from multiple results.csv files into sheet 1 for the record. Then process that data into the desired format into sheet 2.

Hiding or getting of rows works for me.

Best,

Francis
 

Chihiro

Excel Ninja
Try below code. I haven't fully tested, so check your result thoroughly.
Code "CheckAllSubFold" should be run with blank sheet active. ImportData is called in that sub.
Code:
Sub ImportData(iFile As String)
Dim dic As Object, ar
Dim y As Long, x As Long
Dim intFF As Integer: intFF = FreeFile()
Open iFile For Input As #intFF
y = 1
Set dic = CreateObject("Scripting.Dictionary")
Do Until EOF(1)
    Line Input #intFF, ReadData
    With CreateObject("VBScript.RegExp")
        .Global = True
        .Pattern = "(\[INT TIC: \d{4}-\d{2}-\d{2}_\d{4}.+?data\.ms\])$|(Header=,.+)|(\d.?\=\,\s.?\d.+$)"
        If .Test(ReadData) Then
            dic(ReadData) = 1
        End If
    End With
Loop

Close #intFF
sRow = IIf(Cells(Rows.Count, "A").End(xlUp).Row = 1, 1, Cells(Rows.Count, "A").End(xlUp).Row + 1)
x = Application.RoundUp(dic.Count / 2, 0)
For Each Key In dic.Keys
    If y <= x Then
        Range("A" & sRow + y - 1) = Key
    Else
        Range("L" & sRow + y - x) = Key
    End If
    y = y + 1
Next
Set dic = Nothing
End Sub

Sub CheckAllSubFold()
Dim path As String: path = ThisWorkbook.path & "\"
Dim fName As String: fName = "RESULTS.CSV"
Dim cPath As String
cPath = Dir(path, vbDirectory)
Do While Len(cPath) > 0
    If Left(cPath, 1) <> "." And _
        (GetAttr(path & cPath) And vbDirectory) = vbDirectory Then
        ImportData path & cPath & "\" & fName
    End If
    cPath = Dir()
Loop

Range("A:A").TextToColumns Destination:=Range("A1"), DataType:=xlDelimited, _
        TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Comma:=True
Range("L:L").TextToColumns Destination:=Range("L1"), DataType:=xlDelimited, _
        TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Comma:=True
Range("L:L").Delete
End Sub
 

NCase

Member
Hi Chihiro,

Can you include the two macros in excel file to test.I have tried doing this in excel 2016 and its not working for me so far.

Best,

Francis
 

NCase

Member
Deepak,

If your code could be adapted to reading my directory of results.csv files then you would have achieved the desired goal in a very elegant way.

Best,

Francis

Hi,

Pls find...
Feeling sleepy so I will do rest later like as data compile from folders and colors...
Do let me know for if any changes required.

Code:
Option Explicit

Sub DataProcessingV1()
Dim r As Range, iFristBlock As Integer, oFind As Range, oRnageToHide As Range, oRangeBold As Range
Dim iPasteRow As Integer, iCopyRow1 As Range, iCopyRow2 As Range, iRows As Range

Application.ScreenUpdating = False

With ActiveSheet
iFristBlock = .UsedRange.Columns.Count - 1
Set iRows = .UsedRange.SpecialCells(xlCellTypeLastCell) '.Offset(, -iFristBlock + 1)

For Each r In .Columns(1).Cells
If r.Value = "Name=" Then
    iPasteRow = r.Row + 1
        Set oFind = .Columns(1).Cells.Find("[PBM Apex]", r, , , , xlByRows)
            If Not oFind Is Nothing Then Set iCopyRow1 = oFind.Offset(-3) Else Exit Sub
        Set oFind = Nothing
      
        Set oFind = .Columns(1).Cells.Find("[contents]", iCopyRow1, , , , xlByRows)
      
            If Not oFind Is Nothing Then
                If oFind.Address = "$A$1" Then
                    Set iCopyRow2 = iRows
                Else
                    Set iCopyRow2 = oFind.Offset(-2).Offset(, iFristBlock + 2)
                End If
            Else
                Set iCopyRow2 = oFind
            End If
        Set oFind = Nothing
        Set oFind = .Range(iCopyRow1, iCopyRow2).Resize(, iFristBlock + 1)

                oFind.Copy .Cells(iPasteRow, iFristBlock + 2)
                oFind.EntireRow.Delete
      
              
              
                If Not oRangeBold Is Nothing Then
                      Set oRangeBold = Union(oRangeBold, .Rows(iPasteRow + 5), .Rows(iPasteRow + 3))
                Else
                    Set oRangeBold = Union(.Rows(iPasteRow + 5), .Rows(iPasteRow + 3))
                End If
              
              
                If Not oRnageToHide Is Nothing Then
                      Set oRnageToHide = Union(oRnageToHide, .Rows(iPasteRow + 4), .Rows(iPasteRow - 3).Resize(6))
                Else
                    Set oRnageToHide = Union(.Rows(iPasteRow + 4), .Rows(iPasteRow - 3).Resize(6))
                End If
Set iCopyRow1 = Nothing
Set iCopyRow2 = Nothing
      
End If
Next


oRnageToHide.EntireRow.Hidden = True
oRangeBold.Font.Bold = True
.Range("D:H,L:O,Q:Q,T:V").EntireColumn.Hidden = True
.Columns("A:B").ColumnWidth = 24
.Columns("C").ColumnWidth = 6
.Columns("I:K").ColumnWidth = 9
.Columns("P").ColumnWidth = 70
.Columns("R:S").ColumnWidth = 12


End With
Application.ScreenUpdating = True
End Sub
 

Chihiro

Excel Ninja
Here, extract the attached and run the code in Sample_Master.xlsm.

Note that master should be in the folder containing all the sub-folders with csv files (same location where you had compiled sample).

If you need master to be elsewhere, I'd suggest using FileDialog to pick a folder containing all the sub-folders.
 

Attachments

NCase

Member
Hi Chihiro,

I know you don't reply on weekends but here is my update on the macro so far

The macro is working at a 95% success rate but it missed 3 files in the directory for some unknown reason.

I am including the raw files, the macro with the data process and the list of files it missed. I cannot see an obvious reason why the macro missed them. It might be useful to have a list of the folder names processed in a separate sheet to make sure its captured and processed all the required files in the data directory.

Other than that issue its working perfectly. Thank you so much for your help and have a great weekend.

Best,

Francis

Here, extract the attached and run the code in Sample_Master.xlsm.

Note that master should be in the folder containing all the sub-folders with csv files (same location where you had compiled sample).

If you need master to be elsewhere, I'd suggest using FileDialog to pick a folder containing all the sub-folders.
 

Attachments

NCase

Member
Hi Chihiro,

Another final chink in my dataset. It turns out the results.csv file sometimes has a variable name. Is it possible to just process any .csv file in the folder and not worry about its exact name.

Have a great weekend.

Best,

Francis
 

Chihiro

Excel Ninja
Here. Use this code then. It will grab info from all 67 files in your sample (even if file name is different from RESULTS.CSV, as long as it is csv file).

However, if there are other patterns that could occur, you'll need to adjust.

Only changes made are. ".Pattern" line and additional logic to store folder names and using that to grab file names stored in each folders dynamically in "CheckAllSubFold" sub (using collection and additional loop over the collection).
Code:
Sub ImportData(iFile As String)
Dim dic As Object, ar
Dim y As Long, x As Long
Dim intFF As Integer: intFF = FreeFile()
Open iFile For Input As #intFF
y = 1
Set dic = CreateObject("Scripting.Dictionary")
Do Until EOF(1)
    Line Input #intFF, ReadData
    With CreateObject("VBScript.RegExp")
        .Global = True
        .Pattern = "(\[INT.+?: \d{4}-\d{2}-\d{2}_\d{4}.+?data\.ms\])$|(Header=,.+)|(\d.?\=\,\s.?\d.+$)"
        If .Test(ReadData) Then
            dic(ReadData) = 1
        End If
    End With
Loop

Close #intFF
sRow = IIf(Cells(Rows.Count, "A").End(xlUp).Row = 1, 1, Cells(Rows.Count, "A").End(xlUp).Row + 1)
x = Application.RoundUp(dic.Count / 2, 0)
For Each Key In dic.Keys
    If y <= x Then
        Range("A" & sRow + y - 1) = Key
    Else
        Range("L" & sRow + y - x) = Key
    End If
    y = y + 1
Next
Set dic = Nothing
End Sub

Sub CheckAllSubFold()
Dim path As String: path = ThisWorkbook.path & "\"
Dim fName As String: fName = "*.CSV"
Dim cPath As String, coll As New Collection
cPath = Dir(path, vbDirectory)
Do While Len(cPath) > 0
    If Left(cPath, 1) <> "." And _
        (GetAttr(path & cPath) And vbDirectory) = vbDirectory Then
        coll.Add path & cPath & "\"
    End If
    cPath = Dir()
Loop
For I = 1 To coll.Count
    fName = Dir(coll.Item(I) & "*.csv")
    ImportData coll.Item(I) & fName
    fName = Dir()
Next
Range("A:A").TextToColumns Destination:=Range("A1"), DataType:=xlDelimited, _
        TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Comma:=True
Range("L:L").TextToColumns Destination:=Range("L1"), DataType:=xlDelimited, _
        TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Comma:=True
Range("L:L").Delete
End Sub
 

NCase

Member
Hi Chihiro,

I keep getting the error "Path not found (Error 76)" when I try to run the code.

Best,

Francis
 
Top