ADO connection CSV issue with mixed data type

Chihiro · Apr 20, 2017

I've been playing with ADO as of late. And encountered an issue.

Normally when I use ADO to bring in data from closed workbook (.xlsx, .xlsm, .xlsb etc).
I use "IMEX=1;ImportMixedTypes=Text;" and it can bring in fields with mixed data types without loss of data.

However, when I use same options for CSV files. ADO seem to guess the data type and will end up with data loss. I have also tried various values for "typeguessrows" for ACE registry key with no difference in result.

Is there different option that can be set to work around this issue?
Currently working around it by converting csv to xlsx file before doing the query. But would like to know if there's another solution.

Code sample: (sample csv files in zip attached)
When you run it, you will see that third row is missing data (SKU = 1000C). As well as the first row (header).

Code:

Sub UpdateSKU()
Dim ws As Worksheet
Dim fPath As String: fPath = "C:\Test\" 'Change as needed
Dim cn As Object, rst As Object

Set cn = CreateObject("ADODB.Connection")
With cn
    .Provider = "Microsoft.ACE.OLEDB.12.0"
    .ConnectionString = "Data Source=" & fPath & ";" & _
                        "Extended Properties=""text;HDR=NO;FMT=Delimited;Imex=1;ImportMixedTypes=Text;"""
    .CursorLocation = 3
    .Open
End With

strQuery = "SELECT t2.[F1], t1.[F2], t1.[F3], t1.[F4], t1.[F5] From [Test2.csv] as t1 " & _
            "RIGHT OUTER JOIN [Test1.csv] as t2 " & _
            " ON t2.[F1] = t1.[F1];"

Set rst = CreateObject("ADODB.Recordset")
rst.Open strQuery, cn, 1, 3

Range("A1").CopyFromRecordset rst
rst.Close
cn.Close

End Sub

SirJB7 · Apr 20, 2017

Hi, Chihiro!

If you look inside the .csv files with standard Windows Notepad you'll see a "normal" text file with fields separated by commas. But it you open them with Notepad++ you'll notice that they look a bit different.

I guess that you have header columns with Alt-Enter to properly format titles, but this in fact generates a problem when saving as .csv.

I'm copying as it's seen from both text editors:

Standard Windows Notepad:
-----
SKU,Product #,"Weight (lbs.)02/09/2017","Unit of Measure02/09/2017","WHS Location02/09/2017"
1000,20329,0.48,EA,II25A2
1000C,20330,4.8,BX,II24A2
1001,20149,0.48,EA,JJ20A2
-----

Notepad++:
-----
SKU,Product #,"Weight (lbs.)
02/09/2017","Unit of Measure
02/09/2017","WHS Location
02/09/2017"
1000,20329,0.48,EA,II25A2
1000C,20330,4.8,BX,II24A2
1001,20149,0.48,EA,JJ20A2
-----

Actually if you manually copy & paste from regular Notepad the output will reflect as that of Notepad++.

Maybe the line breaks before the 3 dates in titles are the responsible of the wrong ADO behavior?

Give a look at it replacing Alt-Enter by proper spacing.

Regards!

Chihiro · Apr 21, 2017

Thanks for looking into this. I didn't notice the Line Change code before.

Unfortunately, removing them made no change to the outcome.

Will continue to look and update if I find any solution.

Regards,

shrivallabha · Apr 22, 2017

I don't think line feed chars are an issue as the structure remains consistent. If new line is required in csv files (posted ones) then it is always surrounded by pair of double quotes (") indicating the row input is not finished.

Edit:
Surely it is some setting in connection string as it seems to drop conflicting data type. I tried by modifying the sql statement to see what data it is pulling from test2.csv and it dropped where data type was in conflict.

Code:

strQuery = "SELECT * From [Test2.csv]"

Edit2:
Your issue seems to be similar to what was being discussed in this thread:
https://www.experts-exchange.com/qu...hen-Number-Data-is-Present-in-Other-Rows.html
Need to create schema file to coerce data type.

Chihiro · Apr 24, 2017

Much obliged Shrivallabha. Ini did the trick.
In hind sight this makes perfect sense as XML based Excel files naturally contain schema in their file structure (which can be brought in using Connection.OpenSchema). But text files do not.
Edit: Binary file (.xlsb) also seems to contain Schema info within it.

As sample following is what I used.

Code:

[Test1.csv]
ColNameHeader=False
MaxScanRows=0
Col1=F1 Char
Col2=F2 Char
Col3=F3 Char
Col4=F4 Char
Col5=F5 Char

[Test2.csv]
ColNameHeader=False
MaxScanRows=0
Col1=F1 Char
Col2=F2 Char
Col3=F3 Char
Col4=F4 Char
Col5=F5 Char

shrivallabha · Apr 24, 2017

Chihiro said:
Much obliged Shrivallabha. Ini did the trick.
In hind sight this makes perfect sense as XML based Excel files naturally contain schema in their file structure (which can be brought in using Connection.OpenSchema). But text files do not.
Edit: Binary file (.xlsb) also seems to contain Schema info within it.

As sample following is what I used.

Code:

[Test1.csv] ColNameHeader=False MaxScanRows=0 Col1=F1 Char Col2=F2 Char Col3=F3 Char Col4=F4 Char Col5=F5 Char [Test2.csv] ColNameHeader=False MaxScanRows=0 Col1=F1 Char Col2=F2 Char Col3=F3 Char Col4=F4 Char Col5=F5 Char

True, there's no scope in flat files for handling data types. So if filenames and data structure is going to remain constant then you can use schema.ini setup else converting to Excel (as you are doing now) will be better option.

Marc L · Apr 24, 2017

Hi !

Chihiro, thanks to post your final code according to schema.ini file …

Chihiro · Apr 24, 2017

Code does not change at all. It will automatically recognize Schema.ini that's in same folder as text files. And will take info from there.

If there's mismatch in Schema to actual file. It will throw error.

Edit: Technically, you can create ini via code. But that just requires writing ini values to file using Print & FreeFile.

Marc L · Apr 24, 2017

I created shema.ini from post #5 and saved it to same folder as csv files
but I got an automation error on line rst.Open strQuery, cn, 1, 3 …

Chihiro · Apr 24, 2017

Worked without issue on my end.

Hmm, not sure what would be the cause.

Marc L · Apr 24, 2017

Code works without Schema.ini with same result as your initial post …

No matter, I'm not the OP !

shrivallabha · Apr 25, 2017

Marc L said:
Code works without Schema.ini with same result as your initial post …

No matter, I'm not the OP !

Which makes me suspect that it could be regional settings at play (which I am guessing) as it works for me with schema.ini file.

Maybe little irrelevant to your issue but just for the sake of the original thread schema.ini is explained in detail here.
https://docs.microsoft.com/en-us/sql/odbc/microsoft/schema-ini-file-text-file-driver

Marc L · Apr 25, 2017

Right ‼

Adding Format=CSVDelimited : no more error but no data …

Adding DecimalSymbol=. : bingo ‼
A bit weird as columns in text format but that works …

Thanks shrivallabha !

SKU : 1020, all other columns are blank on my side, you too ?
And some N/A in last two columns …

Edit : that seems normal as per csv data.

shrivallabha · Apr 25, 2017

Marc L said:
Right ‼

Adding Format=CSVDelimited : no more error but no data …

Adding DecimalSymbol=. : bingo ‼
A bit weird as columns in text format but that works …

Thanks shrivallabha !

SKU : 1020, all other columns are blank on my side, you too ?
And some N/A in last two columns …

Edit : that seems normal as per csv data.

Great!

Marc L · Apr 25, 2017

In fact works without Format=CSVDelimited,

only DecimalSymbol=. is needed (on my side) …

Code:

Sub UpdateSKU()
 Dim cn As Object, rst As Object, strQuery$
 Set cn = CreateObject("ADODB.Connection")
With cn
    .Provider = "Microsoft.ACE.OLEDB.12.0"
    .ConnectionString = "Data Source=" & ThisWorkbook.Path & _
                        "\;Extended Properties=""text;HDR=NO;FMT=Delimited;Imex=1;ImportMixedTypes=Text;"""
    .CursorLocation = 3
    .Open
End With
    Me.UsedRange.Clear
    strQuery = "SELECT t2.[F1], t1.[F2], t1.[F3], t1.[F4], t1.[F5] From [Test2.csv] as t1 " & _
               "RIGHT OUTER JOIN [Test1.csv] as t2 ON t2.[F1] = t1.[F1];"
    Set rst = CreateObject("ADODB.Recordset")
    On Error GoTo Fin
        rst.Open strQuery, cn, 1, 3
    [A1].CopyFromRecordset rst
        rst.Close
Fin:
     cn.Close
 Set cn = Nothing:  Set rst = Nothing
    If Err.Number Then Beep
End Sub

Workbook saved in csv folder, code in worksheet module …

Chihiro · Apr 25, 2017

Edit : that seems normal as per csv data.

Yep. Purpose of the code was to compare 2 files. If any SKU found in Test1.csv but not in Test2.csv add SKU with blank row (1020). Also to remove any SKU in Test2.csv that's not found in Test1.csv, while retaining info from Test2.csv for matching SKU. Hence, Right Outer Join is used.

Marc L · Apr 25, 2017

Yes I understood that after taking a glance to data …

shrivallabha · Apr 25, 2017

Chihiro said:
However, when I use same options for CSV files. ADO seem to guess the data type and will end up with data loss. I have also tried various values for "typeguessrows" for ACE registry key with no difference in result.

Just curious I looked into the registry, what I think should be changed is in below entry:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\ClickToRun\REGISTRY\MACHINE\Software\Wow6432Node\Microsoft\Office\16.0\Access Connectivity Engine\Engines\Text

Key: ImportMixedTypes
Its Value is: Majority Type

Which is set to "Text" if you check the same entry for Excel. However, schema.ini route is the one that I will prefer any day over Registry Edits.

Chihiro · Apr 25, 2017

However, schema.ini route is the one that I will prefer any day over Registry Edits.

Same here. In addition to above. I found that you can set MaxScanRows=1 to work around it (especially when you set HDR=No, but actually have header).

ADO connection CSV issue with mixed data type

Chihiro

Excel Ninja

Attachments

SirJB7

Excel Rōnin

Chihiro

Excel Ninja

shrivallabha

Excel Ninja

Chihiro

Excel Ninja

shrivallabha

Excel Ninja

Marc L

Excel Ninja

Chihiro

Excel Ninja

Marc L

Excel Ninja

Chihiro

Excel Ninja

Marc L

Excel Ninja

shrivallabha

Excel Ninja

Marc L

Excel Ninja

shrivallabha

Excel Ninja

Marc L

Excel Ninja

Chihiro

Excel Ninja

Marc L

Excel Ninja

shrivallabha

Excel Ninja

Chihiro

Excel Ninja