• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

ADO connection CSV issue with mixed data type

Chihiro

Excel Ninja
I've been playing with ADO as of late. And encountered an issue.

Normally when I use ADO to bring in data from closed workbook (.xlsx, .xlsm, .xlsb etc).
I use "IMEX=1;ImportMixedTypes=Text;" and it can bring in fields with mixed data types without loss of data.

However, when I use same options for CSV files. ADO seem to guess the data type and will end up with data loss. I have also tried various values for "typeguessrows" for ACE registry key with no difference in result.

Is there different option that can be set to work around this issue?
Currently working around it by converting csv to xlsx file before doing the query. But would like to know if there's another solution.

Code sample: (sample csv files in zip attached)
When you run it, you will see that third row is missing data (SKU = 1000C). As well as the first row (header).
Code:
Sub UpdateSKU()
Dim ws As Worksheet
Dim fPath As String: fPath = "C:\Test\" 'Change as needed
Dim cn As Object, rst As Object

Set cn = CreateObject("ADODB.Connection")
With cn
    .Provider = "Microsoft.ACE.OLEDB.12.0"
    .ConnectionString = "Data Source=" & fPath & ";" & _
                        "Extended Properties=""text;HDR=NO;FMT=Delimited;Imex=1;ImportMixedTypes=Text;"""
    .CursorLocation = 3
    .Open
End With

strQuery = "SELECT t2.[F1], t1.[F2], t1.[F3], t1.[F4], t1.[F5] From [Test2.csv] as t1 " & _
            "RIGHT OUTER JOIN [Test1.csv] as t2 " & _
            " ON t2.[F1] = t1.[F1];"

Set rst = CreateObject("ADODB.Recordset")
rst.Open strQuery, cn, 1, 3

Range("A1").CopyFromRecordset rst
rst.Close
cn.Close

End Sub
 

Attachments

  • Test.zip
    1.4 KB · Views: 17
Hi, Chihiro!

If you look inside the .csv files with standard Windows Notepad you'll see a "normal" text file with fields separated by commas. But it you open them with Notepad++ you'll notice that they look a bit different.

I guess that you have header columns with Alt-Enter to properly format titles, but this in fact generates a problem when saving as .csv.

I'm copying as it's seen from both text editors:

Standard Windows Notepad:
-----
SKU,Product #,"Weight (lbs.)02/09/2017","Unit of Measure02/09/2017","WHS Location02/09/2017"
1000,20329,0.48,EA,II25A2
1000C,20330,4.8,BX,II24A2
1001,20149,0.48,EA,JJ20A2
-----

Notepad++:
-----
SKU,Product #,"Weight (lbs.)
02/09/2017","Unit of Measure
02/09/2017","WHS Location
02/09/2017"
1000,20329,0.48,EA,II25A2
1000C,20330,4.8,BX,II24A2
1001,20149,0.48,EA,JJ20A2
-----

Actually if you manually copy & paste from regular Notepad the output will reflect as that of Notepad++.

Maybe the line breaks before the 3 dates in titles are the responsible of the wrong ADO behavior?

Give a look at it replacing Alt-Enter by proper spacing.

Regards!
 
Thanks for looking into this. I didn't notice the Line Change code before.

Unfortunately, removing them made no change to the outcome.

Will continue to look and update if I find any solution.

Regards,
 
I don't think line feed chars are an issue as the structure remains consistent. If new line is required in csv files (posted ones) then it is always surrounded by pair of double quotes (") indicating the row input is not finished.

Edit:
Surely it is some setting in connection string as it seems to drop conflicting data type. I tried by modifying the sql statement to see what data it is pulling from test2.csv and it dropped where data type was in conflict.
Code:
strQuery = "SELECT * From [Test2.csv]"

Edit2:
Your issue seems to be similar to what was being discussed in this thread:
https://www.experts-exchange.com/qu...hen-Number-Data-is-Present-in-Other-Rows.html
Need to create schema file to coerce data type.
 
Last edited:
Much obliged Shrivallabha. Ini did the trick.
In hind sight this makes perfect sense as XML based Excel files naturally contain schema in their file structure (which can be brought in using Connection.OpenSchema). But text files do not.
Edit: Binary file (.xlsb) also seems to contain Schema info within it.

As sample following is what I used.

Code:
[Test1.csv]
ColNameHeader=False
MaxScanRows=0
Col1=F1 Char
Col2=F2 Char
Col3=F3 Char
Col4=F4 Char
Col5=F5 Char

[Test2.csv]
ColNameHeader=False
MaxScanRows=0
Col1=F1 Char
Col2=F2 Char
Col3=F3 Char
Col4=F4 Char
Col5=F5 Char
 
Last edited:
Much obliged Shrivallabha. Ini did the trick.
In hind sight this makes perfect sense as XML based Excel files naturally contain schema in their file structure (which can be brought in using Connection.OpenSchema). But text files do not.
Edit: Binary file (.xlsb) also seems to contain Schema info within it.

As sample following is what I used.

Code:
[Test1.csv]
ColNameHeader=False
MaxScanRows=0
Col1=F1 Char
Col2=F2 Char
Col3=F3 Char
Col4=F4 Char
Col5=F5 Char

[Test2.csv]
ColNameHeader=False
MaxScanRows=0
Col1=F1 Char
Col2=F2 Char
Col3=F3 Char
Col4=F4 Char
Col5=F5 Char
True, there's no scope in flat files for handling data types. So if filenames and data structure is going to remain constant then you can use schema.ini setup else converting to Excel (as you are doing now) will be better option.
 
Code does not change at all. It will automatically recognize Schema.ini that's in same folder as text files. And will take info from there.

If there's mismatch in Schema to actual file. It will throw error.

Edit: Technically, you can create ini via code. But that just requires writing ini values to file using Print & FreeFile.
 

I created shema.ini from post #5 and saved it to same folder as csv files
but I got an automation error on line rst.Open strQuery, cn, 1, 3
 

Code works without Schema.ini with same result as your initial post …

No matter, I'm not the OP ! :cool:
 
Right ‼

Adding Format=CSVDelimited : no more error but no data …

Adding DecimalSymbol=. : bingo ‼
A bit weird as columns in text format but that works …

Thanks shrivallabha !

SKU : 1020, all other columns are blank on my side, you too ?
And some N/A in last two columns …

Edit : that seems normal as per csv data.
 
Right ‼

Adding Format=CSVDelimited : no more error but no data …

Adding DecimalSymbol=. : bingo ‼
A bit weird as columns in text format but that works …

Thanks shrivallabha !

SKU : 1020, all other columns are blank on my side, you too ?
And some N/A in last two columns …

Edit : that seems normal as per csv data.
Great!
 
In fact works without Format=CSVDelimited,

only DecimalSymbol=. is needed (on my side) …​

Code:
Sub UpdateSKU()
 Dim cn As Object, rst As Object, strQuery$
 Set cn = CreateObject("ADODB.Connection")
With cn
    .Provider = "Microsoft.ACE.OLEDB.12.0"
    .ConnectionString = "Data Source=" & ThisWorkbook.Path & _
                        "\;Extended Properties=""text;HDR=NO;FMT=Delimited;Imex=1;ImportMixedTypes=Text;"""
    .CursorLocation = 3
    .Open
End With
    Me.UsedRange.Clear
    strQuery = "SELECT t2.[F1], t1.[F2], t1.[F3], t1.[F4], t1.[F5] From [Test2.csv] as t1 " & _
               "RIGHT OUTER JOIN [Test1.csv] as t2 ON t2.[F1] = t1.[F1];"
    Set rst = CreateObject("ADODB.Recordset")
    On Error GoTo Fin
        rst.Open strQuery, cn, 1, 3
    [A1].CopyFromRecordset rst
        rst.Close
Fin:
     cn.Close
 Set cn = Nothing:  Set rst = Nothing
    If Err.Number Then Beep
End Sub
Workbook saved in csv folder, code in worksheet module …
 
Edit : that seems normal as per csv data.

Yep. Purpose of the code was to compare 2 files. If any SKU found in Test1.csv but not in Test2.csv add SKU with blank row (1020). Also to remove any SKU in Test2.csv that's not found in Test1.csv, while retaining info from Test2.csv for matching SKU. Hence, Right Outer Join is used.
 
However, when I use same options for CSV files. ADO seem to guess the data type and will end up with data loss. I have also tried various values for "typeguessrows" for ACE registry key with no difference in result.
Just curious I looked into the registry, what I think should be changed is in below entry:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\ClickToRun\REGISTRY\MACHINE\Software\Wow6432Node\Microsoft\Office\16.0\Access Connectivity Engine\Engines\Text

Key: ImportMixedTypes
Its Value is: Majority Type

Which is set to "Text" if you check the same entry for Excel. However, schema.ini route is the one that I will prefer any day over Registry Edits.
 
However, schema.ini route is the one that I will prefer any day over Registry Edits.

Same here. In addition to above. I found that you can set MaxScanRows=1 to work around it (especially when you set HDR=No, but actually have header).
 
Back
Top