1. Welcome to Chandoo.org Forums. Short message for you

    Hi Guest,

    Thanks for joining Chandoo.org forums. We are here to make you awesome in Excel. Before you post your first question, please read this short introduction guide. When posting or responding to questions please remember our values at Chandoo.org are: Humility, Passion, Fun, Awesomeness, Simplicity, Sharing Remember that we have people here for whom English is not there first language and we need to allow for this in our dealings.

    Yours,
    Chandoo
  2. Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  3. When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

ADO connection CSV issue with mixed data type

Discussion in 'VBA Macros' started by Chihiro, Apr 20, 2017 at 8:06 PM.

  1. Chihiro

    Chihiro Well-Known Member

    Messages:
    2,997
    I've been playing with ADO as of late. And encountered an issue.

    Normally when I use ADO to bring in data from closed workbook (.xlsx, .xlsm, .xlsb etc).
    I use "IMEX=1;ImportMixedTypes=Text;" and it can bring in fields with mixed data types without loss of data.

    However, when I use same options for CSV files. ADO seem to guess the data type and will end up with data loss. I have also tried various values for "typeguessrows" for ACE registry key with no difference in result.

    Is there different option that can be set to work around this issue?
    Currently working around it by converting csv to xlsx file before doing the query. But would like to know if there's another solution.

    Code sample: (sample csv files in zip attached)
    When you run it, you will see that third row is missing data (SKU = 1000C). As well as the first row (header).
    Code (vb):
    Sub UpdateSKU()
    Dim ws As Worksheet
    Dim fPath As String: fPath = "C:\Test\" 'Change as needed
    Dim cn As Object, rst As Object

    Set cn = CreateObject("ADODB.Connection")
    With cn
        .Provider = "Microsoft.ACE.OLEDB.12.0"
        .ConnectionString = "Data Source=" & fPath & ";" & _
                            "Extended Properties=""text;HDR=NO;FMT=Delimited;Imex=1;ImportMixedTypes=Text;"""
        .CursorLocation = 3
        .Open
    End With

    strQuery = "SELECT t2.[F1], t1.[F2], t1.[F3], t1.[F4], t1.[F5] From [Test2.csv] as t1 " & _
                "RIGHT OUTER JOIN [Test1.csv] as t2 " & _
                " ON t2.[F1] = t1.[F1];"

    Set rst = CreateObject("ADODB.Recordset")
    rst.Open strQuery, cn, 1, 3

    Range("A1").CopyFromRecordset rst
    rst.Close
    cn.Close

    End Sub

    Attached Files:

  2. SirJB7

    SirJB7 Excel Rōnin

    Messages:
    8,379
    Hi, Chihiro!

    If you look inside the .csv files with standard Windows Notepad you'll see a "normal" text file with fields separated by commas. But it you open them with Notepad++ you'll notice that they look a bit different.

    I guess that you have header columns with Alt-Enter to properly format titles, but this in fact generates a problem when saving as .csv.

    I'm copying as it's seen from both text editors:

    Standard Windows Notepad:
    -----
    SKU,Product #,"Weight (lbs.)02/09/2017","Unit of Measure02/09/2017","WHS Location02/09/2017"
    1000,20329,0.48,EA,II25A2
    1000C,20330,4.8,BX,II24A2
    1001,20149,0.48,EA,JJ20A2
    -----

    Notepad++:
    -----
    SKU,Product #,"Weight (lbs.)
    02/09/2017","Unit of Measure
    02/09/2017","WHS Location
    02/09/2017"
    1000,20329,0.48,EA,II25A2
    1000C,20330,4.8,BX,II24A2
    1001,20149,0.48,EA,JJ20A2
    -----

    Actually if you manually copy & paste from regular Notepad the output will reflect as that of Notepad++.

    Maybe the line breaks before the 3 dates in titles are the responsible of the wrong ADO behavior?

    Give a look at it replacing Alt-Enter by proper spacing.

    Regards!
    Chihiro likes this.
  3. Chihiro

    Chihiro Well-Known Member

    Messages:
    2,997
    Thanks for looking into this. I didn't notice the Line Change code before.

    Unfortunately, removing them made no change to the outcome.

    Will continue to look and update if I find any solution.

    Regards,
  4. shrivallabha

    shrivallabha Excel Ninja

    Messages:
    1,591
    I don't think line feed chars are an issue as the structure remains consistent. If new line is required in csv files (posted ones) then it is always surrounded by pair of double quotes (") indicating the row input is not finished.

    Edit:
    Surely it is some setting in connection string as it seems to drop conflicting data type. I tried by modifying the sql statement to see what data it is pulling from test2.csv and it dropped where data type was in conflict.
    Code (vb):
    strQuery = "SELECT * From [Test2.csv]"
    Edit2:
    Your issue seems to be similar to what was being discussed in this thread:
    https://www.experts-exchange.com/qu...hen-Number-Data-is-Present-in-Other-Rows.html
    Need to create schema file to coerce data type.
    Last edited: Apr 22, 2017 at 12:46 PM
    Marc L, Chihiro and NARAYANK991 like this.
  5. Chihiro

    Chihiro Well-Known Member

    Messages:
    2,997
    Much obliged Shrivallabha. Ini did the trick.
    In hind sight this makes perfect sense as XML based Excel files naturally contain schema in their file structure (which can be brought in using Connection.OpenSchema). But text files do not.
    Edit: Binary file (.xlsb) also seems to contain Schema info within it.

    As sample following is what I used.

    Code (vb):
    [Test1.csv]
    ColNameHeader=False
    MaxScanRows=0
    Col1=F1 Char
    Col2=F2 Char
    Col3=F3 Char
    Col4=F4 Char
    Col5=F5 Char

    [Test2.csv]
    ColNameHeader=False
    MaxScanRows=0
    Col1=F1 Char
    Col2=F2 Char
    Col3=F3 Char
    Col4=F4 Char
    Col5=F5 Char
    Last edited: Apr 24, 2017 at 1:49 PM
  6. shrivallabha

    shrivallabha Excel Ninja

    Messages:
    1,591
    True, there's no scope in flat files for handling data types. So if filenames and data structure is going to remain constant then you can use schema.ini setup else converting to Excel (as you are doing now) will be better option.
  7. Marc L

    Marc L Excel Ninja

    Messages:
    2,900

    Hi !

    Chihiro, thanks to post your final code according to schema.ini file …
  8. Chihiro

    Chihiro Well-Known Member

    Messages:
    2,997
    Code does not change at all. It will automatically recognize Schema.ini that's in same folder as text files. And will take info from there.

    If there's mismatch in Schema to actual file. It will throw error.

    Edit: Technically, you can create ini via code. But that just requires writing ini values to file using Print & FreeFile.
  9. Marc L

    Marc L Excel Ninja

    Messages:
    2,900

    I created shema.ini from post #5 and saved it to same folder as csv files
    but I got an automation error on line rst.Open strQuery, cn, 1, 3
  10. Chihiro

    Chihiro Well-Known Member

    Messages:
    2,997
    Worked without issue on my end.

    Hmm, not sure what would be the cause.
  11. Marc L

    Marc L Excel Ninja

    Messages:
    2,900

    Code works without Schema.ini with same result as your initial post …

    No matter, I'm not the OP ! :cool:
  12. shrivallabha

    shrivallabha Excel Ninja

    Messages:
    1,591
    Which makes me suspect that it could be regional settings at play (which I am guessing) as it works for me with schema.ini file.

    Maybe little irrelevant to your issue but just for the sake of the original thread schema.ini is explained in detail here.
    https://docs.microsoft.com/en-us/sql/odbc/microsoft/schema-ini-file-text-file-driver
    Marc L likes this.
  13. Marc L

    Marc L Excel Ninja

    Messages:
    2,900
    Right ‼

    Adding Format=CSVDelimited : no more error but no data …

    Adding DecimalSymbol=. : bingo ‼
    A bit weird as columns in text format but that works …

    Thanks shrivallabha !

    SKU : 1020, all other columns are blank on my side, you too ?
    And some N/A in last two columns …

    Edit : that seems normal as per csv data.
  14. shrivallabha

    shrivallabha Excel Ninja

    Messages:
    1,591
    Great!
  15. Marc L

    Marc L Excel Ninja

    Messages:
    2,900
    In fact works without Format=CSVDelimited,

    only DecimalSymbol=. is needed (on my side) …​

    Code (vb):
    Sub UpdateSKU()
     Dim cn As Object, rst As Object, strQuery$
     Set cn = CreateObject("ADODB.Connection")
    With cn
        .Provider = "Microsoft.ACE.OLEDB.12.0"
        .ConnectionString = "Data Source=" & ThisWorkbook.Path & _
                            "\;Extended Properties=""text;HDR=NO;FMT=Delimited;Imex=1;ImportMixedTypes=Text;"""
        .CursorLocation = 3
        .Open
    End With
        Me.UsedRange.Clear
        strQuery = "SELECT t2.[F1], t1.[F2], t1.[F3], t1.[F4], t1.[F5] From [Test2.csv] as t1 " & _
                   "RIGHT OUTER JOIN [Test1.csv] as t2 ON t2.[F1] = t1.[F1];"
        Set rst = CreateObject("ADODB.Recordset")
        On Error GoTo Fin
            rst.Open strQuery, cn, 1, 3
        [A1].CopyFromRecordset rst
            rst.Close
    Fin:
         cn.Close
     Set cn = Nothing:  Set rst = Nothing
        If Err.Number Then Beep
    End Sub
    Workbook saved in csv folder, code in worksheet module …
  16. Chihiro

    Chihiro Well-Known Member

    Messages:
    2,997
    Yep. Purpose of the code was to compare 2 files. If any SKU found in Test1.csv but not in Test2.csv add SKU with blank row (1020). Also to remove any SKU in Test2.csv that's not found in Test1.csv, while retaining info from Test2.csv for matching SKU. Hence, Right Outer Join is used.
    Marc L likes this.
  17. Marc L

    Marc L Excel Ninja

    Messages:
    2,900


    Yes I understood that after taking a glance to data …​
  18. shrivallabha

    shrivallabha Excel Ninja

    Messages:
    1,591
    Just curious I looked into the registry, what I think should be changed is in below entry:
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\ClickToRun\REGISTRY\MACHINE\Software\Wow6432Node\Microsoft\Office\16.0\Access Connectivity Engine\Engines\Text

    Key: ImportMixedTypes
    Its Value is: Majority Type

    Which is set to "Text" if you check the same entry for Excel. However, schema.ini route is the one that I will prefer any day over Registry Edits.
    Chihiro likes this.
  19. Chihiro

    Chihiro Well-Known Member

    Messages:
    2,997
    Same here. In addition to above. I found that you can set MaxScanRows=1 to work around it (especially when you set HDR=No, but actually have header).

Share This Page