
Import text from huge text file into sheet columns

Discussion in 'VBA Macros' started by Samadhan Gaikwad, Oct 4, 2017.

  1. Samadhan Gaikwad

    Samadhan Gaikwad Member

    Messages:
    136
    I have a text file of around 2 GB, containing around 1,05,00,000 (10.5 million) lines. I want to import all the data into an Excel sheet and split it across columns (since it won't fit in a single column). Assuming the above data, I think it will fit into around 10 columns. (I need to process that data further to remove duplicate rows.)

    So I just want to import that data into sheet columns using a macro.
    Last edited: Oct 4, 2017
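    For illustration only, here is a minimal sketch (not from this thread) of a macro that reads such a text file line by line and wraps into the next column whenever a column fills up. The file path is a placeholder, and for a 2 GB file this will be slow and memory-hungry, so treat it as a starting point rather than a finished solution.
    Code (vb):
    '\\ Hypothetical sketch: wrap a large text file across worksheet columns
    Public Sub ImportWrappedColumns()
    Const MAX_ROWS As Long = 1048576                 '\\ rows available per column in .xlsx
    Dim intFF As Integer
    Dim strInput As String
    Dim varBuf() As Variant
    Dim lngRow As Long, lngCol As Long
    intFF = FreeFile
    Open "C:\TEMP\BigFile.txt" For Input As #intFF   '\\ placeholder path - change it
    ReDim varBuf(1 To MAX_ROWS, 1 To 1)
    lngRow = 0: lngCol = 1
    Do While Not EOF(intFF)
        Line Input #intFF, strInput
        lngRow = lngRow + 1
        varBuf(lngRow, 1) = strInput
        If lngRow = MAX_ROWS Then                    '\\ column is full: dump buffer, move to next column
            Cells(1, lngCol).Resize(MAX_ROWS, 1).Value = varBuf
            ReDim varBuf(1 To MAX_ROWS, 1 To 1)
            lngRow = 0
            lngCol = lngCol + 1
        End If
    Loop
    Close #intFF
    If lngRow > 0 Then Cells(1, lngCol).Resize(lngRow, 1).Value = varBuf
    End Sub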
  2. Deepak

    Deepak Excel Ninja

    Messages:
    2,776
    Do you want to import all the rows into Excel? It is not designed to store such a large database.

    My recommendation: either use Access or other software to filter the data in smaller parts.
  3. Samadhan Gaikwad

    Samadhan Gaikwad Member

    Messages:
    136
    Yes, after some operations like replace and remove duplicates, I expect the lines will reduce to around 60 K. Only at the end will I save the spreadsheet.
  4. NARAYANK991

    NARAYANK991 Excel Ninja

    Messages:
    15,924
  5. Samadhan Gaikwad

    Samadhan Gaikwad Member

    Messages:
    136
  6. Samadhan Gaikwad

    Samadhan Gaikwad Member

    Messages:
    136
    It hung somewhere in between. I also tried to import the data manually using the Data tab -> From Text. It says the file is too large, so it imported only the data that fits in the available rows. Then, when I start the next import from row (10 lakh + 1), it imports only a single row instead of all the data that would fit into that column.
  7. shrivallabha

    shrivallabha Excel Ninja

    Messages:
    1,787
    Could you post a sample text file containing around 200 rows of data? Also identify the basis for working out duplicates. Make sure that the sample file contains a few such cases.
    Samadhan Gaikwad likes this.
  8. Marc L

    Marc L Excel Ninja

    Messages:
    3,277

    As per forum rules :

    the more sample data with a crystal clear explanation, the better solution …
    Samadhan Gaikwad likes this.
  9. Chihiro

    Chihiro Excel Ninja

    Messages:
    3,919
    Hmm, which version of Excel?

    When dealing with large data (from a DB or text), there are a few things you can try.

    1. If you have Excel 2010 or later with Power Query / Get & Transform and an x64 installation of Excel, use PQ to import the data, do all the transformation in the editor, and only load the data once you have the subset you want to keep. You can either keep the connection, or kill it to save file size (if you kill the connection, I'd suggest keeping a template copy where the data is kept as a connection only, so you can easily replicate the steps).

    2. Download the free Power BI Desktop, do the clean-up in it and copy-paste the result (too bad, they've removed export to Excel from Desktop; you need a Power BI Pro subscription for that now...).
    https://www.microsoft.com/en-us/download/details.aspx?id=45331

    3. Use the command line, PowerShell etc. to split the text file into multiple smaller files. See the links below for some standard batch file code and a PowerShell script (a VBA sketch of the same idea follows this list).
    https://stackoverflow.com/questions...smaller-multiple-text-file-using-command-line
    https://gallery.technet.microsoft.com/scriptcenter/PowerShell-Split-large-log-6f2c4da0

    You can use an open-source or commercial tool to split them as well.
    Google and you should find plenty of tools out there.
    Then import and clean each file separately, combine them, and do a final clean-up at the end.

    4. Load into a DB and do the data clean-up there, then read from the DB into Excel.
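    For what it's worth, here is a minimal VBA sketch (not from this thread) of option 3: splitting the big text file into parts of a fixed number of lines each, the same job the batch-file and PowerShell links do in other languages. The paths and the 10-lakh part size are placeholders.
    Code (vb):
    '\\ Hypothetical sketch: split a large text file into smaller part files
    Public Sub SplitTextFile()
    Const LINES_PER_PART As Long = 1000000           '\\ 10 lakh lines per part - adjust as needed
    Dim intIn As Integer, intOut As Integer
    Dim strLine As String
    Dim lngCount As Long, lngPart As Long
    intIn = FreeFile
    Open "C:\TEMP\BigFile.txt" For Input As #intIn   '\\ placeholder source path
    lngCount = LINES_PER_PART                        '\\ forces the first part file to be created
    Do While Not EOF(intIn)
        Line Input #intIn, strLine
        If lngCount = LINES_PER_PART Then            '\\ current part is full: start a new one
            If lngPart > 0 Then Close #intOut
            lngPart = lngPart + 1
            intOut = FreeFile
            Open "C:\TEMP\BigFile_part" & lngPart & ".txt" For Output As #intOut
            lngCount = 0
        End If
        Print #intOut, strLine
        lngCount = lngCount + 1
    Loop
    If lngPart > 0 Then Close #intOut
    Close #intIn
    End Sub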
  10. Samadhan Gaikwad

    Samadhan Gaikwad Member

    Messages:
    136
    I am using Excel 2013. I tried to install the Power Pivot ... add-ins but I could not see that tab.
    Yeah, I got a batch file which splits the file into multiple files of 10 lakh lines each. Then I am using replace all and remove duplicates for each column using VBA. Thank you, I will try the rest of the options.
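    A rough sketch (not the poster's actual macro) of that kind of per-column clean-up, assuming the data sits in column A and using placeholder search text:
    Code (vb):
    Public Sub CleanOneColumn()
    '\\ Hypothetical example: Replace All followed by Remove Duplicates on one column
    With ActiveSheet.Columns("A")
        .Replace What:="OldText", Replacement:="", LookAt:=xlPart, MatchCase:=False
        .RemoveDuplicates Columns:=1, Header:=xlNo
    End With
    End Sub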
  11. Samadhan Gaikwad

    Samadhan Gaikwad Member

    Messages:
    136
    I want to remove all the text before the string test: (including the string itself). I also want to remove the text after the space.

    Some content from file:

    file1:test:src/draw/no/ind/erefe.c/gegegewg 1
    filesdsd:test:src/draw/no/ind/reryeryrey.c/jhrtjtrj 1
    fileskdk:sdsD:test:abc/disp/no/ind/ryretyrey.txt/Display::Restrict 20

    Desired output:

    src/draw/no/ind/erefe.c/gegegewg
    src/draw/no/ind/reryeryrey.c/jhrtjtrj
    abc/disp/no/ind/ryretyrey.txt/Display::Rest

    The text between the above 2 delimiters can be duplicated in the same file. For 10 lakh lines, about 30 K are unique, and if there are 140 lakh lines, the unique count is only around 60 K.
  12. shrivallabha

    shrivallabha Excel Ninja

    Messages:
    1,787
    Test this code, which reads the data into memory and pastes it at the end. Test it first with data where you know only around 30 K unique entries exist.
    Code (vb):
    Public Sub ReadTextFileData()
    Dim intFF As Integer
    Dim strInput As String
    Dim objDict As Object
    intFF = FreeFile
    '\\ Change text file path here
    Open "C:\TEMP\New Text Document.txt" For Input As #intFF
    Set objDict = CreateObject("Scripting.Dictionary")
    objDict.CompareMode = vbTextCompare
    Do While Not EOF(intFF)
        Line Input #intFF, strInput
        strInput = Trim(Split(Mid(strInput, InStrRev(strInput, "test:", , vbTextCompare) + 5, Len(strInput)), " ")(0))
        If Not objDict.exists(strInput) Then
            objDict.Add strInput, strInput
        End If
    Loop
    Close #intFF
    '\\ The last line will error if the Keys count is greater than 65,536 (the Application.Transpose limit)
    Range("A1").Resize(objDict.Count, 1) = Application.Transpose(objDict.Keys)
    End Sub
    Last edited: Oct 4, 2017
  13. Samadhan Gaikwad

    Samadhan Gaikwad Member

    Messages:
    136
    Will you please elaborate on the last comment, the one about the keys count?
    Also, is there any limit on the number of lines in the input file for this VBA?
  14. shrivallabha

    shrivallabha Excel Ninja

    Messages:
    1,787
    Yes, the limit is not on the input lines but on the "Application.Transpose" part. See the discussion below:
    https://stackoverflow.com/questions/39736261/excel-vba-array-transpose-size-limit

    But if, in principle, the code suits you, then the last part can be tweaked to handle larger data. So verify that it works as you need and then we can look ahead.
    Samadhan Gaikwad likes this.
  15. Samadhan Gaikwad

    Samadhan Gaikwad Member

    Messages:
    136
    The VBA is working fine, and fast too; it saved me a lot of time.
    But it's printing the output like this:
    Code (vb):
    1:src/draw/no/ind/erefe.c/gegegewg
    1:src/draw/no/ind/reryeryrey.c/jhrtjtrj
    1:abc/disp/no/ind/ryretyrey.txt/Display::Rest
    I can remove it using replace all though. Thanks a lot again!
    And I hope I will have a unique count of less than 64 K.
  16. shrivallabha

    shrivallabha Excel Ninja

    Messages:
    1,787
    Do you mean your initial string is not "test:" but "test:1:"?
    Samadhan Gaikwad likes this.
  17. Samadhan Gaikwad

    Samadhan Gaikwad Member

    Messages:
    136
    No, my initial string is "test:", but it's additionally printing 1:
    I have used a RIGHT and LEN formula to trim the first 2 characters.
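    For reference, with a value in A1 (an assumed cell), a formula that drops the first 2 characters would be:
    Code (vb):
    =RIGHT(A1, LEN(A1) - 2)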
  18. shrivallabha

    shrivallabha Excel Ninja

    Messages:
    1,787
    Sorry, I can't reproduce the result you are getting using the data posted in post 11:
    For
    Code (vb):

    file1:test:src/draw/no/ind/erefe.c/gegegewg 1
    filesdsd:test:src/draw/no/ind/reryeryrey.c/jhrtjtrj 1
    fileskdk:sdsD:test:abc/disp/no/ind/ryretyrey.txt/Display::Restrict 20
     
    I get
    Code (vb):

    src/draw/no/ind/erefe.c/gegegewg
    src/draw/no/ind/reryeryrey.c/jhrtjtrj
    abc/disp/no/ind/ryretyrey.txt/Display::Restrict
     
    Have you edited the macro at your end?
  19. Samadhan Gaikwad

    Samadhan Gaikwad Member

    Messages:
    136
    Yes, I have changed the text file name and the string from test: to MN1201:
  20. shrivallabha

    shrivallabha Excel Ninja

    Messages:
    1,787
    Then you need to change this part:
    Code (vb):
    strInput = Trim(Split(Mid(strInput, InStrRev(strInput, "test:", , vbTextCompare) + 5, Len(strInput)), " ")(0))
    The 5 corresponds to the length of test:
    so for "MN1201:" it will be 7 characters:
    Code (vb):
    strInput = Trim(Split(Mid(strInput, InStrRev(strInput, "MN1201:", , vbTextCompare) + 7, Len(strInput)), " ")(0))
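    If the marker string changes again, one option (not shown in the thread) is to keep it in a constant and let Len() supply the offset, so the two stay in sync:
    Code (vb):
    Const MARKER As String = "MN1201:"   '\\ change the marker in one place only
    strInput = Trim(Split(Mid(strInput, InStrRev(strInput, MARKER, , vbTextCompare) + Len(MARKER), Len(strInput)), " ")(0))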
    Samadhan Gaikwad likes this.
  21. Samadhan Gaikwad

    Samadhan Gaikwad Member

    Messages:
    136
    I see, ok, thanks again!
  22. shrivallabha

    shrivallabha Excel Ninja

    Messages:
    1,787
    Here's a revised version without Transpose, just in case you encounter more than 65,536 items. I am not sure if it will be slower or faster, but either way there should not be a big performance hit, as most of the work is still done in memory.
    Code (vb):

    Public Sub ReadTextFileData()
    Dim intFF As Integer
    Dim strInput As String
    Dim objDict As Object
    Dim varRng As Variant, k As Variant
    Dim i As Long
    intFF = FreeFile
    '\\ Change text file path here
    Open "C:\TEMP\New Text Document.txt" For Input As #intFF
    Set objDict = CreateObject("Scripting.Dictionary")
    objDict.CompareMode = vbTextCompare
    Do While Not EOF(intFF)
        Line Input #intFF, strInput
        strInput = Trim(Split(Mid(strInput, InStrRev(strInput, "test:", , vbTextCompare) + 5, Len(strInput)), " ")(0))
        If Not objDict.exists(strInput) Then
            objDict.Add strInput, strInput
        End If
    Loop
    Close #intFF
    '\\ Prepare a 2-D array sized to the dictionary keys
    varRng = Range("A1").Resize(objDict.Count, 1).Value
    i = 1
    For Each k In objDict.Keys
        varRng(i, 1) = k
        i = i + 1
    Next k
    Range("A1").Resize(objDict.Count, 1).Value = varRng
    Set objDict = Nothing
    End Sub
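    A side note on the array step (not from the thread): reading Range("A1").Resize(objDict.Count, 1).Value is just a convenient way to get a correctly shaped, 1-based 2-D Variant array; an equivalent alternative is to dimension it directly:
    Code (vb):
    ReDim varRng(1 To objDict.Count, 1 To 1)   '\\ same shape as the target range, no sheet read needed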
     
    Samadhan Gaikwad likes this.
  23. Samadhan Gaikwad

    Samadhan Gaikwad Member

    Messages:
    136
    Thank you so much again!
    Just one query: will I come to know if the unique count is greater than 65 K? I guess I will get an error when it exceeds the limit.
    And in that case (when I realize that issue), I will use your second macro.
  24. shrivallabha

    shrivallabha Excel Ninja

    Messages:
    1,787
    Perfect! Thanks for the feedback!!
    Samadhan Gaikwad likes this.
