1. Welcome to Chandoo.org Forums. Short message for you

    Hi Guest,

    Thanks for joining Chandoo.org forums. We are here to make you awesome in Excel. Before you post your first question, please read this short introduction guide. When posting or responding to questions please remember our values at Chandoo.org are: Humility, Passion, Fun, Awesomeness, Simplicity, Sharing Remember that we have people here for whom English is not there first language and we need to allow for this in our dealings.

    Yours,
    Chandoo
  2. Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  3. When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Check next class after "linkcell" class when scraping data

Discussion in 'VBA Macros' started by YasserKhalil, Jul 22, 2017.

  1. YasserKhalil

    YasserKhalil Active Member

    Messages:
    740
    Hello everyone
    I am trying to learn about scraping data and I did half of my target .. There is a list of functions and its description
    Depending on class name "linkcell" I could grab functions list properly
    but as for description I am confused because the class name related sometimes is "definecell" and the last one in each table "functab" is "definecelllast"
    Here's the code
    Code (vb):
    Sub Test()
        Dim element        As IHTMLElement
        Dim elements        As IHTMLElementCollection
        Dim ie              As InternetExplorer
        Dim html            As HTMLDocument
        Dim counter        As Long
        Dim erow            As Long

        Set ie = New InternetExplorer
        ie.Visible = True

        ie.navigate "http://www.excelfunctions.net/vba-functions.html"

        Do While ie.readyState <> READYSTATE_COMPLETE
            Application.StatusBar = "Loading Web page …"
            DoEvents
        Loop

      Set html = ie.document
        Set elements = html.getElementsByClassName("linkcell")
        counter = 0

        For Each element In elements
                erow = Sheet1.Cells(Rows.Count, 1).End(xlUp).Offset(1, 0).Row
                Cells(erow, 1) = html.getElementsByClassName("linkcell")(counter).innerText
               
                'Need help at this point
               'Cells(erow, 2) = html.getElementsByTagName("href")(counter).innerText
             
                counter = counter + 1
        Next element

        Application.StatusBar = ""
        ie.Quit
       
        MsgBox "Done...", 64
    End Sub
    Thanks advanced for help
  2. Marc L

    Marc L Excel Ninja

    Messages:
    3,175
    Hi !

    It may be better to loop Table objects in order to read their elements …

    Working with class names could be a mess as it can fail on some computers.
  3. YasserKhalil

    YasserKhalil Active Member

    Messages:
    740
    Thanks a lot Mr. Marc
    I missed your replies since a while
    In fact I am novice at scarping .. Can you give me a model code of how to loop through the table objects>> ?
    Another point I think there are many tables so I would first loop through each table ... How can I do that?
  4. Marc L

    Marc L Excel Ninja

    Messages:
    3,175
    As you can see in many threads of this forum there is no model code
    'cause each scrapping is specific to the source webpage …

    Loop on getElementsByTagName("TABLE") and foresee a table object structure (within VBE Locals window) as all is here :
    observe & respect object model …
    You will see in the table model some Rows objects
    and within Rows some Cells objects.
  5. NARAYANK991

    NARAYANK991 Excel Ninja

    Messages:
    15,652
    Hi ,

    Try this :
    Code (vb):

    Sub Test()
        Dim element        As IHTMLElement
        Dim elements        As IHTMLElementCollection
        Dim ie              As InternetExplorer
        Dim html            As HTMLDocument
        Dim counter        As Long
        Dim erow            As Long

        Set ie = New InternetExplorer
        ie.Visible = True

        ie.navigate "http://www.excelfunctions.net/vba-functions.html"

        Do While ie.readyState <> READYSTATE_COMPLETE
            Application.StatusBar = "Loading Web page …"
            DoEvents
        Loop
        Application.StatusBar = ""

        Set html = ie.document
        Set elements = html.getElementsByClassName("linkcell")
        counter = 0

        For Each element In elements
            erow = Sheet1.Cells(Rows.Count, 1).End(xlUp).Offset(1, 0).Row
            With element
                Cells(erow, 1) = .innerText
                Cells(erow, 2) = .NextSibling.innerText
            End With
            counter = counter + 1
        Next element

        ie.Quit
        Sheet1.Columns(1).Resize(, 2).AutoFit
        MsgBox "Done...", 64
    End Sub
     
    Narayan
    Marc L, shahin and YasserKhalil like this.
  6. YasserKhalil

    YasserKhalil Active Member

    Messages:
    740
    You're awesome and wonderful ..
    The clue was in that magic part (NextSibling.innerText) ..
    Thanks a lot for great help

    Can you help me with looping through exisiting tables as I need to learn that too?

    Best Regards
  7. Marc L

    Marc L Excel Ninja

    Messages:
    3,175
    First demonstration :​
    Code (vb):
    Sub DemoIE1()
        Dim oTable As HTMLTable, oRow As HTMLTableRow, C%, R&
        Application.StatusBar = "      Downloading …"
        Sheet1.UsedRange.Clear
    With New InternetExplorer
            .Navigate "http://www.excelfunctions.net/vba-functions.html"
    '       .Visible = True
      While .Busy Or .ReadyState < 4:  DoEvents:  Wend
        For Each oTable In .Document.getElementsByClassName("functab")
            For Each oRow In oTable.Rows
                        R = R + 1
                With oRow.Cells
                    For C = 1 To .Length
                        Sheet1.Cells(R, C).Value = .Item(C - 1).innerText
                    Next
                    If .Length = 1 Then
                        Sheet1.Cells(R, 1).Font.Bold = True
                    Else
                        Sheet1.Cells(R, 1).IndentLevel = 3
                        Sheet1.Hyperlinks.Add Sheet1.Cells(R, 1), .Item(0).Children(0).href
                    End If
                End With
            Next
        Next
            .Quit
    End With
        Sheet1.UsedRange.Columns.AutoFit
        Application.StatusBar = ""
        MsgBox "Done …", vbInformation, "    Demo IE #1"
    End Sub
    shahin and YasserKhalil like this.
  8. Marc L

    Marc L Excel Ninja

    Messages:
    3,175
    An easy IE way no needing to add any reference (late binding) :​
    Code (vb):
    Sub DemoIE2()
            Dim oDoc As Object, oTable As Object, R&
            Set oDoc = CreateObject("htmlfile")
            Application.StatusBar = "      Downloading …"
            Sheet1.UsedRange.Clear
            R = 1
    With CreateObject("InternetExplorer.Application")
            .Navigate "http://www.excelfunctions.net/vba-functions.html"
    '       .Visible = True
      While .Busy Or .ReadyState < 4:  DoEvents:  Wend
        For Each oTable In .Document.getElementsByClassName("functab")
            If oDoc.frames.clipboardData.setData("Text", oTable.outerHTML) Then
                Sheet1.Paste Sheet1.Cells(R, 1)
                R = R + oTable.Rows.Length
            End If
        Next
            .Quit
    End With
                oDoc.frames.clipboardData.clearData "Text"
            Set oDoc = Nothing
            Sheet1.UsedRange.Columns.AutoFit
            Application.StatusBar = ""
            MsgBox "Done …", vbInformation, "    Demo IE #2"
    End Sub
    shahin and YasserKhalil like this.
  9. YasserKhalil

    YasserKhalil Active Member

    Messages:
    740
    That's great and awesome Mr MarcL
    Thank you very very much for these magic solutions ..
    I will study these codes further to learn from them so please let me ask for anything that is unclear for me
    Best Regards
  10. YasserKhalil

    YasserKhalil Active Member

    Messages:
    740
    I tried to get this part but couldn't get it ..
    Code (vb):
    .Item(0).Children(0).href
    And as for Demo2
    Code (vb):
    If oDoc.frames.clipboardData.setData("Text", oTable.outerHTML) Then
  11. Marc L

    Marc L Excel Ninja

    Messages:
    3,175
    Item(0) : first item
    Children(0) : first child
    href : link property …
    … as you can see in VBE Locals window …

    Copy table data to the clipboard, see IE documentation on MSDN …
    YasserKhalil and shahin like this.
  12. YasserKhalil

    YasserKhalil Active Member

    Messages:
    740
    Thanks a lot for great and awesome help that I received in that topic
    Thank you very much
  13. Marc L

    Marc L Excel Ninja

    Messages:
    3,175
    Last but not least, the faster request way (no reference to add) :​
    Code (vb):
    Sub DemoReq()
        Dim oTable As Object, R&, T$
        Application.StatusBar = "      Downloading …"
        Sheet1.UsedRange.Clear
    With CreateObject("WinHttp.WinHttpRequest.5.1")
        .Open "GET", "http://www.excelfunctions.net/vba-functions.html", False
        .setRequestHeader "DNT", "1"
         On Error Resume Next
        .send
         If Err.Number Then Beep: Exit Sub
         On Error GoTo 0
     If .Status = 200 Then T = .responseText Else Beep: Exit Sub
    End With
         R = 1
    With CreateObject("htmlfile")
            .body.innerHTML = T
        For Each oTable In .getElementsByTagName("TABLE")
    '        If oTable.className = "functab" Then
                If .frames.clipboardData.setData("Text", oTable.outerHTML) Then
                    Sheet1.Paste Sheet1.Cells(R, 1)
                    R = R + oTable.Rows.Length
                End If
    '        End If
       Next
            .frames.clipboardData.clearData "Text"
    End With
        Sheet1.UsedRange.Columns.AutoFit
        Application.StatusBar = ""
        MsgBox "Done …", vbInformation, " Demo Request"
    End Sub
    YasserKhalil likes this.
  14. YasserKhalil

    YasserKhalil Active Member

    Messages:
    740
    You're a legend Mr. Marc
    That's really awesome and fascinating demos

    What is this line
    Code (vb):
    .setRequestHeader "DNT", "1"
    What "DNT" is it a header? Where did you get it? (By inspecting elements ..?!) and what 1 ?
  15. shahin

    shahin Active Member

    Messages:
    479
    Another way to get the required data from that specific webpage with adding nothing to the reference library:
    Code (vb):

    Sub Tab_Data()
        Dim hdoc as Object, tbl As Object, tRow As Object, tCel As Object
     
        With CreateObject("MSxml2.xmlhttp")
            .Open "GET", "http://www.excelfunctions.net/vba-functions.html", False
            .send
            Set hdoc = CreateObject("htmlfile")
            hdoc.body.innerHTML = .responseText
        End With
     
        For Each tbl In hdoc.getElementsByTagName("table")
            For Each tRow In tbl.getElementsByTagName("tr")
                For Each tCel In tRow.getElementsByTagName("td")
                    y = y + 1
                    Cells(x, y) = tCel.innerText
                Next tCel
                y = 0
                x = x + 1
            Next tRow
        Next tbl
    End Sub
     
    Last edited: Jul 24, 2017
    YasserKhalil likes this.
  16. YasserKhalil

    YasserKhalil Active Member

    Messages:
    740
    Thank you very much Shahin for great contributions
    It seems that this thread will be a great reference ...
  17. shahin

    shahin Active Member

    Messages:
    479
    Finally, if you wanna stick to the class then this one works fine as well:
    Code (vb):

    Sub Vba_tabledata()

        Dim http As New XMLHTTP60, html As New HTMLDocument
        Dim htable As Object, data As Object

        With http
            .Open "GET", "http://www.excelfunctions.net/vba-functions.html", False
            .send
            html.body.innerHTML = .responseText
        End With

        Set htable = html.getElementsByClassName("linkcell")
        For Each data In htable
            x = x + 1
            Cells(x, 1) = data.innerText
            Cells(x, 2) = data.NextSibling.innerText
        Next data

    End Sub
     
    Btw, add the following reference to the library:
    Microsoft XML and Microsoft HTML object Library
    Marc L and YasserKhalil like this.
  18. YasserKhalil

    YasserKhalil Active Member

    Messages:
    740
    Thanks a lot for great contributions
    Best Regards for all of you
  19. Marc L

    Marc L Excel Ninja

    Messages:
    3,175
    DNT means Do Not Track and "1" is like switching the light on …
    I got it by inspecting requests under Mozilla Firefox …
    This header is not mandatory, at the convenience of the coder.
    From RFC blank book

    As you can see, there is no model to scrap data,
    as it depends on the webpage object model
    and the coder feeling how to reach them …
    shahin, YasserKhalil and SirJB7 like this.
  20. Marc L

    Marc L Excel Ninja

    Messages:
    3,175
    Thanks to Shahin and its last code, getElementsByClassName
    fails on my side in late binding on "functab" but in early binding
    - with reference Microsoft HTML Object Library added -
    no more issue pointing this functab class :​
    Code (vb):
    Sub DemoReqRevA()
        Dim oTable As HTMLTable, R&, T$
        Application.StatusBar = "      Downloading …"
        Sheet1.UsedRange.Clear
    With CreateObject("WinHttp.WinHttpRequest.5.1")
        .Open "GET", "http://www.excelfunctions.net/vba-functions.html", False
        .setRequestHeader "DNT", "1"
         On Error Resume Next
        .send
         If Err.Number Then Beep: Exit Sub
         On Error GoTo 0
     If .Status = 200 Then T = .responseText Else Beep: Exit Sub
    End With
         R = 1
    With New HTMLDocument
            .body.innerHTML = T
        For Each oTable In .getElementsByClassName("functab")
            If .frames.clipboardData.setData("Text", oTable.outerHTML) Then
                Sheet1.Paste Sheet1.Cells(R, 1)
                R = R + oTable.Rows.Length
            End If
        Next
            .frames.clipboardData.clearData "Text"
    End With
        Sheet1.UsedRange.Columns.AutoFit
        Application.StatusBar = ""
        MsgBox "Done …", vbInformation, " Demo Request"
    End Sub

    Shahin - again - do not forget to free object variables …
    YasserKhalil and shahin like this.
  21. shahin

    shahin Active Member

    Messages:
    479
    @Marc L,
    Whatever thread it is, I find it very thrilling to get in contact with you. Something I would like to ask. The other day you taught me how to write code without using "On Error Resume Next" so that in every situation the code will remain invulnerable. However, you might have noticed in my second post here that my written code is very much fragile and will break if it fails to find class name anywhere in the table. I could have handled it if two class elements are there by using "with statement" in the later expression in combination with "length" and "item" property. I can't do it here because of single class element. Could you please take a look how can it be if i want to use "with" statement along with "length" and "item" property in my below code. I tried but this is not the way you know:
    Code (vb):

    Sub Tabledata()
        Dim http As New XMLHTTP60, html As New HTMLDocument
        Dim htable As Object, data As Object

        With http
            .Open "GET", "http://www.excelfunctions.net/vba-functions.html", False
            .send
            html.body.innerHTML = .responseText
        End With

        For Each data In html.getElementsByClassName("linkcell")
            With data
                If .Length Then x = x + 1: Cells(x, 1) = .item(0).innerText
                If .Length Then Cells(x, 2) = .item(0).NextSibling.innerText
            End With
        Next data
    End Sub
     
  22. Marc L

    Marc L Excel Ninja

    Messages:
    3,175
    No matter using a For Each loop :
    without any element, the loop directly ends …
    [Just try next code with "linkcellS" for example.]

    So in this case it's just a misobservation of the object model !
    As data variable is not a collection but just an element
    so without a length property …

    For Each element In collection : length property on collection,
    not under element as you can foresee in VBE Locals window …

    Your code amended :​
    Code (vb):
    Sub Tabledata()
        Dim oDoc As New HTMLDocument, oCell As HTMLTableCell, R&
        ActiveSheet.UsedRange.Clear
    With New XMLHTTP60
        .Open "GET", "http://www.excelfunctions.net/vba-functions.html", False
        .send
         oDoc.body.innerHTML = .responseText
    End With
    For Each oCell In oDoc.getElementsByClassName("linkcell")
        R = R + 1
        Cells(R, 1).Resize(, 2).Value = Array(oCell.innerText, oCell.nextSibling.innerText)
    Next
        Set oDoc = Nothing
    End Sub
    shahin and YasserKhalil like this.
  23. YasserKhalil

    YasserKhalil Active Member

    Messages:
    740
    This thread will be definitely great reference for scraping web data .. Thanks a lot for both of you
    Best Regards
  24. Marc L

    Marc L Excel Ninja

    Messages:
    3,175
    You're welcome Yasser !

    Same result but pointing and checking the collection :​
    Code (vb):
    Sub TableData2()
        Dim oDoc As New HTMLDocument, N&, R&
        ActiveSheet.UsedRange.Clear
    With New XMLHTTP60
        .Open "GET", "http://www.excelfunctions.net/vba-functions.html", False
        .send
         oDoc.body.innerHTML = .responseText
    End With
    With oDoc.getElementsByClassName("linkcell")
    '   If .Length Then
            For N = 0 To .Length - 1
                R = R + 1
                Cells(R, 1).Resize(, 2).Value = Array(.Item(N).innerText, .Item(N).nextSibling.innerText)
            Next
    '   End If
    End With
        Set oDoc = Nothing
    End Sub
    Try with an non existing class like"linkcellS", no issue …

    Obviously you must notice the convenience of For Each.
    YasserKhalil and shahin like this.
  25. shahin

    shahin Active Member

    Messages:
    479
    Hats off to you, Marc L. You are just awesome!!! Sorry, YasserKhalil for creating a sub-thread within your thread. Basically, it is hard for me to control myself from erecting any question specially when I get in contact with Marc L.

Share This Page