1. Welcome to Chandoo.org Forums. Short message for you

    Hi Guest,

    Thanks for joining Chandoo.org forums. We are here to make you awesome in Excel. Before you post your first question, please read this short introduction guide. When posting or responding to questions please remember our values at Chandoo.org are: Humility, Passion, Fun, Awesomeness, Simplicity, Sharing Remember that we have people here for whom English is not there first language and we need to allow for this in our dealings.

    Yours,
    Chandoo
  2. Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  3. When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Working code in IE but not in XLMHTTP

Discussion in 'VBA Macros' started by YasserKhalil, Mar 11, 2018.

  1. YasserKhalil

    YasserKhalil Active Member

    Messages:
    989
    Hello everyone
    Till now I couldn't get the difference between using IE and XMLHTTP (forgive my ignorance)
    Although I have seen multiple threads I still couldn't get it

    I have the following code that is working in IE
    Code (vb):
    Sub Test_Scrape()
        Dim post        As Object
        Dim elem        As Object
        Dim r          As Long
       
        r = 2
       
        With CreateObject("InternetExplorer.Application")
            .Visible = False
            .navigate "https://whub.io/startups?q=&p=0&is_v=1"
            While .Busy = True Or .readyState < 4: DoEvents: Wend
            Application.Wait Now + TimeValue("00:00:05")
           
            Set post = .document.getElementById("search-startups-results")
           
            For Each elem In post.getElementsByClassName("ais-hits--item")
                Cells(r, 1).Value = elem.getElementsByTagName("span")(0).innerText
                Cells(r, 2).Value = elem.getElementsByTagName("a")(0).href
                r = r + 1
            Next elem
        End With
    End Sub
    But when trying to use XMLHTTP method, I encountered an error at this line
    Code (vb):
    For Each elem In post.getElementsByClassName("ais-hits--item")
    Any help in this topic please?
  2. Marc L

    Marc L Excel Ninja

    Messages:
    4,218
    Hi !

    As on my side this class does not exist within the webpage
    whatever under IE or even Firefox !
    At least check the number of its elements …
    YasserKhalil likes this.
  3. YasserKhalil

    YasserKhalil Active Member

    Messages:
    989
    That's weird .. but I got results when using IE
  4. Marc L

    Marc L Excel Ninja

    Messages:
    4,218

    As this class does not exist on my side whatever under IE or Firefox,
    I can't go further …​
  5. YasserKhalil

    YasserKhalil Active Member

    Messages:
    989
    It is already there when inspecting IE
    Untitled.png
  6. Marc L

    Marc L Excel Ninja

    Messages:
    4,218


    Still nothing whatever IE or Firefox on my side …​
  7. YasserKhalil

    YasserKhalil Active Member

    Messages:
    989
    That's too weird
    I have tested firefox and inspector and the same result. It's already there
    Untitled.png
    If it is still not existing for you .. can you edit the code for me that is working for you?
  8. Marc L

    Marc L Excel Ninja

    Messages:
    4,218
    On my side your IE code does nothing as the class does not exist !
    And as you didn't ever explained where is precisely located
    your class element on the webpage, I'm totally blind !

    Maybe it's a country limitation, wait for someone else seeing your class …
  9. Marc L

    Marc L Excel Ninja

    Messages:
    4,218
    Ok now under Firefox but not under IE (no pict with "new"),
    maybe a limitation from my test computer old IE/Windows version
    (common issue under IE) …

    getElementsByClassName works only in early binding way,
    not in late binding way (CreateObject), see shahin recent threads …
  10. YasserKhalil

    YasserKhalil Active Member

    Messages:
    989
    Thanks a lot ..
    Here's snapshot of an element ..
    Untitled.png
  11. YasserKhalil

    YasserKhalil Active Member

    Messages:
    989
    It is working for me either in early or late binding
    Code (vb):
    Sub Test_Scrape()
        Dim ie As New SHDocVw.InternetExplorer
        Dim post        As Object
        Dim elem        As Object
        Dim r          As Long
       
        r = 2
       
        With ie
            .Visible = False
            .navigate "https://whub.io/startups?q=&p=0&is_v=1"
            While .Busy = True Or .readyState < 4: DoEvents: Wend
            Application.Wait Now + TimeValue("00:00:05")
           
            Set post = .document.getElementById("search-startups-results")
           
            For Each elem In post.getElementsByClassName("ais-hits--item")
                Cells(r, 1).Value = elem.getElementsByTagName("span")(0).innerText
                Cells(r, 2).Value = elem.getElementsByTagName("a")(0).href
                r = r + 1
            Next elem
        End With
    End Sub
  12. Marc L

    Marc L Excel Ninja

    Messages:
    4,218

    And without IE, with a request like WinHttp & a DOM element like htmlfile ?

    Chihiro well explained the issue within a thread of shahin …​
  13. YasserKhalil

    YasserKhalil Active Member

    Messages:
    989
    I don't get that .. I am sorry
  14. Chihiro

    Chihiro Excel Ninja

    Messages:
    4,824
    What Marc is saying that, what's your xmlhttp code? Is all references to external library added to project (early bound) or not? If it isn't, getElementsbyxxx may fail to work. As VBA isn't able to discern what object collection it is (it could have more than 1 type of object in collection).

    At any rate, looks like you are using inspector tool to find the element. When scraping using xmlhttp you need to look at "Source Code" and not inspect element. As some elements can be filled via script and may not be present in the source code (what's returned via request).

    If that's the case, your best option is to pilot IE or use Selenium for other browsers.
  15. Marc L

    Marc L Excel Ninja

    Messages:
    4,218
    I agree but
    I met issues in late binding only with getElementsByClassName

    So the workaround is to not use this statement with htmlfile late binding
    when possible and working by Id, name, tag, …
  16. Chihiro

    Chihiro Excel Ninja

    Messages:
    4,824
    Yes. I prefer using ID whenever possible. But too many sites don't use ID for all elements that I'm interested in :(

    TagName is usually the safest bet if using object to loop, as you've mentioned.
    YasserKhalil likes this.
  17. YasserKhalil

    YasserKhalil Active Member

    Messages:
    989
    Here's my try but failed to get the source
    Code (vb):
    Sub Test()
        Dim myString    As String
        Dim URL        As String
        Dim sPath      As String
       
        URL = "https://whub.io/startups?q=&p=0&is_v=1"
        sPath = ThisWorkbook.Path & "\Test.html"
       
        myString = GetHTMLSource(URL)
        OutputText myString, sPath
    End Sub

    Function GetHTMLSource(strURL As String) As String
        Dim XMLHTTP    As Object
       
        Set XMLHTTP = CreateObject("MSXML2.ServerXMLHTTP")

        With XMLHTTP
            .setTimeouts 2000, 2000, 2000, 2000
            .Open "GET", strURL, False
            .send
            If .Status = 200 Then GetHTMLSource = XMLHTTP.responseText Else GetHTMLSource = ""
        End With
    End Function

    Function OutputText(text As String, sPath As String)
        Dim f        As Integer

        f = FreeFile()
        Open sPath For Output As f

        Print #f, text
        Close #f
    End Function
  18. Chihiro

    Chihiro Excel Ninja

    Messages:
    4,824
    Hmm, I can't even open the URL.

    FYI - you should use # in front of "f" in Open sPath for... line.
  19. YasserKhalil

    YasserKhalil Active Member

    Messages:
    989
    I am totally stuck
    Can you help me with a solution regardless my approach ...?
  20. Marc L

    Marc L Excel Ninja

    Messages:
    4,218
    Can't reach this webpage today whatever the webbrowser !

    As Chihiro yet wrote if data are not in original webpage code
    easier is just to pilot IE …
    Or maybe with a request if you already know
    how to play with dynamic Ajax or Java script
    but if it were the case you didn't use VBA for web scraping …
  21. YasserKhalil

    YasserKhalil Active Member

    Messages:
    989

    Attached Files:

  22. YasserKhalil

    YasserKhalil Active Member

    Messages:
    989
    Any help in this topic please?
  23. Marc L

    Marc L Excel Ninja

    Messages:
    4,218

    First check if you see data within ResponseText.
    If not, check if you see data within htmlfile (html document)
    once loaded with ResponseText.
    If not, just pilot IE …
    YasserKhalil likes this.
  24. YasserKhalil

    YasserKhalil Active Member

    Messages:
    989
    Can you give me a template code for doing that please?
  25. Marc L

    Marc L Excel Ninja

    Messages:
    4,218

    No template, just check object variable content in VBE Locals window
    as scraping begins with reading, observing, …

    To check text within a global text you can use InStr VBA function.

Share This Page