• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Working code in IE but not in XLMHTTP

YasserKhalil

Well-Known Member
Hello everyone
Till now I couldn't get the difference between using IE and XMLHTTP (forgive my ignorance)
Although I have seen multiple threads I still couldn't get it

I have the following code that is working in IE
Code:
Sub Test_Scrape()
    Dim post        As Object
    Dim elem        As Object
    Dim r          As Long
   
    r = 2
   
    With CreateObject("InternetExplorer.Application")
        .Visible = False
        .navigate "https://whub.io/startups?q=&p=0&is_v=1"
        While .Busy = True Or .readyState < 4: DoEvents: Wend
        Application.Wait Now + TimeValue("00:00:05")
       
        Set post = .document.getElementById("search-startups-results")
       
        For Each elem In post.getElementsByClassName("ais-hits--item")
            Cells(r, 1).Value = elem.getElementsByTagName("span")(0).innerText
            Cells(r, 2).Value = elem.getElementsByTagName("a")(0).href
            r = r + 1
        Next elem
    End With
End Sub

But when trying to use XMLHTTP method, I encountered an error at this line
Code:
For Each elem In post.getElementsByClassName("ais-hits--item")

Any help in this topic please?
 
Hi !

As on my side this class does not exist within the webpage
whatever under IE or even Firefox !
At least check the number of its elements …
 

As this class does not exist on my side whatever under IE or Firefox,
I can't go further …​
 
That's too weird
I have tested firefox and inspector and the same result. It's already there
Untitled.png
If it is still not existing for you .. can you edit the code for me that is working for you?
 
On my side your IE code does nothing as the class does not exist !
And as you didn't ever explained where is precisely located
your class element on the webpage, I'm totally blind !

Maybe it's a country limitation, wait for someone else seeing your class …
 
Ok now under Firefox but not under IE (no pict with "new"),
maybe a limitation from my test computer old IE/Windows version
(common issue under IE) …

getElementsByClassName works only in early binding way,
not in late binding way (CreateObject), see shahin recent threads …
 
It is working for me either in early or late binding
Code:
Sub Test_Scrape()
    Dim ie As New SHDocVw.InternetExplorer
    Dim post        As Object
    Dim elem        As Object
    Dim r          As Long
   
    r = 2
   
    With ie
        .Visible = False
        .navigate "https://whub.io/startups?q=&p=0&is_v=1"
        While .Busy = True Or .readyState < 4: DoEvents: Wend
        Application.Wait Now + TimeValue("00:00:05")
       
        Set post = .document.getElementById("search-startups-results")
       
        For Each elem In post.getElementsByClassName("ais-hits--item")
            Cells(r, 1).Value = elem.getElementsByTagName("span")(0).innerText
            Cells(r, 2).Value = elem.getElementsByTagName("a")(0).href
            r = r + 1
        Next elem
    End With
End Sub
 

And without IE, with a request like WinHttp & a DOM element like htmlfile ?

Chihiro well explained the issue within a thread of shahin …​
 
What Marc is saying that, what's your xmlhttp code? Is all references to external library added to project (early bound) or not? If it isn't, getElementsbyxxx may fail to work. As VBA isn't able to discern what object collection it is (it could have more than 1 type of object in collection).

At any rate, looks like you are using inspector tool to find the element. When scraping using xmlhttp you need to look at "Source Code" and not inspect element. As some elements can be filled via script and may not be present in the source code (what's returned via request).

If that's the case, your best option is to pilot IE or use Selenium for other browsers.
 
I agree but
If it isn't, getElementsbyxxx may fail to work.
I met issues in late binding only with getElementsByClassName

So the workaround is to not use this statement with htmlfile late binding
when possible and working by Id, name, tag, …
 
Yes. I prefer using ID whenever possible. But too many sites don't use ID for all elements that I'm interested in :(

TagName is usually the safest bet if using object to loop, as you've mentioned.
 
Here's my try but failed to get the source
Code:
Sub Test()
    Dim myString    As String
    Dim URL        As String
    Dim sPath      As String
   
    URL = "https://whub.io/startups?q=&p=0&is_v=1"
    sPath = ThisWorkbook.Path & "\Test.html"
   
    myString = GetHTMLSource(URL)
    OutputText myString, sPath
End Sub

Function GetHTMLSource(strURL As String) As String
    Dim XMLHTTP    As Object
   
    Set XMLHTTP = CreateObject("MSXML2.ServerXMLHTTP")

    With XMLHTTP
        .setTimeouts 2000, 2000, 2000, 2000
        .Open "GET", strURL, False
        .send
        If .Status = 200 Then GetHTMLSource = XMLHTTP.responseText Else GetHTMLSource = ""
    End With
End Function

Function OutputText(text As String, sPath As String)
    Dim f        As Integer

    f = FreeFile()
    Open sPath For Output As f

    Print #f, text
    Close #f
End Function
 
Can't reach this webpage today whatever the webbrowser !

As Chihiro yet wrote if data are not in original webpage code
easier is just to pilot IE …
Or maybe with a request if you already know
how to play with dynamic Ajax or Java script
but if it were the case you didn't use VBA for web scraping …
 

First check if you see data within ResponseText.
If not, check if you see data within htmlfile (html document)
once loaded with ResponseText.
If not, just pilot IE …
 

No template, just check object variable content in VBE Locals window
as scraping begins with reading, observing, …

To check text within a global text you can use InStr VBA function.
 
Back
Top