• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

How to avoid hardcoded loop to reach the bottom of a webpage?

shahin

Active Member
I've written some code in vba to pull out doctor names, their phone numbers and email addresses from a webpage with load more button. The way I've written my script is just doing awesome with no issues. After a long trial I could find out that lazy-load can be controlled in a very easy way. However, the only thing I would like to improve upon my existing script is use any method to avoid hardcoded loop so that it will keep going down while clicking on the load more button until there is no such button left to be clicked. At this moment, If I make my loop "For scroll_down = 1 To 20" to "For scroll_down = 1 To 50" then it will no doubt reaches the bottom but i would like to avoid this hardcoded loop. Any help will be highly appreciated.

This is the script:
Code:
Sub Web_Data()
    Dim IE As New InternetExplorer, html As HTMLDocument
    Dim storage As Object, posts As Object

    With IE
        .Visible = False
        .navigate "http://www.physiofirst.org.uk/find-physio/search-physio.html"
        Do While .readyState <> READYSTATE_COMPLETE: Loop
        Set html = .document
    End With

    For scroll_down = 1 To 10
        Set storage = html.getElementsByClassName("articles-item")
        html.parentWindow.scrollBy 0, 99999
        html.getElementById("load-more-practice").Click
        Application.Wait Now() + TimeValue("00:00:005")
    Next scroll_down

    For Each posts In storage
        With posts.getElementsByClassName("heading")
            If .Length Then Row = Row + 1: Cells(Row, 1) = .Item(0).innerText
        End With
        With posts.getElementsByClassName("no-list")(0).getElementsByTagName("li")
            If InStr(1, posts.innerText, "Tel:", 1) > 0 Then Cells(Row, 2) = Split(.Item(0).innerText, "Tel:")(1)
        End With
        With posts.getElementsByClassName("no-list")(1).getElementsByTagName("a")
            If InStr(1, posts.innerText, "Email us", 1) > 0 Then Cells(Row, 3) = Split(.Item(0).href, "mailto:")(1)
        End With
    Next posts
    IE.Quit
End Sub
 
Last edited:
Or without clicking on the load-more buttons in another site in which data gets update with scrolling downward:
Code:
Sub Redmart_data()
    Dim IE As New InternetExplorer, html As HTMLDocument
    Dim storage As Object, post As Object
   
    With IE
        .Visible = True
        .navigate "https://redmart.com/search/phone"
        Do While .readyState <> READYSTATE_COMPLETE: Loop
        Set html = .document
    End With
   
    For scroll_down = 1 To 20
        Set storage = html.getElementsByClassName("productDescriptionAndPrice")
        html.parentWindow.scrollBy 0, 99999
        Application.Wait Now() + TimeValue("00:00:005")
    Next scroll_down

    For Each post In storage
        With post.getElementsByTagName("h4")(0).getElementsByTagName("a")
            If .Length Then Row = Row + 1: Cells(Row, 1) = .Item(0).innerText
        End With
        With post.getElementsByClassName("ProductPrice__price___3BmxE")
            If .Length Then Cells(Row, 2) = .Item(0).innerText
        End With
    Next post
    IE.Quit
End Sub

All of the places, the logic of avoiding hardcoded-loop should be the same. Thanks.
 
It seems after a long try I have been very close to shake off hardcoded delay from my script to get the desired content from a slow loading webpage. The script can reach the bottom of a webpage. The only thing I need to do is break out of loop when it's done which i can't do it myself. Any help would be highly appreciated.

The is the optimized script I've created so far:

Code:
Sub Web_Data()
    Dim IE As New InternetExplorer, html As HTMLDocument
    Dim posts As Object, sText As String
    Dim TitlesCount As Object

    With IE
        .Visible = True
        .navigate "https://finance.yahoo.com/"
        Do Until .readyState = READYSTATE_COMPLETE: Loop
        Set html = .document
    End With

    NumChanges = 0

    Do
        Application.Wait Now() + TimeValue("00:00:04")
        Set TitlesCount = html.getElementsByClassName("StretchedBox")
        html.parentWindow.scrollBy 0, 99999
        If TitlesCount = TitlesCount.Length Then
            NumChanges = NumChanges + 1
        End If
        If NumChanges = TitlesCount Then Exit Do
    Loop

    For Each posts In html.getElementsByClassName("StretchedBox")
        sText = WorksheetFunction.Clean(posts.ParentNode.innerText)
        If Len(sText) > 0 Then
            Row = Row + 1
            Cells(Row, 1).Value = sText
        End If
    Next posts
    IE.Quit
End Sub

Btw, I took few instructions from some PatricK in Code Review to reach thus far.
 
Last edited:
Finally, I've solved the puzzle as to how we can reach (programmatically) the bottom of a slow-loading webpage with load more option. There is no hardcoded stuffs in my script now. Here it is:
Code:
Sub Web_Data()
    Dim IE As New InternetExplorer, html As HTMLDocument
    Dim storage As Object, post As Object, posts As Object

    With IE
        .Visible = True
        .navigate "https://www.housers.com/es/proyectos/avanzado"
        Do While .readyState <> READYSTATE_COMPLETE: Loop
        Set html = .document
    End With
   
    Do
        Set storage = html.getElementsByClassName("titulo-oportunidad")
        html.parentWindow.scrollBy 0, 99999
        Set post = html.getElementById("loadMoreStocks")
        If Not post Is Nothing Then
            post.Click
            Application.Wait Now + TimeValue("00:00:003")
        Else: Exit Do
        End If
    Loop

    For Each posts In storage
        row = row + 1: Cells(row + 1, 1) = posts.innerText
    Next posts
    IE.Quit
End Sub
 
Back
Top