How to modify the loop to get all the content displayed across multiple pages?

shahin
I've written a script to parse some information from a webpage. The information is displayed across multiple pages through pagination. I do not wish to hardcode the number of the last page, but I expect the parser to exhaust them all anyway. This question has been raised many times in this forum; however, as every problem is different, I thought I would create a post.

Here is what I have tried (logic taken from one of Narayan's solutions):
Code:
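Sub Web_Data()
    ' Requires references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library"
    Dim link As String, p As Long, r As Long, c As Long
    link = "https://info.bacb.com/o.php?page=100155&by=state&state=AL&pagenum="
    Dim HTTP As New XMLHTTP60, HTML As New HTMLDocument
    Dim posts As Object, elem As Object, trow As Object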
 
    Do
        p = p + 1
        With HTTP
            .Open "GET", link & p, False
            .send
            HTML.body.innerHTML = .responseText
        End With
 
        Set posts = HTML.getElementsByTagName("table")(2)
 
        For Each elem In posts.Rows
            For Each trow In elem.Cells
                c = c + 1: Cells(r + 1, c) = trow.innerText
            Next trow
            c = 0: r = r + 1
        Next elem
    ' The raw response text still contains the entity "-&gt;", so search for it literally
    Loop While InStr(HTTP.responseText, "<b>Next -&gt;</b>") > 0
End Sub

HTML elements of the pagination block:
Code:
<td colspan="6" bgcolor="#FFFFFF"><br>
                <span class="paging"><b> -- Page 1 of 3 -- </b></span><p><span class="paging"> <a href="?page=100155&amp;by=state&amp;state=AL&amp;pagenum=2"><b>Next -&gt;</b></a> &nbsp;&nbsp;</span> <span class="paging"> <a href="?page=100155&amp;by=state&amp;state=AL&amp;pagenum=3">Last -&gt;&gt;</a> </span>
            </p></td>

By the way, the content spans 3 pages.

What would be the right approach to achieve the same using IE? The part I used in "Loop While InStr(IE.document, "<b>Next -></b>") > 0" is a mess because IE doesn't work that way; I kept it only as a placeholder. I hope someone will take a look at it.
Code:
Sub Web_Data()
    link$ = "https://info.bacb.com/o.php?page=100155&by=state&state=AL&pagenum="
    Dim IE As New InternetExplorer, HTML As HTMLDocument
    Dim posts As Object, elem As Object, trow As Object
  
    Do
        p = p + 1
        With IE
            .Visible = True
            .navigate link & p
            While .Busy = True Or .readyState < 4: DoEvents: Wend
            Set HTML = .document
        End With
  
        Set posts = HTML.getElementsByTagName("table")(2)
  
        For Each elem In posts.Rows
            For Each trow In elem.Cells
                c = c + 1: Cells(r + 1, c) = trow.innerText
            Next trow
            c = 0: r = r + 1
        Next elem
    Loop While InStr(IE.document, "<b>Next -></b>") > 0
    IE.Quit
End Sub
 

Under IE, since you can also check objects,
what about observing the "paging" class? …
 
Hi Marc L, I tried it like this:

Code:
Loop Until IsNull(IE.document.querySelector(".paging a b"))

but the loop keeps going on and on because the "previous page" and "last page" links have the same class name.
 
Checking an element / object within a collection is not an issue at all,
as already shown in several of your threads, as recently as Thursday …
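Following that hint, a minimal sketch of checking the collection of links instead of a single `querySelector` match: it scans every `<a>` in the document and returns the `href` of the first one whose text starts with "Next", or an empty string when there is none. The function name `NextLinkHref` is hypothetical, and it assumes "Next" starts only the next-page link text, as in the pagination HTML posted above. It returns a string rather than relying on `IsNull`, since a `querySelector` with no match most likely comes back to VBA as `Nothing`, which `IsNull` does not detect (`Is Nothing` would be needed).

```vba
' Hypothetical helper: returns the href of the first link whose visible text
' starts with "Next", or "" when there is no such link (e.g. on the last page).
' doc can be an early-bound HTMLDocument (IE.document) or a late-bound "htmlfile".
Function NextLinkHref(ByVal doc As Object) As String
    Dim lnk As Object
    For Each lnk In doc.getElementsByTagName("a")
        ' innerText is the decoded text ("Next ->"); & "" guards against a Null value
        If Trim$(lnk.innerText & "") Like "Next*" Then
            NextLinkHref = lnk.href
            Exit Function
        End If
    Next lnk
End Function
```

In the IE loop, the last line could then become `Loop While Len(NextLinkHref(IE.document)) > 0`; alternatively, the returned URL could be passed to `.navigate` instead of building it from `p`, assuming IE resolves the relative `href` to an absolute URL.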

There are several ways, for example comparing the page number
against the last page displayed …
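One way to do that comparison: read the total from the "-- Page 1 of 3 --" label in the paging block and stop once the page counter reaches it. A minimal sketch, assuming the label always has the form "Page X of Y"; the function name `TotalPagesFromPaging` is hypothetical.

```vba
' Hypothetical helper: parses a paging label such as " -- Page 1 of 3 -- "
' and returns the total page count (3 here). Returns 0 if there is no " of ".
Function TotalPagesFromPaging(ByVal pagingText As String) As Long
    Dim pos As Long
    pos = InStr(1, pagingText, " of ", vbTextCompare)
    If pos = 0 Then Exit Function
    ' Val reads the leading number and stops at the trailing "--"
    TotalPagesFromPaging = Val(Mid$(pagingText, pos + Len(" of ")))
End Function
```

For example, the total could be read on the first pass with `lastPage = TotalPagesFromPaging(HTML.getElementsByClassName("paging")(0).innerText)`, where `lastPage` is a hypothetical Long variable and `getElementsByClassName` assumes an IE9 or later document mode, and the loop closed with `Loop While p < lastPage`.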
 
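Solved it. The issue with IE was the character encoding: IE.document.body.outerHTML serializes the ">" in "Next ->" as "&gt;", so that is what the search string must contain.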

Code:
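Sub Web_Data()
    ' Requires references to "Microsoft Internet Controls" and "Microsoft HTML Object Library"
    Dim link As String, p As Long, r As Long, c As Long
    link = "https://info.bacb.com/o.php?page=100155&by=state&state=AL&pagenum="
    Dim IE As New InternetExplorer, HTML As HTMLDocument
    Dim posts As Object, elem As Object, trow As Object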

    Do
        p = p + 1
        With IE
            .Visible = True
            .navigate link & p
            While .Busy = True Or .readyState < 4: DoEvents: Wend
            Set HTML = .document
        End With

        Set posts = HTML.getElementsByTagName("table")(2)

        For Each elem In posts.Rows
            For Each trow In elem.Cells
                If InStr(trow.innerText, "Next") = 0 Then  ' skip the paging cell
                    c = c + 1: Cells(r + 1, c) = trow.innerText
                End If
            Next trow
            c = 0: r = r + 1
        Next elem
    ' outerHTML serializes ">" as "&gt;", so the entity form is what must be searched
    Loop While InStr(IE.document.body.outerHTML, "<b>Next -&gt;</b>") > 0
    IE.Quit
End Sub
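The "encoding" difference is HTML entity escaping: when MSHTML serializes the DOM back to markup (`innerHTML`/`outerHTML`), the `>` in "Next ->" is written as `&gt;`, whereas `innerText` returns the decoded text. A small sketch of a check that sidesteps the issue by searching the decoded text instead; `PageHasNext` is a hypothetical name, and it assumes the link label is exactly "Next ->" and that no table cell contains that text.

```vba
' Hypothetical helper: True when the page still shows a "Next ->" label.
' innerText is decoded, so the literal "->" can be searched directly;
' innerHTML/outerHTML would contain "-&gt;" instead.
Function PageHasNext(ByVal doc As Object) As Boolean
    ' & "" guards against a Null innerText
    PageHasNext = InStr(1, doc.body.innerText & "", "Next ->", vbBinaryCompare) > 0
End Function
```

With it, the loop could end with `Loop While PageHasNext(IE.document)`.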
 
No encoding issue if you search only the text before the arrow:
Loop While InStr(IE.Document.body.innerHTML, "<b>Next -")

Or check directly within the worksheet,
as you copy all the data, even the unnecessary parts!
 
Yes, it is working. Basically, I was not sure what the alternative to ".responseText" was if I chose IE, until I came across "IE.document.body.outerHTML". However, I think I've seen ".responseText" used in combination with IE somewhere in this forum. I may be wrong, but I still have that impression.