Facing trouble while dealing with loop again

shahin

Active Member
My scraper runs well and collects names from the site, but I can't break out of the loop (written to move on to the next page) even after all the pages are exhausted. How can I handle this condition?

Code:
Sub Get_Content()
    Dim ie As New InternetExplorer, html As HTMLDocument
    Dim itm As Object, post As Object, posts As Object, elem As Object
    Dim evt As Object, x As Long

    With ie
        .Visible = True
        .navigate "https://brokercheck.finra.org/"
        Do Until .readyState = READYSTATE_COMPLETE: DoEvents: Loop
        Set html = .document
    End With
  
    Set evt = html.createEvent("keyboardevent")
    evt.initEvent "change", True, False
  
    For Each itm In html.getElementsByTagName("input")
        If InStr(itm.placeholder, "Name or CRD#") > 0 Then
            itm.Value = "Michael John"
            Exit For
        End If
    Next itm
    itm.dispatchEvent evt
  
    For Each post In html.getElementsByTagName("input")
        If InStr(post.placeholder, "Firm Name or CRD# (optional)") > 0 Then
            post.Value = "Morgan Stanley"
            Exit For
        End If
    Next post
    post.dispatchEvent evt
  
    html.getElementsByClassName("md-button")(0).Click
    Do While ie.Busy Or ie.readyState <> 4: DoEvents: Loop

    Do  ' this loop just goes on and on
  
        For Each elem In html.getElementsByClassName("smaller ng-binding flex")
            x = x + 1: Cells(x, 1) = elem.innerText
        Next elem
      
        html.getElementsByClassName("pagination-next")(0).getElementsByTagName("a")(0).Click
        Do While ie.Busy Or ie.readyState <> 4: DoEvents: Loop

    Loop Until html.getElementsByClassName("pagination-last ng-scope")(0).getElementsByTagName("a")(0).innerText = vbNullString
    ie.Quit
End Sub

This is what the next-page control looks like:
Untitled.jpg

Elements for the next page and the last page:
Code:
<ul class="pagination ng-pristine ng-untouched ng-valid ng-scope ng-isolate-scope ng-not-empty" data-ng-if="listCtrl.getTotalResults()" total-items="listCtrl.getDisplayResults()" ng-model="listCtrl.currentPage" max-size="1" page-label="listCtrl.pageLabel($page)" items-per-page="listCtrl.itemsPerPage" ng-change="listCtrl.pageChanged()" boundary-links="true" previous-text="‹" next-text="›" first-text="«" last-text="»" aria-invalid="false">
  <!-- ngIf: ::boundaryLinks --><li ng-if="::boundaryLinks" ng-class="{disabled: noPrevious()||ngDisabled}" class="pagination-first ng-scope"><a href="" ng-click="selectPage(1, $event)" class="ng-binding">«</a></li><!-- end ngIf: ::boundaryLinks -->
  <!-- ngIf: ::directionLinks --><li ng-if="::directionLinks" ng-class="{disabled: noPrevious()||ngDisabled}" class="pagination-prev ng-scope"><a href="" ng-click="selectPage(page - 1, $event)" class="ng-binding">‹</a></li><!-- end ngIf: ::directionLinks -->
  <!-- ngRepeat: page in pages track by $index --><li ng-repeat="page in pages track by $index" ng-class="{active: page.active,disabled: ngDisabled&amp;&amp;!page.active}" class="pagination-page ng-scope active"><a href="" ng-click="selectPage(page.number, $event)" class="ng-binding">26 of 27 pages</a></li><!-- end ngRepeat: page in pages track by $index -->
  <!-- ngIf: ::directionLinks --><li ng-if="::directionLinks" ng-class="{disabled: noNext()||ngDisabled}" class="pagination-next ng-scope"><a href="" ng-click="selectPage(page + 1, $event)" class="ng-binding">›</a></li><!-- end ngIf: ::directionLinks -->
  <!-- ngIf: ::boundaryLinks --><li ng-if="::boundaryLinks" ng-class="{disabled: noNext()||ngDisabled}" class="pagination-last ng-scope"><a href="" ng-click="selectPage(totalPages, $event)" class="ng-binding">»</a></li><!-- end ngIf: ::boundaryLinks -->
</ul>

By the way, the picture was taken while I was on the 26th page, which is why it looks that way; otherwise the elements for the last page are the same.
 
Try inspecting the element on the last page, when it is disabled.

You will notice that the element's class has changed to...
class="pagination-last ng-scope disabled"

Since you are checking for...
.getElementsByClassName("pagination-last ng-scope")
That class name has changed, so the element no longer exists in "html".

You cannot get the .innerText of a nonexistent element; that results in an error. Hence the condition never returns vbNullString and the loop runs indefinitely.

The easiest check for exiting the loop is...
Code:
    Loop Until InStr(html.body.innerHTML, "class=""pagination-last ng-scope disabled""") > 0
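
A narrower variant of the same check, offered only as a sketch: read the className of the "pagination-last" item directly instead of searching all of html.body.innerHTML. IsLastPage is a hypothetical helper name, it assumes the pagination markup posted above, and it has not been run against the live site.

Code:
' Hypothetical helper: True when the "last page" <li> carries the "disabled" class,
' or when no pagination control is present at all (nothing left to page through).
Function IsLastPage(ByVal html As Object) As Boolean
    Dim nodes As Object
    Set nodes = html.getElementsByClassName("pagination-last")
    If nodes.Length = 0 Then
        IsLastPage = True
        Exit Function
    End If
    ' Pad with spaces so "disabled" is matched only as a whole class name.
    IsLastPage = InStr(" " & nodes.Item(0).className & " ", " disabled ") > 0
End Function

Under those assumptions, the loop in Get_Content could end with "Loop Until IsLastPage(html)".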
 
That solved the issue. You are invincible, sir. Thanks a lot. However, the way you have used innerHTML is new to me; it resembles what I see when applying Split to responseText. Also, where did you find "disabled" within the class? I can't inspect the last ">>" button because it is grayed out.
 
Also, I forgot to mention: even if you check the .innerText of the correct element, it won't return vbNullString. Though it's grayed out, the element still has "»" as its innerHTML. Instead, you'll want to check whether .href is vbNullString.
 
One question, sir: did you mean " class" with a leading space rather than "class"? With the version containing the space, ie.Quit works; otherwise the loop stops but the browser fails to close.
 
I'm not sure that I understand you.

If you mean the string used inside InStr, it has no bearing on ie.Quit as long as the loop exits. A space before or after the string won't matter, because InStr returns a positive result with or without it: InStr is only used to check that the string exists within the HTML of html.body.
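
To illustrate the point about the leading space, here is a small standalone sketch. The src string is a made-up fragment modeled on the markup posted earlier, and InStr_Demo is a hypothetical name. Both searches succeed; only the reported position differs.

Code:
Sub InStr_Demo()
    Dim src As String
    ' Sample fragment: <li class="pagination-last ng-scope disabled">
    src = "<li class=""pagination-last ng-scope disabled"">"

    ' Without the leading space the match starts at the "c" of class.
    Debug.Print InStr(src, "class=""pagination-last ng-scope disabled""")    ' prints 5
    ' With the leading space the match starts one character earlier.
    Debug.Print InStr(src, " class=""pagination-last ng-scope disabled""")   ' prints 4
    ' A string that is absent returns 0, so "> 0" is False.
    Debug.Print InStr(src, "class=""pagination-first""")                     ' prints 0
End Sub

Since the loop condition only tests "> 0", either form ends the loop on the same page.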
 
But when I apply the same logic to another site, it does break out of the loop, yet it produces 50 duplicates: I was expecting 75 rows and I'm getting 125 results in total.

Code:
Sub Aoty_Data()
    Dim ie As New InternetExplorer, html As New HTMLDocument
    Dim post As Object, topic As Object

    With ie
        .Visible = True
        .navigate "http://www.albumoftheyear.org/ratings/6-highest-rated/2000/1"
        Do Until .readyState = READYSTATE_COMPLETE: DoEvents: Loop
        Set html = .document
    End With
    Do
        For Each topic In html.getElementsByClassName("albumListRow")
            x = x + 1
            With topic.getElementsByClassName("listLargeTitle")(0).getElementsByTagName("a")
                If .Length Then
                    Cells(x, 1) = Split(.Item(0).innerText, "-")(0)
                    Cells(x, 2) = Split(.Item(0).innerText, "-")(1)
                End If
            End With
        Next topic
        For Each post In html.getElementsByTagName("a")
            If InStr(post.innerText, "Next >") > 0 Then
                post.Click
                Exit For
            End If
        Next post
        Do While ie.Busy Or ie.readyState <> 4: DoEvents: Loop
    Loop While InStr(html.body.innerHTML, " rel=""next""") > 0
    ie.Quit
End Sub

Elements for next page:

Code:
<div style="margin: 15px 0;"><a href="/ratings/6-highest-rated/2000/2" rel="prev" style="float:left;"><div class="pageSelect">&lt; Previous</div></a><a href="/ratings/6-highest-rated/2000/4" rel="next" style="float:right;"><div class="pageSelect">Next &gt;</div></a><br style="clear:both;"><br><div style="font-size:10px; font-weight:bold; text-transform:uppercase; margin-bottom:3px;">Page Select:</div><div><a href="/ratings/6-highest-rated/2000/1" rel="nofollow"><div class="smallBottomLink">1</div></a><a href="/ratings/6-highest-rated/2000/2" rel="nofollow"><div class="smallBottomLink">2</div></a><div class="smallBottomLink">3</div><a href="/ratings/6-highest-rated/2000/4" rel="nofollow"><div class="smallBottomLink">4</div></a></div></div>

Elements for the last page where "Next" is missing:

Code:
<div style="margin: 15px 0;"><a href="/ratings/6-highest-rated/2000/3" rel="prev" style="float:left;"><div class="pageSelect">&lt; Previous</div></a><br style="clear:both;"><br><div style="font-size:10px; font-weight:bold; text-transform:uppercase; margin-bottom:3px;">Page Select:</div><div><a href="/ratings/6-highest-rated/2000/1" rel="nofollow"><div class="smallBottomLink">1</div></a><a href="/ratings/6-highest-rated/2000/2" rel="nofollow"><div class="smallBottomLink">2</div></a><a href="/ratings/6-highest-rated/2000/3" rel="nofollow"><div class="smallBottomLink">3</div></a><div class="smallBottomLink">4</div></div></div>

Note: I do not wish to track the numerical pagination; rather, I would like to stick to the approach sir Chihiro has shown above. Since the last page doesn't have any distinguishing flag, I could not get the data from that page, and I had to use "Loop While" instead of "Loop Until".
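
One possible restructuring, given as an untested sketch and not as a confirmed fix for the duplicates: scrape first, then look for the rel="next" link, and leave the loop when it is missing. That way the last page is scraped once before the exit test. FindNextLink and Page_Through are hypothetical names. Re-reading ie.document on each pass is an assumption about how the document reference behaves after navigation, and nothing here has been verified against the site.

Code:
' Hypothetical helper: returns the first <a rel="next"> in the document, or Nothing.
Function FindNextLink(ByVal html As Object) As Object
    Dim a As Object
    For Each a In html.getElementsByTagName("a")
        ' getAttribute may return Null; appending "" turns that into an empty string.
        If LCase$(a.getAttribute("rel") & "") = "next" Then
            Set FindNextLink = a
            Exit Function
        End If
    Next a
End Function

Sub Page_Through(ByVal ie As Object)
    Dim html As Object, nxt As Object
    Do
        Set html = ie.document              ' assumption: refresh the reference on every page
        ' ... scrape the rows of the current page from html here ...
        Set nxt = FindNextLink(html)
        If nxt Is Nothing Then Exit Do      ' no "Next" link: the last page is already scraped
        nxt.Click
        Do While ie.Busy Or ie.readyState <> 4: DoEvents: Loop
    Loop
End Sub

This still relies on the "Next" link and not on the numbered page links.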
 