Trouble understanding loop-related logic

shahin

Active Member
Hi there everyone! Hope everything is fine with you. I'm having trouble understanding some loop-related behavior and hope someone can clarify it.

A few days ago, I was trying to create a scraper for a webpage that spreads its data across multiple pages through pagination, and Sir Chihiro helped me build it. It does not use the last page number in the URL; instead, it works iteratively.

Today I created another scraper using the same logic, and it works just fine. The only thing I can't understand is the scope of the variable "R". I had assumed that "R" starts at 1, keeps increasing as the loop continues, and goes back to 1 as soon as the For loop ends. But it behaves quite differently here. When the first loop ends, the highest value of "R" is 10. So I thought "R" would start from 1 again when the second loop starts, but it starts from 11 instead.

My question is: how is this happening? Apologies in advance for my poor knowledge.

This is the script:
Code:
' Requires references: Microsoft XML, v6.0 and Microsoft HTML Object Library
Sub GetInfo()
    Dim HTTP As New XMLHTTP60, HTML As New HTMLDocument
    Dim post As HTMLHtmlElement, elem As Object, R&, link$, base$
  
    link = "https://www.usnews.com/education/best-global-universities/chemistry"
    base = "https://www.usnews.com/education/best-global-universities/chemistry?page="

    ' Keep going for as long as there is a page to fetch
    While link <> ""
        ' Request the current page and load the response into the HTML parser
        With HTTP
            .Open "GET", link, False
            .send
            HTML.body.innerHTML = .responseText
        End With
        ' Write rank, university name and country for each container
        For Each post In HTML.getElementsByClassName("sep")
            With post.getElementsByClassName("rankscore-bronze")
                If .Length Then R = R + 1: Cells(R, 1) = .Item(0).innerText
            End With
            With post.getElementsByTagName("h2")
                If .Length Then Cells(R, 2) = .Item(0).innerText
            End With
            With post.getElementsByClassName("t-taut")
                If .Length Then Cells(R, 3) = .Item(0).innerText
            End With
        Next post
      
        ' Clear the link; it is refilled only if a "Next" link is found
        link = ""
      
        For Each elem In HTML.getElementsByClassName("pagination")(0).getElementsByTagName("a")
            If InStr(elem.innerText, "Next") > 0 Then link = base & Split(elem.href, "page=")(1): Exit For
        Next elem
    Wend
End Sub

The scraper parses three fields from each container on that webpage: "Rank", "University Name" and "Country". It keeps going until it has exhausted 60 different pages through pagination.
 
There is nothing in your code that resets R to 1 inside your loop. So it retains its value across loop iterations as you perform R = R + 1.
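To see this without any web requests, here is a minimal sketch of the same shape. The name CounterDemo and the loop variables page and item are made up for illustration, and the figure of ten items per page is only an assumption taken from the numbers in the question.

```vba
Sub CounterDemo()
    Dim R As Long, page As Long, item As Long

    For page = 1 To 2              ' stands in for the While loop over pages
        For item = 1 To 10         ' stands in for the For Each over containers
            R = R + 1              ' only ever incremented, never reset
        Next item
        Debug.Print "After page " & page & ", R = " & R
    Next page
End Sub
```

Running it prints "After page 1, R = 10" and then "After page 2, R = 20" in the Immediate window. The inner loop starts over for each page, but R is just an ordinary variable that the loop body adds to, so it carries on from where it stopped.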
 
Thanks for your reply, sir. I'm still unable to understand. When the For loop ends for the first time and the scraper picks up a new link, doesn't the For loop start from the beginning to read the content of that newly produced URL?
 
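That's not a function of R; it depends on the link variable. The For Each loop does start again from the first element of the new page, but R is not its loop counter.

You'll see in the code where link is reset to "" (blank) inside the While loop.

No such logic is applied to the R variable, so its value is retained until it goes out of scope (that is, when the subroutine finishes).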
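As a rough illustration of that last point, the sketch below uses two hypothetical procedures, CountRows and RunTwice, which are not part of the scraper. A variable declared with Dim inside a Sub is created fresh, as 0, each time the Sub is called and is discarded when the Sub finishes.

```vba
Sub CountRows()
    Dim R As Long            ' starts at 0 on every call
    Dim i As Long

    For i = 1 To 3
        R = R + 1
    Next i
    Debug.Print "R at end of call: " & R
End Sub

Sub RunTwice()
    CountRows                ' prints "R at end of call: 3"
    CountRows                ' prints "R at end of call: 3" again
End Sub
```

In the scraper, putting R = 0 at the top of the While loop would restart the row counter on each page, but every page would then overwrite the rows written by the previous one, which is presumably not what you want here.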
 