Trouble understanding loop-related logic

shahin · May 30, 2018

Hi there everyone! Hope everything is fine with you. I'm having trouble understanding some loop-related logic and hope someone can clarify it.

A few days back, I was trying to create a scraper for a webpage that spreads its data across multiple pages through pagination, and Sir Chihiro helped me build it. It was created without using the last page number in the URL; instead, it works iteratively.

Today I've created another scraper using the same logic and it is doing just fine. The only thing I can't understand here is the scope of the variable "R". As far as I knew, the value of "R" starts at "1", keeps increasing as the loop continues, and goes back to "1" as soon as the "for loop" ends. But it behaves quite differently here. When the first loop ends, the highest value of "R" is "10". So I thought the value of "R" would start from "1" again when the second loop starts, but it starts from "11" instead.

My question is: how is this happening? Apologies in advance for my poor knowledge.

This is the script:

Code:

Sub GetInfo()
    ' Requires references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library".
    Dim HTTP As New XMLHTTP60, HTML As New HTMLDocument
    ' R& is the output row counter (a Long, so it starts at 0).
    Dim post As HTMLHtmlElement, elem As Object, R&, link$, base$

    link = "https://www.usnews.com/education/best-global-universities/chemistry"
    base = "https://www.usnews.com/education/best-global-universities/chemistry?page="

    ' Outer loop: one pass per results page, until no "Next" link is found.
    While link <> ""
        With HTTP
            .Open "GET", link, False
            .send
            HTML.body.innerHTML = .responseText
        End With

        ' Inner loop: one pass per container on the current page.
        For Each post In HTML.getElementsByClassName("sep")
            With post.getElementsByClassName("rankscore-bronze")
                ' A rank means a new record, so move to the next row first.
                If .Length Then R = R + 1: Cells(R, 1) = .Item(0).innerText
            End With
            With post.getElementsByTagName("h2")
                If .Length Then Cells(R, 2) = .Item(0).innerText
            End With
            With post.getElementsByClassName("t-taut")
                If .Length Then Cells(R, 3) = .Item(0).innerText
            End With
        Next post

        ' Clear link so the While loop stops unless a "Next" link is found below.
        link = ""

        For Each elem In HTML.getElementsByClassName("pagination")(0).getElementsByTagName("a")
            If InStr(elem.innerText, "Next") > 0 Then link = base & Split(elem.href, "page=")(1): Exit For
        Next elem
    Wend
End Sub

The scraper parses three fields from each container on that webpage: "Rank", "University Name" and "Country". It keeps going until it has exhausted 60 different pages through pagination.

Chihiro · May 30, 2018

There is nothing in your code that resets R to 1 inside your loop. So it retains its value across loop iterations, as you perform R = R + 1.
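
To see this without hitting the website, here is a minimal sketch with the scraping stripped out. The function name CountAcrossPages and its two parameters are made up for illustration; the outer loop stands in for the While link <> "" loop and the inner loop stands in for the For Each post loop.

```vba
Option Explicit

' Hypothetical helper, not part of the original scraper.
Function CountAcrossPages(ByVal pageCount As Long, ByVal rowsPerPage As Long) As Long
    Dim R As Long               ' created once, starts at 0
    Dim page As Long, i As Long

    For page = 1 To pageCount           ' stands in for: While link <> ""
        For i = 1 To rowsPerPage        ' stands in for: For Each post In ...
            R = R + 1                   ' only ever incremented, never reset
        Next i
        Debug.Print "after page " & page & ", R = " & R
    Next page

    CountAcrossPages = R
End Function
```

Calling CountAcrossPages(2, 10) from the Immediate window prints "after page 1, R = 10" and then "after page 2, R = 20", so the first row written for the second page is row 11.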

shahin · May 30, 2018

Thanks for your reply, sir. I'm still unable to understand. When the "for loop" ends for the first time and the scraper takes up a new link, doesn't the "for loop" start from the beginning to read the content of that newly produced URL?

Chihiro · May 30, 2018

That's not a function of R; it depends on the link variable.

You'll see in the code where link is reset to "" (blank) in the While... loop.

No such logic is used for the R variable. So its value is retained until it goes out of scope (meaning when the subroutine finishes).
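
As a rough sketch of the difference an explicit reset would make, assuming 3 pages of 10 rows each (the function name RowsWritten and the resetEachPage flag are hypothetical, purely for illustration):

```vba
Option Explicit

' Hypothetical demo, not part of the original scraper.
Function RowsWritten(ByVal resetEachPage As Boolean) As Long
    Dim R As Long, page As Long, i As Long   ' R is a fresh 0 on every call

    For page = 1 To 3                        ' assumed: 3 pages
        ' The scraper has no line like this, which is why R keeps growing.
        If resetEachPage Then R = 0
        For i = 1 To 10                      ' assumed: 10 rows per page
            R = R + 1
        Next i
    Next page

    RowsWritten = R
End Function
```

RowsWritten(False) returns 30, while RowsWritten(True) returns 10, because each page would then overwrite rows 1 to 10. Calling RowsWritten(False) a second time returns 30 again rather than 60, since the local R is discarded when the function finishes.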

shahin · May 30, 2018

Got it this time, sir. Much obliged.

stefanoste78 · May 31, 2018

Hi shahin. Could you attach the file with the macro?
