shahin
Active Member
Hi there everyone!! Hope everything is fine around you. I'm having trouble understanding how a variable behaves inside a loop and hope someone can clarify it.
A few days back, when I was trying to create a scraper for a webpage that displays its data across multiple pages through its pagination feature, Sir Chihiro helped me build it. It was created without using the last page number in the URL; instead, it works iteratively by following the "Next" link.
Today I've created another scraper using the same logic and it is doing just fine. The only thing I can't understand here is the scope of the variable "R". I had thought that the value of "R" starts at "1", keeps increasing as the "For" loop continues, and returns to "1" as soon as the "For" loop ends. But it behaves quite differently here. When the first pass of the loop ends, the highest value of "R" is "10". So I thought the value of "R" would start from "1" again when the second pass starts, but it starts from "11" instead.
My question is: how is this happening? Apologies in advance for my poor knowledge.
This is the script:
Code:
Sub GetInfo()
    Dim HTTP As New XMLHTTP60, HTML As New HTMLDocument
    ' R& is a Long (row counter); link$ and base$ are Strings
    Dim post As HTMLHtmlElement, elem As Object, R&, link$, base$

    link = "https://www.usnews.com/education/best-global-universities/chemistry"
    base = "https://www.usnews.com/education/best-global-universities/chemistry?page="

    ' Keep requesting pages until no "Next" link is found
    While link <> ""
        With HTTP
            .Open "GET", link, False
            .send
            HTML.body.innerHTML = .responseText
        End With

        ' One container per university; R moves to the next row only when a rank is found
        For Each post In HTML.getElementsByClassName("sep")
            With post.getElementsByClassName("rankscore-bronze")
                If .Length Then R = R + 1: Cells(R, 1) = .Item(0).innerText
            End With
            With post.getElementsByTagName("h2")
                If .Length Then Cells(R, 2) = .Item(0).innerText
            End With
            With post.getElementsByClassName("t-taut")
                If .Length Then Cells(R, 3) = .Item(0).innerText
            End With
        Next post

        ' Clear the link, then set it again only if the pagination has a "Next" anchor
        link = ""
        For Each elem In HTML.getElementsByClassName("pagination")(0).getElementsByTagName("a")
            If InStr(elem.innerText, "Next") > 0 Then link = base & Split(elem.href, "page=")(1): Exit For
        Next elem
    Wend
End Sub
The scraper parses three fields from each container on that webpage: "Rank", "University Name" and "Country". It keeps going until it has exhausted 60 different pages through pagination.
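To narrow it down, here is a stripped-down sketch of the same pattern with no web requests, in case it helps anyone reproduce what I am seeing. The names CounterDemo, page and item are made up for illustration and are not part of the scraper; the outer loop just stands in for the While loop over pages and the inner one for the containers on a page.

```vba
Sub CounterDemo()
    ' Hypothetical demo, not part of the scraper above
    Dim R As Long, page As Long, item As Variant

    For page = 1 To 2                           ' stands in for the While loop over pages
        For Each item In Array("a", "b", "c")   ' stands in for the containers on one page
            R = R + 1                           ' R is only ever incremented, never reset
            Debug.Print "page " & page & ", R = " & R
        Next item
    Next page
End Sub
```

In the Immediate window this counts 1 to 3 for the first page and carries on with 4 to 6 for the second. As far as I can tell, that is because "R" is declared once at the top of the Sub and is not the control variable of either loop, so nothing ever sets it back; it would presumably only restart if a line like R = 0 were placed before the inner loop.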