shahin
Active Member
Hi there! I've written a script in VBA to collect product links that are spread across multiple pages through pagination on a webpage. What I've written so far runs fine. The first page lists multiple products, and each product spans several pages through pagination. To keep the description easy to follow, I have picked one product to track, in this case "3M". It is spread across about 12 pages, with 16 products on each page.

However, the pagination works in such a way that, when inspected, the next-page button on page one exposes only the link to page two. So if I use that next-page link, I can only reach page 2 and parse the content of pages 1 and 2, even though there are 12 pages. There is an alternative: if I grab the links from the pagination block, I get links for only 6 pages. So instead I build the page links dynamically, to exhaust all the pages without hardcoding any page number into the link.

The only problem I am facing is that the loop doesn't end; it just keeps running. Perhaps I've messed up the "responseText" part of the "InStr" function in the "Loop Until" line. If the "Next" keyword is matched there correctly, I think it will work. I do get all the results, though!
Code:
Sub ControlPagination()
    Dim http As New XMLHTTP60, html As New HTMLDocument, htm As New HTMLDocument
    Dim post As Object, posts As Object
    Dim nlink As Object, link As Object
    Dim pro_link As String, page_link As String

    With http
        .Open "GET", "http://store.immediasys.com/brands/", False
        .send
        html.body.innerHTML = .responseText
    End With

    For Each post In html.getElementsByClassName("SubBrandList")(0).getElementsByTagName("a")
        If InStr(post.innerText, "3M") > 0 Then
            pro_link = post.href
            With http
                .Open "GET", pro_link, False
                .send
                html.body.innerHTML = .responseText
            End With

            Set nlink = html.getElementsByClassName("PagingList")(0)
            Set link = nlink.getElementsByTagName("a")(0)
            page_link = Split(link.href, "page")(0)

            Do
                p = p + 1
                With http
                    .Open "GET", page_link & "page=" & p & "&sort=featured", False
                    .send
                    htm.body.innerHTML = .responseText
                End With
                For Each posts In htm.getElementsByClassName("ProductDetails")
                    With posts.getElementsByTagName("a")
                        If .Length Then x = x + 1: Cells(x, 1) = .Item(0).href
                    End With
                Next posts
            Loop Until InStr(http.responseText, ">Next »</a>")
        End If
    Next post
    MsgBox "All Done"
End Sub
Elements within which the "Next" keyword lies:
Code:
<div class="FloatRight"><a href="http://store.immediasys.com/brands/3M.html?page=12&sort=featured">Next»</a></div>
I'm uploading an image to make sure I could properly make myself clear what type of pagination system I was talking about.
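Going by the markup quoted above, one possible way to write the exit test is sketched here. It rests on two assumptions that have not been checked against the live site: that the last page contains no "Next" link at all, and that the text ">Next" occurs nowhere else in the response. Note that the quoted markup has "Next»" with no space, while the "Loop Until" line searches for "Next »" with a space, so that search may never match. Also, "Loop Until" stops as soon as the text is found, whereas the loop presumably needs to continue while it is found. `HasNextPage` is a hypothetical helper name, not something from the original script.

```vba
' Hypothetical helper: True while the fetched page still contains a "Next" link.
' Matching on ">Next" avoids typing the » character, whose spacing and
' encoding may differ between the page source and the VBA editor.
' Assumption: ">Next" appears only in the pagination link.
Function HasNextPage(ByVal pageHtml As String) As Boolean
    HasNextPage = InStr(1, pageHtml, ">Next", vbTextCompare) > 0
End Function

' Usage inside the existing loop (sketch only):
'     Do
'         p = p + 1
'         ' ... request page p and write the product links ...
'     Loop While HasNextPage(http.responseText)
```

With this shape the loop requests page 1, keeps going as long as each response still offers a "Next" link, and stops after the first page that does not.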