Can't get any idea to parse pagination links

shahin

Active Member
Hi there! While running my script to parse some data from a site, I noticed that it fetches info fine from pages where the data are embedded nicely, but I get stuck when I try to get the links for the next pages: it fetches the numbers (1, 2, 3) instead of the href. Any help would be highly appreciated. Below is the code, with the working part active and the befuddling part commented out.
Code:
Sub ArchitectInfo()
' Requires references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library"
Dim http As New MSXML2.XMLHTTP60
Dim html As New HTMLDocument
Dim items As Object, item As Object, post As Object
Dim things As Object, thing As Object
Dim PostData As String, x As Long

' Form data copied from the browser's developer tools
PostData = "action=show_search_result&action_spam=dDfgEr&txtSearchType=5&txtPracName=&optSstate=3&optRegions=23&txtPcode=&txtShowBuildingType=0&optBuildingType=1&optHomeType=1&optBudget="
With http
    .Open "POST", "http://www.findanarchitect.com.au/index.php", False
    .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    .send PostData
    html.body.innerHTML = .responseText
End With
Set items = html.getElementsByClassName("clearboth")
    For Each item In items
        Set post = item.getElementsByTagName("h2")
        If post.Length Then
            x = x + 1
            Cells(x, 1) = post(0).innerText
        End If
    Next item
'Set things = html.getElementById("pagination").getElementsByTagName("a")
'    For Each thing In things
'            x = x + 1
'            Cells(x, 1) = thing.innerText
'    Next thing
End Sub
 
The page uses AJAX to fire a script and get a JSON response that generates the page data. Follow the requests in the developer tools and you will see that you can use the following GET request.

http://www.findanarchitect.com.au/index.php?action=ajax_goto_page&page_no=1&search_type=1

If that doesn't work, you can navigate through IE automation, I suppose.
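As a rough sketch of the GET request above, the loop below asks for a few pages in turn. It assumes the server keeps the search from the earlier POST in the same session, which the thread does not confirm. `PageUrl`, `FetchPages` and the `LAST_PAGE` value are names made up for illustration, and since the shape of the response isn't shown here, the loop only prints its length.

```vba
' Build the AJAX page URL quoted above for a given page number.
Function PageUrl(ByVal pageNo As Long) As String
    PageUrl = "http://www.findanarchitect.com.au/index.php" & _
              "?action=ajax_goto_page&page_no=" & pageNo & "&search_type=1"
End Function

Sub FetchPages()
    ' Assumption: the search POST has already been sent in this session.
    Const LAST_PAGE As Long = 3   ' hypothetical; use the real page count
    Dim http As Object
    Dim pageNo As Long

    Set http = CreateObject("MSXML2.XMLHTTP.6.0")
    For pageNo = 1 To LAST_PAGE
        http.Open "GET", PageUrl(pageNo), False
        http.send
        ' Inspect the response before deciding how to parse it.
        Debug.Print pageNo, Len(http.responseText)
    Next pageNo
End Sub
```

Once you know what the response looks like, the parsing from the original macro could go inside the loop in place of `Debug.Print`.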

Here's the part of the responseText that contains the link info. You can see that the links have no URL; they fire JavaScript instead.
Code:
<div style="float:left">Page 1 of 23 &nbsp;</div>
<ul>
    <li class = "current"><a href = "javascript: void(0);" onclick = "js_goto_page(1)">1</a></li>
    <li class = ""><a href = "javascript: void(0);" onclick = "js_goto_page(2)">2</a></li>
    <li class = ""><a href = "javascript: void(0);" onclick = "js_goto_page(3)">3</a></li>
    <li class = ""><a href = "javascript: void(0);" onclick = "js_goto_page(4)">4</a></li>
    <li class = ""><a href = "javascript: void(0);" onclick = "js_goto_page(5)">5</a></li>
    <li class = ""><a href = "javascript: void(0);" onclick = "js_goto_page(6)">6</a></li>
    <li class = ""><a href = "javascript: void(0);" onclick = "js_goto_page(7)">7</a></li>
    <li class = ""><a href = "javascript: void(0);" onclick = "js_goto_page(8)">8</a></li>
    <li class = ""><a href = "javascript: void(0);" onclick = "js_goto_page(9)">9</a></li>
    <li class = ""><a href = "javascript: void(0);" onclick = "js_goto_page(10)">10</a></li>
    <li class = ""><a href = "javascript: void(0);" onclick = "js_goto_page(11)">[11-20]</a></li>
    <li class = ""><a href = "javascript: void(0);" onclick = "js_goto_page(21)">[21-23]</a></li>
</ul>

</div>
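Since the anchors carry no URL, the useful bits in this markup are the "Page 1 of 23" text and the number inside each `js_goto_page(...)` call. Here is a minimal sketch of pulling both out with plain string functions. `TotalPages` and `PageFromOnclick` are hypothetical helper names, and the code assumes the markup keeps the shape shown above.

```vba
' Return N from text such as "Page 1 of 23", or 0 if the pattern is absent.
Function TotalPages(ByVal markup As String) As Long
    Dim pos As Long, digits As String, ch As String

    pos = InStr(1, markup, "Page ", vbTextCompare)
    If pos = 0 Then Exit Function
    pos = InStr(pos, markup, " of ", vbTextCompare)
    If pos = 0 Then Exit Function

    pos = pos + Len(" of ")
    ' Collect the run of digits that follows " of ".
    Do While pos <= Len(markup)
        ch = Mid$(markup, pos, 1)
        If ch Like "#" Then digits = digits & ch Else Exit Do
        pos = pos + 1
    Loop
    If Len(digits) > 0 Then TotalPages = CLng(digits)
End Function

' Return the page number from an onclick value such as "js_goto_page(21)".
Function PageFromOnclick(ByVal onclickText As String) As Long
    Const PREFIX As String = "js_goto_page("
    Dim openPos As Long, closePos As Long

    openPos = InStr(1, onclickText, PREFIX, vbTextCompare)
    If openPos = 0 Then Exit Function
    openPos = openPos + Len(PREFIX)
    closePos = InStr(openPos, onclickText, ")")
    If closePos > openPos Then
        PageFromOnclick = CLng(Mid$(onclickText, openPos, closePos - openPos))
    End If
End Function
```

With the total page count in hand, a simple `For` loop from 1 to that number is enough; the anchors themselves never need to be followed.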
 
One important thing to consider: how can I deal with the drop-down options on the first page, which produce the "form data" (the PostData in my post)? I can control one drop-down option from my end, but how do I manage the two simultaneously?
 
All that matters in the "form data" in the developer tools is how these two parameters, "optSstate=3&optRegions=23", change. Both parameters need to be filled in with numbers. At this point I can't figure out what my For loop should look like so that every combination of numbers gets filled in and I get all the names available out there.
 
Since each number represents a value/string stored on the web server side, you'll need to use the developer tools to find out what each number is tied to.
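Once you know which numbers are valid, two nested For loops can fill both parameters. The sketch below only builds and prints the form data. `BuildPostData` is a made-up helper, and the loop ranges are placeholders rather than the site's real state and region IDs. It also assumes the other fields, including `action_spam`, can be reused unchanged from the original post.

```vba
' Build the form data from the original macro with the two drop-down values swapped in.
Function BuildPostData(ByVal stateId As Long, ByVal regionId As Long) As String
    BuildPostData = "action=show_search_result&action_spam=dDfgEr&txtSearchType=5&txtPracName=" & _
                    "&optSstate=" & stateId & "&optRegions=" & regionId & _
                    "&txtPcode=&txtShowBuildingType=0&optBuildingType=1&optHomeType=1&optBudget="
End Function

Sub LoopStatesAndRegions()
    Dim stateId As Long, regionId As Long

    ' Hypothetical ranges: replace with the values found in the developer tools.
    For stateId = 1 To 3
        For regionId = 1 To 5
            ' In the real macro this string would be passed to .send
            Debug.Print BuildPostData(stateId, regionId)
        Next regionId
    Next stateId
End Sub
```

If the regions depend on the chosen state, not every pair will be valid, so a lookup of the region IDs per state may fit better than a plain numeric range.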
 