
How to get all the links leading to the next page?

shahin

I've written some code in VBA to get all the links leading to the next pages from a webpage. However, it only works to a certain extent. The highest next-page number is 255. Running my script, I get the first 23 links along with the last-page link, but everything between them [24 to 254] is missing. How can I get all of them without hardcoding the highest number into the link for iteration? Here is what I'm trying:
Code:
Sub YifyLink()
    ' Requires references to Microsoft XML, v6.0 and Microsoft HTML Object Library
    Dim http As New XMLHTTP60, html As New HTMLDocument
    Dim post As Object, x As Long

    ' Fetch the first results page and load it into an HTML document
    With http
        .Open "GET", "https://www.yify-torrent.org/search/1080p/", False
        .send
        html.body.innerHTML = .responseText
    End With

    ' Write out every anchor found inside the pager block
    For Each post In html.getElementsByClassName("pager")(0).getElementsByTagName("a")
        x = x + 1: Cells(x, 1) = post.href
    Next post
End Sub

Elements within which the links lie:
Code:
<div class="pager"><a href="/search/1080p/"class="current">1</a><a href="/search/1080p/t-2/">2</a><a href="/search/1080p/t-3/">3</a><a href="/search/1080p/t-4/">4</a><a href="/search/1080p/t-5/">5</a><a href="/search/1080p/t-6/">6</a><a href="/search/1080p/t-7/">7</a><a href="/search/1080p/t-8/">8</a><a href="/search/1080p/t-9/">9</a><a href="/search/1080p/t-10/">10</a><a href="/search/1080p/t-11/">11</a><a href="/search/1080p/t-12/">12</a><a href="/search/1080p/t-13/">13</a><a href="/search/1080p/t-14/">14</a><a href="/search/1080p/t-15/">15</a><a href="/search/1080p/t-16/">16</a><a href="/search/1080p/t-17/">17</a><a href="/search/1080p/t-18/">18</a><a href="/search/1080p/t-19/">19</a><a href="/search/1080p/t-20/">20</a><a href="/search/1080p/t-21/">21</a><a href="/search/1080p/t-22/">22</a><a href="/search/1080p/t-23/">23</a><a href="/search/1080p/t-2/">Next</a><a href="/search/1080p/t-255/">Last</a></div>


The results I'm getting (partial):
about:/search/1080p/t-20/
about:/search/1080p/t-21/
about:/search/1080p/t-22/
about:/search/1080p/t-23/
about:/search/1080p/t-255/
 
Thanks, Marc L, for your comment. I got the highest pagination number using the process below, but what should I do next?
Code:
Sub YifyLink()
    Dim http As New XMLHTTP60, html As New HTMLDocument
    Dim post As Object, x As Long

    With http
        .Open "GET", "https://www.yify-torrent.org/search/1080p/", False
        .send
        html.body.innerHTML = .responseText
    End With

    ' The "Last" anchor's href ends in "t-255/", so split the number out of it
    For Each post In html.getElementsByClassName("pager")(0).getElementsByTagName("a")
        If InStr(post.innerText, "Last") Then
            x = x + 1: Cells(x, 1) = Split(Split(post.href, "-")(1), "/")(0)
        End If
    Next post
End Sub

Result: 255
 

In fact, as always, it's easy logic, just from observing the webpage:
you just need to process the numeric "A" tags …

If the Next button is met, you can load the current last numeric page + 1 …
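
A minimal sketch of that idea, assuming the pager markup shown above: follow the Next anchor from page to page until it disappears. The "about:" prefix seen in the earlier results is stripped when rebuilding the absolute link. Note this issues one HTTP request per page, so it is far slower than reading the last page number once.
Code:
Sub CrawlViaNextButton()
    ' Hypothetical sketch, not tested against the live site.
    ' Requires references to Microsoft XML, v6.0 and Microsoft HTML Object Library.
    Dim http As New XMLHTTP60, html As New HTMLDocument
    Dim anchor As Object, pageLink As String, nextHref As String, r As Long

    pageLink = "https://www.yify-torrent.org/search/1080p/"
    Do
        r = r + 1: Cells(r, 1) = pageLink          ' record the current page link
        With http
            .Open "GET", pageLink, False
            .send
            html.body.innerHTML = .responseText
        End With
        nextHref = ""
        For Each anchor In html.getElementsByClassName("pager")(0).getElementsByTagName("a")
            If anchor.innerText = "Next" Then nextHref = anchor.href
        Next anchor
        If nextHref = "" Then Exit Do              ' no Next button: last page reached
        ' hrefs come back as "about:/search/1080p/t-2/", so rebuild the full URL
        pageLink = "https://www.yify-torrent.org" & Replace(nextHref, "about:", "")
    Loop
End Sub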
 
@Marc L,
Sometimes I find it very difficult to follow your guidance, because what is very easy for you can be a very tough job for me to accomplish. Forgive my ignorance. :eek:
 
It's just a question of logic - easy at child level! - to reach what you need:
extract the last page number from the first page,
then you just have to loop through the pages until the last one …

If the current page # is lower than the last page #, read the next page,
or, if a Next button exists, read the next page …
Many ways, just open your eyes!
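
A minimal sketch of that loop, assuming the last page number (255) has already been extracted as in the post above: since every page link is a fixed part followed by the page number, the links can be built directly instead of re-reading the pager on every page.
Code:
Sub BuildPageLinks()
    ' Hypothetical sketch: once the last page number is known,
    ' every page link is just the fixed part plus the page number.
    Const link = "https://www.yify-torrent.org/search/1080p/"
    Dim lastPage As Long, y As Long
    lastPage = 255                       ' taken from the Last button's href
    For y = 1 To lastPage
        Cells(y, 1) = link & "t-" & y & "/"
    Next y
End Sub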
 
Perhaps I have managed this, but I suspect I am overwriting things. It gives me all the links, though!
Code:
Sub YifyLink()
    Const link = "https://www.yify-torrent.org/search/1080p/"
    Dim http As New XMLHTTP60, html As New HTMLDocument, htm As New HTMLDocument
    Dim post As Object, posts As Object
    Dim x As Long, y As Long, I As Long, item_link As String

    ' Read the first page and pull the last page number from the Last anchor
    With http
        .Open "GET", link, False
        .send
        html.body.innerHTML = .responseText
    End With

    For Each post In html.getElementsByClassName("pager")(0).getElementsByTagName("a")
        If InStr(post.innerText, "Last") Then
            x = Split(Split(post.href, "-")(1), "/")(0)
        End If
    Next post

    ' Visit every page and write out all of its pager anchors
    For y = 0 To x
        item_link = link & "t-" & y & "/"

        With http
            .Open "GET", item_link, False
            .send
            htm.body.innerHTML = .responseText
        End With
        For Each posts In htm.getElementsByClassName("pager")(0).getElementsByTagName("a")
            I = I + 1: Cells(I, 1) = posts.href
        Next posts
    Next y
End Sub
 
What a mess!!! It gives me 6906 links. Filtering out duplicates, I found that only 254 of them are unique.
 
'Cause you didn't apply my post #4 advice …

But obviously, for just the page links, you only need to load the first page
and extract the last page number, as all the links have the same syntax ‼
A fixed part, with the link ending in the page number …
Just open your mind and compare the page #2 link and the last page link.
No need to read all the pages, just the first one!
 
Just combining a request and Excel basics (a formula):
Code:
Sub Demo4Noob()
    Const URL = "https://www.yify-torrent.org/search/1080p/t-"
    Dim L&, P&
    ActiveSheet.UsedRange.Clear
    ' Late-bound WinHttp request: no extra library reference needed
    With CreateObject("WinHttp.WinHttpRequest.5.1")
        .Open "GET", URL & "1/", False
        .setRequestHeader "DNT", "1"
        On Error Resume Next
        .send
        If Err.Number Then Beep: Exit Sub
        On Error GoTo 0
        If .Status <> 200 Then Beep: Exit Sub
        ' Locate the Last button and read the page number just before it
        P = InStr(.responseText, "/"">Last</a>")
        If P = 0 Then
            L = 1
        Else
            L = InStrRev(.responseText, "-", P)
            L = Mid(.responseText, L + 1, P - L - 1)
        End If
    End With
    ' A single worksheet formula builds all L links at once
    [A1].Resize(L).Value = Evaluate("""" & URL & """&ROW(1:" & L & ")&""/""")
End Sub
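
A note on the last line, since it may look foreign: Evaluate hands Excel a worksheet formula whose ROW(1:L) part returns a vertical array, so the whole expression yields all L links in a single call. A small hypothetical demo of the same trick:
Code:
Sub EvaluateRowDemo()
    ' ROW(1:5) evaluates to the vertical array {1;2;3;4;5}, so the
    ' concatenation returns five ready-made links in one shot.
    Dim v As Variant
    v = Evaluate("""https://www.yify-torrent.org/search/1080p/t-""&ROW(1:5)&""/""")
    Debug.Print v(1, 1)    ' https://www.yify-torrent.org/search/1080p/t-1/
    Debug.Print v(5, 1)    ' https://www.yify-torrent.org/search/1080p/t-5/
End Sub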
 
What a solution!!! It took less than a second to scrape all the links. A few things looked foreign to me; I'll let you know if I don't understand something. Thanks again for your time.
 
As written since post #2:
« Just by reading the link address of the Last button … »!

Good observation makes for good coding …

So observe, and wake up your neurons …
… 'cause I will be off the Web next week, or
without enough time to spend on any forum …
 
What bad news!!! You should not have told me this so soon. It is always a great pleasure to be in touch with you. Thanks once again.
 