• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

How to customize a URL using IF statement or something?

shahin

Active Member
Hi there!! Hope you all are doing well. I've written a script in vba to scrape some movie names from a torrent site. The site has displayed it's content in several pages through pagination. When the next page link is clicked then along with usual address this very portion "?page=2" gets appended to the url and it increases like ("?page=3","?page=4" and so on) with the incremental click. You might have already noticed that it starts it's next page from "page=2" instead of "page=1". In fact, there is no "page=1" which means the default address is serving like "page=1". However, my intention is to use something like "if" block within my script so that my scraped next page urls are exactly like below. I've tried to give a nudge to make it work but it starts from "page=1" and at the same time misses "page=2". Any help on this will be highly appreciated. What I've done so far can be seen uncommenting the lines.

Here is my main code:
Code:
Sub yts()
    Const mlink As String = "https://yts.ag"
    Dim http As New XMLHTTP60, html As New HTMLDocument
    Dim new_link As String

    With http
        .Open "GET", "https://yts.ag/browse-movies", False
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
        .send
        html.body.innerHTML = .responseText
    End With

    With html.getElementsByClassName("tsc_pagination")(0).getElementsByTagName("a")
        For n = 2 To .Length - 1
'            If n = 2 Then
'                new_link = mlink & Split(.item(n).href, ":")(1)
'                new_link = Left(new_link, Len(new_link) - 1) & "1"
'            Else
'                new_link = mlink & Split(.item(n).href, ":")(1)
'            End If
'            Cells(n, 1) = new_link
            Cells(n, 1) = .item(n).href
        Next n
    End With
End Sub

If i do nothing and run the above script then the partial result is like:

Code:
about:/browse-movies?page=2
about:/browse-movies?page=3
about:/browse-movies?page=4
about:/browse-movies?page=5

If i try to run the script uncommenting the lines, this is what i get:

Code:
https://yts.ag/browse-movies?page=1
https://yts.ag/browse-movies?page=3
https://yts.ag/browse-movies?page=4
https://yts.ag/browse-movies?page=5

Results I'm expecting:

Code:
https://yts.ag/browse-movies?page=1
https://yts.ag/browse-movies?page=2
https://yts.ag/browse-movies?page=3
https://yts.ag/browse-movies?page=4
https://yts.ag/browse-movies?page=5
 
Last edited:
If you want to scrape all 323 pages...
Edit: rather... If you want to generate URL for all 323 pages.
Code:
Sub yts()
'    Const mlink As String = "https://yts.ag"
    Dim http As New XMLHTTP60, html As New HTMLDocument
    Dim new_link As String, baseUrl As String
    Dim nMax As Long, n As Long
    Dim x
    With http
        .Open "GET", "https://yts.ag/browse-movies", False
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
        .send
        html.body.innerHTML = .responseText
    End With
    With html.getElementsByClassName("tsc_pagination")(0).getElementsByTagName("a")
        x = Split(.Item(.Length - 1).href, "=")
        nMax = CLng(x(UBound(x)))
        baseUrl = "https://" & Split(x(0), ":/")(1) & "="
        For n = 1 To nMax
            Cells(n, 1) = baseUrl & n
        Next
          
    End With
End Sub
 
Last edited:
This is so damnnnn nice!!!!!!!!!!!!!! I have never seen such incredible thing as you did above. I'm totally dumbfounded to see the result derived from executing your code. One thing to know sir: what is "CLng" and what does this guy do here? Thankssssssssssssss again sir.
 
Thanks for the compliment.
String manipulation using SPLIT function is fairly standard process, and is very versatile.
http://spreadsheetpage.com/index.php/tip/the_versatile_split_function/
https://www.mrexcel.com/forum/excel-questions/267057-splitting-text-3.html

Cxxx functions are conversion functions used to convert expression to appropriate data type, where it's able to.

CLng is to convert expression to Long data type.

You can find list of type conversion function in link.
https://docs.microsoft.com/en-us/do...reference/functions/type-conversion-functions
 
Although, I know how this portion "UBound(x)" works but when it gets wrapped within "x(UBound(x))" then I get confused about the role of first "x". Please don't get bothered. Thanks again, sir.
 
Simple really.

x holds array of split string. UBound(x) returns index of last item in array.

Ex.
x = Split("I love cheese so much", " ")

Then UBound(x) = 4
Since x(0) = I, x(1) = love ... x(4) = much

x(Ubound(x)) equates to accessing last element of array.
 
It's very hard to find any other questions on this topic anymore. Thanks sir for the clarity.
 
Back
Top