How to get newly produced urls outside of a for loop?

shahin

Active Member
I've written a macro that uses InternetExplorer to get all the movie URLs from a torrent site. When I run it, it fetches all the required URLs. However, I want those 20 URLs to be available outside the For loop. When I move the print of the variable "n_url" (which holds the newly scraped links) outside the loop, I only get the last URL. How can I print "n_url" outside the loop and still get all the URLs?
Code:
Sub torrent_info()
    Dim IE As New InternetExplorer, html As HTMLDocument, post As Object

    With IE
        .Visible = False
        .navigate "https://yts.am/browse-movies"
        Do Until .readyState = READYSTATE_COMPLETE: Loop
        Set html = .document
    End With

    For Each post In html.getElementsByClassName("browse-movie-bottom")
        With post.getElementsByTagName("a")
            If .Length Then n_url = .Item(0).href
        End With
'        Debug.Print n_url  ''it can fetch all the 20 urls from that page
    Next post
    Debug.Print n_url  ''if printed out, it will only fetch the last url
  
    IE.Quit
End Sub
 
Hello my friend
Try this line instead
Code:
If .Length Then n_url = n_url & IIf(n_url = "", "", vbCrLf) & .Item(0).href
 
It seems I've found another way to deal with this problem. However, the one thing I still need to handle is the extra space at the start of the result:
Code:
Sub torrent_info()
    Dim IE As New InternetExplorer, html As HTMLDocument, post As Object, n_url As String

    With IE
        .Visible = False
        .navigate "https://yts.am/browse-movies"
        Do Until .readyState = READYSTATE_COMPLETE: Loop
        Set html = .document
    End With
    
    For Each post In html.getElementsByClassName("browse-movie-bottom")
        With post.getElementsByTagName("a")
            If .Length Then n_url = n_url & vbNewLine & .Item(0).href
        End With
    Next post
    MsgBox n_url
  
    IE.Quit
End Sub
 
@YasserKhalil, the code you provided has no flaw, I mean no trailing space, but the way I've written it produces one, and I was hoping for a workaround. Look at the expression below.
Code:
If .Length Then n_url = n_url & vbNewLine & .Item(0).href
 
That's because on the first iteration the string variable "n_url" is still empty and you add vbNewLine to it first, so the result starts with an extra line break (what you are calling a trailing space).
In mine, I used IIf as a simple check: if "n_url" is empty, skip vbNewLine; if it is not empty, add vbNewLine.
Hope that helps, and sorry for my English
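
To make the difference between the two expressions concrete, here is a minimal sketch that runs without a browser. The helper names (JoinNaive, JoinWithIIf, DemoJoin) and the example.com links are made up for illustration and are not part of the macros in this thread.

```vba
Option Explicit

' Hypothetical helper: always adds vbNewLine before the item,
' so the result starts with an extra line break.
Function JoinNaive(items As Variant) As String
    Dim result As String, item As Variant
    For Each item In items
        result = result & vbNewLine & item
    Next item
    JoinNaive = result
End Function

' Hypothetical helper: skips the separator while the result is still empty.
Function JoinWithIIf(items As Variant) As String
    Dim result As String, item As Variant
    For Each item In items
        result = result & IIf(result = "", "", vbNewLine) & item
    Next item
    JoinWithIIf = result
End Function

Sub DemoJoin()
    Dim links As Variant
    links = Array("https://example.com/a", "https://example.com/b")

    ' The naive version is longer by exactly one vbNewLine.
    Debug.Print Len(JoinNaive(links)) - Len(JoinWithIIf(links)) = Len(vbNewLine)
End Sub
```

Run DemoJoin and check the Immediate window. Both helpers build the same list; only the first one carries the extra line break at the start.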
 
I tried to create a scraper that combines "InternetExplorer" with "XMLHTTP" requests. I wanted to do this because I came across a few sites that require JavaScript only on the first page; once you go deeper, nothing stops you from pulling the data with "XMLHTTP" requests. I've already created one that works just fine. The site I experimented on does not actually rely on JavaScript, but this method should work when needed.

Code:
Sub torrent_info()
    Dim IE As New InternetExplorer, ihtml As HTMLDocument, elem As Object
    Dim http As New XMLHTTP60, html As New HTMLDocument, post As Object
    Dim n_url As String, link As Variant, item_vault As Variant, r As Long

    With IE
        .Visible = True
        .navigate "https://yts.am/browse-movies"
        Do Until .readyState = READYSTATE_COMPLETE: Loop
        Set ihtml = .document
    End With

    For Each post In ihtml.getElementsByClassName("browse-movie-bottom")
        With post.getElementsByTagName("a")
'            If .Length Then n_url = n_url & vbNewLine & .Item(0).href
           If .Length Then n_url = n_url & IIf(n_url = "", "", vbCrLf) & .Item(0).href
        End With
    Next post
    IE.Quit
   
    item_vault = Split(n_url, vbNewLine)

    For Each link In item_vault
        With http
            .Open "GET", link, False
            .send
            html.body.innerHTML = .responseText
        End With
        For Each elem In html.getElementsByClassName("hidden-xs")
            With elem.getElementsByTagName("h1")
                If .Length Then r = r + 1: Cells(r, 1) = .Item(0).innerText
            End With
        Next elem
    Next link
End Sub
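
A side note on the Split step in the macro above: it matters whether the joined string starts with a separator, because Split then returns an empty first element, and an empty string is not a usable URL for the request. A minimal sketch, with placeholder links and a made-up procedure name:

```vba
Sub DemoSplitLeadingSeparator()
    Dim parts As Variant

    ' Joined string that starts with a separator: first element is empty.
    parts = Split(vbNewLine & "https://example.com/a" & vbNewLine & "https://example.com/b", vbNewLine)
    Debug.Print UBound(parts) + 1   ' 3
    Debug.Print Len(parts(0))       ' 0

    ' Joined string without the leading separator: only the real links.
    parts = Split("https://example.com/a" & vbNewLine & "https://example.com/b", vbNewLine)
    Debug.Print UBound(parts) + 1   ' 2
End Sub
```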
 
If I only apply "Trim()", it doesn't make any difference. What does that worksheet function look like?
 
This will get rid of the first vbNewLine.
Code:
Replace(n_url, vbNewLine, "", 1, 1)

FYI - It's not a space but vbNewLine, so Trim won't get rid of it. When I checked, I couldn't find any trailing or leading space.
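
To see the point about Trim and Replace side by side, here is a small sketch on a made-up string (the procedure name and sample text are for illustration only). In Replace, the fourth argument is the start position and the fifth is the maximum number of replacements.

```vba
Sub DemoTrimVersusReplace()
    Dim s As String
    s = vbNewLine & "first" & vbNewLine & "second"

    ' VBA's Trim only strips leading and trailing space characters,
    ' so the leading line break is left untouched.
    Debug.Print Trim(s) = s    ' True

    ' Start at position 1 and replace at most 1 occurrence:
    ' only the leading vbNewLine is removed, the one in the middle stays.
    Debug.Print Replace(s, vbNewLine, "", 1, 1) = "first" & vbNewLine & "second"    ' True
End Sub
```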
 
... Really, you've got to understand your own code a bit better, as I've indicated in the past. Monitor the variable using the Locals window and Immediate window, along with the Watch window, to see how it changes with each iteration.

You'll see that an extra vbNewLine is tacked on at the start. You can eliminate it with an If check (via a standard If...Then...Else, or IIf), as indicated by YasserKhalil. But since you don't want to do it that way...

Only one extra vbNewLine is present in the final string, so you'd do the Replace on the final string (i.e. outside the loop).
Code:
Sub torrent_info()
    Dim IE As New InternetExplorer, html As HTMLDocument, post As Object, n_url As String

    With IE
        .Visible = False
        .navigate "https://yts.am/browse-movies"
        Do Until .readyState = READYSTATE_COMPLETE: Loop
        Set html = .document
    End With

    For Each post In html.getElementsByClassName("browse-movie-bottom")
        With post.getElementsByTagName("a")
            If .Length Then n_url = n_url & vbNewLine & .Item(0).href
        End With
    Next post
    MsgBox Replace(n_url, vbNewLine, "", 1, 1)

    IE.Quit
End Sub
 
Thanks, sir, for the guidance. At this point, the way YasserKhalil showed seems comparatively easier to follow. I thought the way I started would end up being less painful, but it turned out to be the opposite.
 
So, the best way is again what sir Chihiro suggested the other day in another thread (using a dictionary), which is more or less similar to this issue:
Code:
Sub torrent_info()
    Dim IE As New InternetExplorer, ihtml As HTMLDocument, elem As Object
    Dim http As New XMLHTTP60, html As New HTMLDocument, post As Object
    Dim itemdict As Object, key As Variant, R As Long

    Set itemdict = CreateObject("Scripting.Dictionary")
  
    With IE
        .Visible = False
        .navigate "https://yts.am/browse-movies"
        Do Until .readyState = READYSTATE_COMPLETE: Loop
        Set ihtml = .document
    End With

    For Each post In ihtml.getElementsByClassName("browse-movie-bottom")
        With post.getElementsByTagName("a")
          If .Length Then itemdict(.Item(0).href) = 1
        End With
    Next post
    IE.Quit
  
    For Each key In itemdict.keys
        With http
            .Open "GET", key, False
            .send
            html.body.innerHTML = .responseText
        End With
        For Each elem In html.getElementsByClassName("hidden-xs")
            With elem.getElementsByTagName("h1")
                If .Length Then R = R + 1: Cells(R, 1) = .Item(0).innerText
            End With
        Next elem
    Next key
End Sub
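
For anyone who wants to try the dictionary idea without opening a browser, here is a minimal sketch. It assumes Windows, since Scripting.Dictionary is created late-bound through CreateObject; the procedure name and the example.com links are placeholders.

```vba
Sub DemoDictionaryKeys()
    Dim seen As Object, key As Variant
    Set seen = CreateObject("Scripting.Dictionary")   ' late bound, no reference needed

    ' Assigning to a key adds it if it is new and overwrites it otherwise.
    seen("https://example.com/a") = 1
    seen("https://example.com/b") = 1
    seen("https://example.com/a") = 1   ' same key again: no duplicate entry

    Debug.Print seen.Count   ' 2

    ' Each key comes back as its own item, so there is no separator to clean up.
    For Each key In seen.Keys
        Debug.Print key
    Next key
End Sub
```

Because every link is stored as a separate key, there is nothing to join or split, and a link that shows up twice on the page is only requested once.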
 