1. Welcome to Chandoo.org Forums. Short message for you

    Hi Guest,

    Thanks for joining Chandoo.org forums. We are here to make you awesome in Excel. Before you post your first question, please read this short introduction guide. When posting or responding to questions please remember our values at Chandoo.org are: Humility, Passion, Fun, Awesomeness, Simplicity, Sharing Remember that we have people here for whom English is not there first language and we need to allow for this in our dealings.

    Yours,
    Chandoo
  2. Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  3. When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Unable to parse all the links from a webpage

Discussion in 'VBA Macros' started by shahin, Jun 16, 2017.

  1. shahin

    shahin Active Member

    Messages:
    479
    Can't figure out how can I get all the company links from the page used in my code. Running my script I get only 20 links. The page has got lazyloading method that is why I can't get all of them. Any input on this will be highly appreciated. I've tried so far with:
    Code (vb):

    Sub Web_Data()
    Const lnk = "http://fortune.com"
    Dim http As New XMLHTTP60, html As New HTMLDocument
    Dim topics As Object, topic As Object

    With http
        .Open "GET", "http://fortune.com/fortune500/list/", False
        .send
        html.body.innerHTML = .responseText
    End With

    For Each topic In html.getElementsByClassName("small-12 column row")
        x = x + 1
        With topic.getElementsByTagName("a")
            If .Length Then Cells(x, 1) = lnk & Split(.item(0).href, "about:")(1)
        End With
    Next topic
    Set html = Nothing: Set topics = Nothing
    End Sub
     
  2. SirJB7

    SirJB7 Excel Rōnin

    Messages:
    8,894
    Hi, shahin!
    [​IMG]
    Regards!
    shahin likes this.
  3. shahin

    shahin Active Member

    Messages:
    479
    Thanks SirJB7, for the picture. Metaphorically identical stuff. It does represent my problem but I don't know how to make the browser wait until the content is fully loaded using xmlhttp.
  4. shahin

    shahin Active Member

    Messages:
    479
    Thanks Marc L, for your response. For the last couple of hours I've been waiting for this moment. Finally, you got back. Anyways, as for the answer to your question: I have double-checked my code several times to find out If I have made any mistake but unfortunately I can't figure out. Class name seems to be right to me.
  5. Marc L

    Marc L Excel Ninja

    Messages:
    3,174

    Sorry, it was a misreading of your need …

    So you can only get the first 20 links as in the page within any webbrowser
    without the $999 for the complete list !
  6. shahin

    shahin Active Member

    Messages:
    479
    Yes it is. I tried to find out if there is any hidden pagination in the url but unable to do that either.
  7. SirJB7

    SirJB7 Excel Rōnin

    Messages:
    8,894
    Hi, shahin!
    I didn't intend to be metaphoric, just suggested to try placing a delay, a wait, or a loop doing nothing... in order to wait for the full page load.
    Regards!
  8. Marc L

    Marc L Excel Ninja

    Messages:
    3,174

    Hi !

    No needs any wait 'cause its a request with False parameter in open.
    As source URL has only 20 items …
    shahin likes this.
  9. shahin

    shahin Active Member

    Messages:
    479
    @Marc L,
    I found it very tricky compare to those I've worked so far with. At this point I can't think anything to move along.
  10. NARAYANK991

    NARAYANK991 Excel Ninja

    Messages:
    15,598
    Hi ,

    If you can go through the JSON text , then , this is the URL :

    http://fortune.com/api/v2/list/2013055/expand/item/ranking/asc/0/50

    What this does is return the first 50 entries in the list ; within this list , you can search for this text to get at the name of the organization :

    "fullname":

    Changing the 0/50 to 50/50 will return the next 50 entries in the list , from rank 51 to rank 100.

    Narayan
    shahin and Marc L like this.
  11. shahin

    shahin Active Member

    Messages:
    479
    Thanks Narayan, for your sharp and effective finding. You have been a great help when in need. I'm trying myself with the chrome developer tool to figure out how you discover that link. If you don't take it as a botheration then you might wanna give me a hint how to find out such links (for future reference). Thanks a trillion.
  12. NARAYANK991

    NARAYANK991 Excel Ninja

    Messages:
    15,598
    Hi ,

    I used the Chrome Developer tool to look at the requests that were being sent ; as you scroll down the page , you can see the additional requests that were made.

    Narayan
    shahin likes this.
  13. shahin

    shahin Active Member

    Messages:
    479
    Thanks Narayan, for your instruction and guideline. I wanted to scrape the name of organizations and the ceo of each organizations. Successful for the first 100 names. Is there any way I can make use of the api in my script to roll until 1000 records are exhausted? Btw, I tried inputting 1000 in the url but it gives 100 records at best. Here is the working code for first 100 records:
    Code (vb):

    Sub Web_Data()
    Dim http As New XMLHTTP60
    Dim str As Variant

    With http
        .Open "GET", "http://fortune.com/api/v2/list/2013055/expand/item/ranking/asc/0/100", False
        .send
        str = Split(.responseText, "fullname"":""")
    End With

    N = UBound(str)

    For L = 1 To N
        Cells(L + 1, 1) = Split(str(L), """")(0)
        Cells(L + 1, 2) = Split(Split(str(L), "ceo"":""")(1), """")(0)
    Next L
    End Sub
     
    Once again, hats off to you Narayan.
  14. shahin

    shahin Active Member

    Messages:
    479
    Eventually, I got a workaround. Now It gives me 101 to 900 records. Here is the code.
    Code (vb):

    Sub Web_Data()
    Dim http As New XMLHTTP60
    Dim str As Variant
    For x = 1 To 20
        With http
            .Open "GET", "http://fortune.com/api/v2/list/2013055/expand/item/ranking/asc/" & (x * 50) & "/50", False
            .send
            str = Split(.responseText, "fullname"":""")
        End With
     
        N = UBound(str)
     
        For L = 1 To N
            y = y + 1
            Cells(y, 1) = Split(str(L), """")(0)
            Cells(y, 2) = Split(Split(str(L), "ceo"":""")(1), """")(0)
        Next L
    Next x
    End Sub
     
    NARAYANK991 likes this.

Share This Page