Trouble scraping links in the right way

shahin

I've written some VBA code to scrape addresses from a webpage. The page contains 20 links, and the address I want to parse sits behind each link. My intention is to click each link to reveal the information and then parse it. On execution, the parser is supposed to scrape the address for the first link, then move on to the next link and repeat the process. However, my script below does this incorrectly: it clicks each link but scrapes the same information (the address belonging to the first link) over and over again. How can I fix it?

First I tried like this:

Code:
Sub Scrape_Items()
    URL$ = "http://www.incometaxindia.gov.in/Pages/utilities/exempted-institutions.aspx"
    Dim post As Object, elem As Object

    With CreateObject("InternetExplorer.Application")
        .Visible = True
        .navigate URL
        While .Busy = True Or .ReadyState < 4: DoEvents: Wend
            
        For Each post In .Document.getElementsByClassName("fc-blue")
            post.Click
            Set elem = .Document.getElementsByClassName("exempted-detail")(0).getElementsByTagName("span")(0)
            r = r + 1: Cells(r, 1) = elem.innerText
            Application.Wait Now + TimeValue("00:00:05")
        Next post
    End With
End Sub

Then I tried collecting the links in a dictionary first and clicking them afterwards, but no luck; the results are always the same:

Code:
Sub Scrape_Items()
    URL$ = "http://www.incometaxindia.gov.in/Pages/utilities/exempted-institutions.aspx"
    Dim post As Object, elem As Object, ldic As Object, key As Variant
 
    Set ldic = CreateObject("Scripting.Dictionary")

    With CreateObject("InternetExplorer.Application")
        .Visible = True
        .navigate URL
        While .Busy = True Or .ReadyState < 4: DoEvents: Wend
           
        For Each post In .Document.getElementsByClassName("fc-blue")
            ldic(post) = 1
        Next post
   
        For Each key In ldic.Keys
            key.Click
            Set elem = .Document.getElementsByClassName("exempted-detail")(0).getElementsByTagName("span")(0)
            r = r + 1: Cells(r, 1) = elem.innerText
            Application.Wait Now + TimeValue("00:00:03")
        Next key
    End With
End Sub
 
Study the page source. There's no need to expand each item before scraping; the data is already in the source.

Also, because you set elem to a fixed object (index 0 of the "exempted-detail" collection), every iteration of the loop returns exactly the same info.

Code:
Sub Scrape_Items()
    Dim Url As String, elem As Object
    Dim i As Long, r As Long

    Url = "http://www.incometaxindia.gov.in/Pages/utilities/exempted-institutions.aspx"

    With CreateObject("InternetExplorer.Application")
        .Visible = True
        .navigate Url
        ' Wait until the page has finished loading
        While .Busy = True Or .ReadyState < 4: DoEvents: Wend

        ' Grab every detail block once; no clicking is needed
        Set elem = .Document.getElementsByClassName("exempted-detail")
        For i = 0 To elem.Length - 1
            ' Write the first span of the i-th block to the next row of column A
            r = r + 1: Cells(r, 1) = elem(i).getElementsByTagName("span")(0).innerText
        Next i
    End With
End Sub
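
If clicking each link really were required (for example, on a page that loads the details only on demand), the same indexing idea could be applied to the original loop. The following is only a rough sketch, not tested against the live page: it assumes the i-th "fc-blue" link and the i-th "exempted-detail" block belong to the same entry, which the thread does not confirm. The procedure name Scrape_Items_ByIndex and the variable names are made up for illustration.

Code:
Sub Scrape_Items_ByIndex()
    Const URL As String = "http://www.incometaxindia.gov.in/Pages/utilities/exempted-institutions.aspx"
    Dim links As Object, details As Object
    Dim i As Long, r As Long

    With CreateObject("InternetExplorer.Application")
        .Visible = True
        .navigate URL
        ' Wait until the page has finished loading
        While .Busy = True Or .ReadyState < 4: DoEvents: Wend

        Set links = .Document.getElementsByClassName("fc-blue")
        Set details = .Document.getElementsByClassName("exempted-detail")

        ' ASSUMPTION: link i and detail block i describe the same entry
        For i = 0 To links.Length - 1
            If i >= details.Length Then Exit For
            links(i).Click
            ' Index with i instead of the fixed (0) used in the question
            r = r + 1
            Cells(r, 1) = details(i).getElementsByTagName("span")(0).innerText
        Next i
    End With
End Sub

The only real change from the code in the question is that the detail block is looked up with the loop counter rather than a hard-coded 0, so each pass reads a different element.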
 