Scraper fetches the docs of a single link repeatedly instead of every link

shahin

Const pageurl As String = "http://www.slg.ch"

Sub ScrapingPro()
Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
Dim topics As Object, gist As Object
Dim x As String, y As String
Dim b As Long
Dim vids As Object, vid As Object

Range("A1").Select

http.Open "GET", "http://www.slg.ch/de/branchenverzeichnis/liste", False
http.send
html.body.innerHTML = http.responseText
Set http = Nothing

Set topics = html.getElementsByClassName("address_row")
For b = 0 To topics.Length - 1

If Not topics(b) Is Nothing Then

Set gist = topics(b).getElementsByTagName("a")(0)

x = gist.getAttribute("href")
y = pageurl & Mid(x, InStr(x, ":") + 1)

http.Open "GET", y, False
http.send
html.body.innerHTML = http.responseText
Set http = Nothing

End If

Set vids = html.getElementsByClassName("detail_row_right_cell")
For Each vid In vids
ActiveCell.Value = vid.innerText
ActiveCell.Offset(1, 0).Select
Next vid
Next b
End Sub

I would like to scrape the content of every one of the newly produced links. The code produces 20 links, but when I run it, it gives me the content of the first link 20 times over; that is, it repeats the action for a single link instead of visiting all 20. I suppose the problem is inside the If ... End If block. Any help would be greatly appreciated.
 
You should use a separate "HTMLDocument" variable for the section below.
Code:
If Not topics(b) Is Nothing Then

Set gist = topics(b).getElementsByTagName("a")(0)

x = gist.getAttribute("href")
y = pageurl & Mid(x, InStr(x, ":") + 1)

http.Open "GET", y, False
http.send
html.body.innerHTML = http.responseText
Set http = Nothing

End If

Since you are reusing "html" there, you overwrite the listing page you loaded in the section of your code below, which is the document "topics" was read from.
Code:
http.Open "GET", "http://www.slg.ch/de/branchenverzeichnis/liste", False
http.send
html.body.innerHTML = http.responseText
Set http = Nothing
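
For illustration, here is a minimal sketch of the procedure with that change applied. It is an untested sketch under a few assumptions: "detail" is a hypothetical name for the second HTMLDocument, the "Set http = Nothing" lines are dropped so the same request object is reused, and the writing loop is moved inside the If block so rows are only written when a link was found. The URLs and class names are taken unchanged from the original post.
Code:
```vba
' Assumes references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library".
Const pageurl As String = "http://www.slg.ch"

Sub ScrapingPro()
    Dim http As New MSXML2.XMLHTTP60
    Dim html As New HTMLDocument      ' holds the listing page only
    Dim detail As New HTMLDocument    ' hypothetical name: holds each detail page
    Dim topics As Object, gist As Object
    Dim vids As Object, vid As Object
    Dim x As String, y As String
    Dim b As Long

    Range("A1").Select

    ' Load the listing page once; html is not written to again after this.
    http.Open "GET", "http://www.slg.ch/de/branchenverzeichnis/liste", False
    http.send
    html.body.innerHTML = http.responseText

    Set topics = html.getElementsByClassName("address_row")
    For b = 0 To topics.Length - 1
        Set gist = topics(b).getElementsByTagName("a")(0)

        If Not gist Is Nothing Then
            x = gist.getAttribute("href")
            y = pageurl & Mid(x, InStr(x, ":") + 1)

            ' Load the detail page into its own document so that
            ' topics still points at the listing page on the next pass.
            http.Open "GET", y, False
            http.send
            detail.body.innerHTML = http.responseText

            Set vids = detail.getElementsByClassName("detail_row_right_cell")
            For Each vid In vids
                ActiveCell.Value = vid.innerText
                ActiveCell.Offset(1, 0).Select
            Next vid
        End If
    Next b
End Sub
```
The design point is that "html" is filled once and only read afterwards, so each pass of the outer loop picks the next entry from the same listing page.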
 

You are the genius, man!!!!!!! Hats off to you. You saved my day. Thanks a zillion.
 