Scraper returns the docs of a single link repeatedly instead of every link

shahin · Jan 6, 2017

Const pageurl As String = "http://www.slg.ch"

Sub ScrapingPro()
    Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
    Dim topics As Object, gist As Object
    Dim x As String, y As String
    Dim b As Long
    Dim vids As Object, vid As Object

    Range("A1").Select

    http.Open "GET", "http://www.slg.ch/de/branchenverzeichnis/liste", False
    http.send
    html.body.innerHTML = http.responseText
    Set http = Nothing

    Set topics = html.getElementsByClassName("address_row")
    For b = 0 To topics.Length - 1

        If Not topics(b) Is Nothing Then

            Set gist = topics(b).getElementsByTagName("a")(0)

            x = gist.getAttribute("href")
            y = pageurl & Mid(x, InStr(x, ":") + 1)

            http.Open "GET", y, False
            http.send
            html.body.innerHTML = http.responseText
            Set http = Nothing

        End If

        Set vids = html.getElementsByClassName("detail_row_right_cell")
        For Each vid In vids
            ActiveCell.Value = vid.innerText
            ActiveCell.Offset(1, 0).Select
        Next vid
    Next b
End Sub

I would like to scrape the content of every page behind the newly built links. The listing page yields 20 links, but when I run the code it writes out the content of the first link 20 times. In other words, it repeats the action for a single link instead of visiting all 20. I suspect the problem lies in the If ... End If block. Any help would be greatly appreciated.

Chihiro · Jan 6, 2017

You should use a different "HTMLDocument" variable for the section below.

Code:

If Not topics(b) Is Nothing Then

    Set gist = topics(b).getElementsByTagName("a")(0)

    x = gist.getAttribute("href")
    y = pageurl & Mid(x, InStr(x, ":") + 1)

    http.Open "GET", y, False
    http.send
    html.body.innerHTML = http.responseText
    Set http = Nothing

End If

Because you reuse "html" there, you overwrite what you loaded in the section of your code below.

Code:

http.Open "GET", "http://www.slg.ch/de/branchenverzeichnis/liste", False
http.send
html.body.innerHTML = http.responseText
Set http = Nothing
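
A minimal sketch of that suggestion, not a tested drop-in replacement: the listing page stays in "html", and each linked page is loaded into a second document. The names ScrapingProFixed, detail and r are hypothetical and do not appear in the original code. The sketch also makes two changes beyond the suggestion itself: the inner loop sits inside the If block so that a row without a link is skipped, and a row counter replaces ActiveCell/Select. It assumes the same references the original needs (Microsoft XML, v6.0 and Microsoft HTML Object Library) and that the site still uses the same class names.

Code:

Const pageurl As String = "http://www.slg.ch"

Sub ScrapingProFixed()   ' hypothetical name, to keep the original Sub intact
    Dim http As New MSXML2.XMLHTTP60
    Dim html As New HTMLDocument     ' holds the listing page only
    Dim detail As HTMLDocument       ' hypothetical: one fresh document per link
    Dim topics As Object, gist As Object
    Dim vids As Object, vid As Object
    Dim x As String, y As String
    Dim b As Long, r As Long         ' r: next output row (hypothetical)

    http.Open "GET", pageurl & "/de/branchenverzeichnis/liste", False
    http.send
    html.body.innerHTML = http.responseText

    Set topics = html.getElementsByClassName("address_row")
    r = 1
    For b = 0 To topics.Length - 1
        Set gist = topics(b).getElementsByTagName("a")(0)
        If Not gist Is Nothing Then
            x = gist.getAttribute("href")
            y = pageurl & Mid(x, InStr(x, ":") + 1)

            http.Open "GET", y, False
            http.send

            ' Load the linked page into its own document so that "html",
            ' and the "topics" collection read from it, stay untouched.
            Set detail = New HTMLDocument
            detail.body.innerHTML = http.responseText

            Set vids = detail.getElementsByClassName("detail_row_right_cell")
            For Each vid In vids
                ActiveSheet.Cells(r, 1).Value = vid.innerText
                r = r + 1
            Next vid
        End If
    Next b
End Sub

With the listing page left intact, topics(b) should keep pointing at the b-th "address_row" on every pass, so each iteration requests a different link. The "Set http = Nothing" lines are dropped here because the same request object is reused for every call.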

shahin · Jan 6, 2017

You are a genius, man! Hats off to you. You saved my day. Thanks a zillion.
