Cleaning awkwardly written code to parse required data

shahin

Active Member
Hi there! Good day. I've made a scraper that parses the data the way I want. The problem is that the code looks really messy: on some pages the data is not in the element I expect but in the next one, so I had to add a second set of assignments that overwrites the first. There must be an easier or better-organized way to write this. I'm pasting my code below and attaching an Excel file to show that it works. Thanks in advance.

Code:
Sub houzzData()
Const url = "http://www.houzz.com/pro/brickmoondesign/brickmoon-design"
Dim html As New HTMLDocument, cel As Range
Dim topics As Object, posts As Object, data As HTMLHtmlElement
x = 2
For Each cel In Range("A2:A7")
    With CreateObject("MSXML2.serverXMLHTTP")
        .Open "GET", cel, False
        .send
        html.body.innerHTML = .responseText
    End With
Set topics = html.getElementsByClassName("container profile-carded")
    On Error Resume Next
    For i = 0 To topics.Length - 1
    Set data = topics(i)
    Cells(x, 2) = data.getElementsByClassName("profile-full-name")(0).innerText
    Cells(x, 3) = Replace(data.getElementsByClassName("info-list-text")(1).innerText, "Contact: ", "")
    Cells(x, 4) = data.getElementsByClassName("info-list-text")(1).getElementsByTagName("span")(0).innerText
    Cells(x, 5) = data.getElementsByClassName("info-list-text")(1).getElementsByTagName("span")(1).innerText
    Cells(x, 6) = data.getElementsByClassName("info-list-text")(1).getElementsByTagName("span")(2).innerText
    Cells(x, 7) = data.getElementsByClassName("info-list-text")(1).getElementsByTagName("span")(3).innerText
    Cells(x, 8) = data.getElementsByClassName("info-list-text")(1).getElementsByTagName("span")(4).innerText
    Cells(x, 9) = data.getElementsByClassName("info-list-text")(1).getElementsByTagName("span")(5).innerText
    Cells(x, 4) = data.getElementsByClassName("info-list-text")(2).getElementsByTagName("span")(0).innerText
    Cells(x, 5) = data.getElementsByClassName("info-list-text")(2).getElementsByTagName("span")(1).innerText
    Cells(x, 6) = data.getElementsByClassName("info-list-text")(2).getElementsByTagName("span")(2).innerText
    Cells(x, 7) = data.getElementsByClassName("info-list-text")(2).getElementsByTagName("span")(3).innerText
    Cells(x, 8) = data.getElementsByClassName("info-list-text")(2).getElementsByTagName("span")(4).innerText
    Cells(x, 9) = data.getElementsByClassName("info-list-text")(2).getElementsByTagName("span")(5).innerText
    Cells(x, 10) = data.getElementsByClassName("pro-contact-text")(0).innerText
    Cells(x, 11) = data.getElementsByClassName("proWebsiteLink")(0).href
    x = x + 1
    Next i
Next cel
End Sub
 

Attachments

  • Awkwardly Written.xlsm
Hi!

This code does not work on my side: access denied on the first send …

For code lines that share the same parent element you can use a With statement,
and to reach an element, for example the first one,
use .Item(0).innerText
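
A minimal sketch of that suggestion, not taken from the thread: the function name FirstSpanTexts and the sample markup are made up for illustration, and the code assumes a reference to "Microsoft HTML Object Library" is set, as in the original post. It wraps the shared parent collection in a With block, checks that the collection is not empty, and reads each child through .Item(n).innerText.

```vba
' Requires a reference to "Microsoft HTML Object Library" (Tools > References).
' FirstSpanTexts is a hypothetical helper, not part of the thread's code.
Function FirstSpanTexts(ByVal markup As String) As String
    Dim doc As New HTMLDocument, i As Long, result As String
    doc.body.innerHTML = markup
    ' The parent collection is named once instead of on every line
    With doc.getElementsByClassName("info-list-text")
        If .Length > 0 Then                         ' skip when the class is absent
            With .Item(0).getElementsByTagName("span")
                For i = 0 To .Length - 1
                    result = result & IIf(i > 0, "|", "") & .Item(i).innerText
                Next i
            End With
        End If
    End With
    FirstSpanTexts = result
    Set doc = Nothing
End Function

Sub WithStatementSketch()
    ' Made-up markup standing in for one profile card
    Debug.Print FirstSpanTexts("<div class=""info-list-text""><span>Houston</span><span>TX</span></div>")
End Sub
```

Because the length check sits inside the With block, a page without the class simply yields an empty string, so no On Error Resume Next is needed for that case.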
 
Hi Marc L, it's always a great pleasure to hear from you. I deleted my previously uploaded file and attached a new one for your consideration, with a different HTTP request object and less data, so you can examine what is going on. I am going to work through what you just suggested, but most of the time you do things in a way that is hard for a novice like me to keep up with. Thanks.
 
Well done for thinking about changing the ActiveX request object!

You must check whether a collection contains any element
and, again, clear object variables at the end of the process:
Code:
' Requires references to "Microsoft HTML Object Library" and "Microsoft XML, v6.0"
Dim oElt As HTMLHtmlElement     ' current profile card, shared with the helper below

' Returns property Pty (innerText by default) of the Num-th element of the given class
' inside oElt, or an empty string when that element does not exist
Function GetClassElementText$(Class$, Optional Num&, Optional Pty$ = "innerText")
    With oElt.getElementsByClassName(Class)
        If .Length > 0 And Num < .Length Then GetClassElementText = CallByName(.Item(Num), Pty, VbGet)
    End With
End Function

Sub Demo()
    Dim R&, C&, V, oReq As New ServerXMLHTTP60, oDoc As New HTMLDocument
    R = 1
    C = Cells(Rows.Count, 1).End(xlUp).Row      ' last used row in column A
    If C = 1 Then Beep: Exit Sub                ' nothing below row 1
    Range("B2:K" & C).ClearContents
    For Each V In Range("A2:A" & C).Value
        oReq.Open "GET", V, False
        oReq.setRequestHeader "DNT", "1"
        oReq.send
        oDoc.body.innerHTML = oReq.responseText
        For Each oElt In oDoc.getElementsByClassName("container profile-carded")
            R = R + 1
            Cells(R, 2).Value = GetClassElementText("profile-full-name")
            With oElt.getElementsByClassName("info-list-text")
                If .Length > 1 Then
                    Cells(R, 3).Value = Replace(.Item(1).innerText, "Contact: ", "")
                    ' Spans are read from item 2 when it exists, otherwise from item 1
                    With .Item(Application.Min(2, .Length - 1)).getElementsByTagName("SPAN")
                        If .Length Then
                            ReDim VA(.Length - 1)
                            For C = 0 To .Length - 1: VA(C) = .Item(C).innerText: Next
                            Cells(R, 4).Resize(, .Length).Value = VA
                        End If
                    End With
                End If
            End With
            Cells(R, 10).Value = GetClassElementText("pro-contact-text")
            Cells(R, 11).Value = GetClassElementText("proWebsiteLink", , "href")
        Next
    Next
    Set oReq = Nothing: Set oDoc = Nothing
End Sub
You may like it!
 
This is exactly how I wanted the scraper to behave. Although I didn't time it, the code you provided seemed to scrape faster when I ran it. By the way, I am going to study your code now. There are several things to learn from it. Thanks a zillion.
 
Yes, my code may be a bit faster because its objects are created only once,
whereas your code creates a brand-new ActiveX request object
at each loop iteration, as well as a useless object variable like topics!

Creating an object or assigning it to a variable takes time …
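
The point about creating the request object only once can be sketched as below. This is an illustration under assumptions, not code from the thread: FetchStatuses and the example.com URLs are hypothetical, and late binding through CreateObject is used here only so the snippet needs no library reference.

```vba
' FetchStatuses is a hypothetical helper: it requests each URL in turn
' and returns the HTTP status codes in a Collection.
Function FetchStatuses(urls As Variant) As Collection
    Dim oReq As Object, V As Variant, result As New Collection
    ' The request object is created once, before the loop, and reused
    Set oReq = CreateObject("MSXML2.ServerXMLHTTP.6.0")
    For Each V In urls
        oReq.Open "GET", CStr(V), False     ' synchronous request
        oReq.send
        result.Add oReq.Status
    Next V
    Set oReq = Nothing                      ' release the object when done
    Set FetchStatuses = result
End Function

Sub ReuseRequestSketch()
    Dim s As Variant
    ' Placeholder URLs, replace with real ones before running
    For Each s In FetchStatuses(Array("https://example.com/", "https://example.com/other"))
        Debug.Print s
    Next s
End Sub
```

How much time this saves in practice was not measured in the thread; network latency is likely to dominate, so treat the gain as modest.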
 