• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Recursive web data harvesting using vba

shahin

Active Member
Hi there all! I have tried to make a crawler which is at this moment crawling a webpage recursively. However, I have got two problems with this.
1. Scraped data are getting overwritten in a single row but i cant make it go down as normal loop does.
2. Is there any way to set the depth of this crawler so that i can avoid infinite crawling?

Here is what I'm up to:

Code:
Sub Candy_Crush(Z As String)
Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
Dim Items As Object, Item As Object, Newitem As Object
Dim elem As Object, athing As Object, bthing As Object

With http
    .Open "GET", Z, False
    .send
    html.body.innerHTML = .responseText
End With

Set Newitem = html.getElementsByClassName("name")
Set Items = html.getElementsByClassName("left")

    For Each Item In Items
        Set athing = Item.getElementsByTagName("h1")
        Set bthing = Item.getElementsByTagName("h2")
  
        x = x + 1
        If athing.Length Then Cells(x, 2) = athing(0).innerText
        If bthing.Length Then Cells(x, 3) = bthing(0).innerText
    Next Item

    For Each elem In Newitem
            x = x + 1
            Cells(x, 1) = elem.href
            Candy_Crush (elem.href)
    Next elem
End Sub

Sub RecursiveCrawler()
Candy_Crush ("https://itunes.apple.com/us/app/toy-blast/id890378044?mt=8")
End Sub
 
Hi !
1 : use a row global variable
2 : use a depth level as request argument …​
Code:
    Dim oReq As New XMLHTTP60, oDoc As New HTMLDocument, R&

Sub Request(URL$, Optional LEVEL% = 1)
        Dim oElt As Object, N&
        oReq.Open "GET", URL, False
        oReq.send
        oDoc.body.innerHTML = oReq.responseText
        R = R + 1
        Cells(R, 1).Value = LEVEL
    For Each oElt In oDoc.getElementsByClassName("left")
        With oElt.getElementsByTagName("h1")
            If .Length Then Cells(R, 2).Value = .Item(0).innerText
        End With
        With oElt.getElementsByTagName("h2")
            If .Length Then Cells(R, 3).Value = .Item(0).innerText
        End With
    Next
    With oDoc.getElementsByClassName("name")
        For N = 0 To .Length - 1
            R = R + 1
            Cells(R, 1).Value = .Item(N).href
        Next
        If LEVEL Then For N = 0 To .Length - 1: Request .Item(N).href, LEVEL - 1: Next
    End With
End Sub

Sub Main()
    Request "https://itunes.apple.com/us/app/toy-blast/id890378044?mt=8"
    Set oReq = Nothing:  Set oDoc = Nothing:  R = 0
End Sub
Do you like it ? So thanks to click on bottom right Like !
 
You left me dumbfounded with this magical code. Thanks, Marc L. You are just awesomeeeeeeeeeeeeeeeeeeeeeee.:):DD
 
@Marc L,
Every time you give a demo on anything, I learn a lot of new things from it. What you did above is totally out of the box. Limiting the usage of object. You did the whole thing only applying a single object "oElt" other than the built-in ones. That is mindblowing. I need to know something more about this "Item" property. Is it vba's built-in property? And the last thing I wanna have is that you gave several demos on "Scripting Dictionary" out there but I can't understand where to start with so if you give me a link on this, it would be very helpful. Thanks again.
 

Item VBA property refers to an element of a collection whatever
if this collection is a VBA inner object or from an external object …

You can directly start for Dictionary easy object from VBA inner help
(even if it is not a VBA object or see on MSDN) or from any web tutorial
like Data Dictionary in VBA - Complete Syntax Documentation
It works like a VBA Collection but with more methods and properties.
 
One thing I forgot to ask. How can I write this below expression in one liner code? I need this to avoid writing headers manually.

Cells(X,1)="Max"
Cell(X,2)="Average"
Cells(X,3)="Min"
 
Last edited:
Thanks Marc L, whatever I have learnt from you today is totally unique to me. very happy now.
 
Back
Top