Can't harvest certain fields from a webpage


Hi there! I hope you are all well. My scraper looks correct to me, but when I run it, it neither scrapes anything nor throws an error. So far it only parses the phone number, because I get lost when trying to write expressions for the website link and the email. Any help will be highly appreciated. Thanks in advance.

I have tried so far:

Sub AusData()
Dim http As New MSXML2.XMLHTTP60
Dim html As New HTMLDocument
Dim topics As Object, post As HTMLHtmlElement
Dim x As Long

With http
    .Open "GET", "https://www.truelocal.com.au/business/strata-report-sydney/sydney", False
    .send   ' the request must be sent before responseText is read
    html.body.innerHTML = .responseText
End With

' Each "column" block is expected to hold the contact details
Set topics = html.getElementsByClassName("column")
    For Each post In topics
        x = x + 1
        ' First phone number in the block
        Cells(x, 1) = post.getElementsByClassName("ng-binding ng-scope")(0).innerText
        'Cells(x, 2) = post.getElementsByClassName("")(0)
        'Cells(x, 3) = post.getElementsByClassName("")(0)
    Next post
End Sub

The HTML elements containing the fields:

<div class="column" ng-class="vm.getTabletClass()">
                    <bdp-details-contact-website listing="vm.listing" contacts="vm.listing.contacts" class="ng-isolate-scope"><!-- ngIf: vm.getHavePrimaryWebsite()==true --><a class="iconed-text link-color-white-bck ng-scope" ng-if="vm.getHavePrimaryWebsite()==true" rel="nofollow" ng-click="vm.bdpEventTracking();">
  <span class="icon-holder">
    <i class="icon icon-computer-notebook-1"></i>
  <span class="text-frame" ng-class="(vm.getHaveSecondaryWebsites()==true) ? 'with-aditional-item':''">
    <span ng-click="vm.openLink(vm.getReadableUrl(vm.getPrimaryWebsite()),'_blank')" role="button" tabindex="0">Visit website</span>
</a><!-- end ngIf: vm.getHavePrimaryWebsite()==true --> <!-- iconed-text-->

<!-- ngRepeat: contact in vm.getSecondaryWebsites() --> <!-- iconed-text-->
                    <a href="" class="iconed-text" ng-show="vm.isContactEmail" aria-hidden="false">
                      <span class="icon-holder">
                        <i class="icon icon-email"></i>
                      <span class="text-frame emailBusiness">
                        <span ng-click="vm.emailABusiness($event);" role="button" tabindex="0">Email this business</span>
                    </a> <!-- iconed-text-->
                        <bdp-details-contact-phone contacts="vm.listing.contacts" priority-number="vm.listing.preferences" class="ng-isolate-scope"><!-- ngRepeat: number in vm.getNumbers() --><!-- ngIf: vm.haveNumbers --><span class="iconed-text ng-scope" ng-if="vm.haveNumbers" ng-repeat="number in vm.getNumbers()">
  <span class="icon-holder">
    <!-- ngIf: $index==0 --><i class="icon-phone-call-2 ng-scope" ng-if="$index==0"></i><!-- end ngIf: $index==0 -->
  <span class="text-frame">
    <!-- ngIf: vm.isMobile -->
    <!-- ngIf: !vm.isMobile --><span ng-if="!vm.isMobile" class="ng-binding ng-scope">0421 298 888</span><!-- end ngIf: !vm.isMobile -->
</span><!-- end ngIf: vm.haveNumbers --><!-- end ngRepeat: number in vm.getNumbers() --><!-- ngIf: vm.haveNumbers --><span class="iconed-text ng-scope" ng-if="vm.haveNumbers" ng-repeat="number in vm.getNumbers()">
  <span class="icon-holder">
    <!-- ngIf: $index==0 -->
  <span class="text-frame">
    <!-- ngIf: vm.isMobile -->
    <!-- ngIf: !vm.isMobile --><span ng-if="!vm.isMobile" class="ng-binding ng-scope">0478 151 999</span><!-- end ngIf: !vm.isMobile -->
</span><!-- end ngIf: vm.haveNumbers --><!-- end ngRepeat: number in vm.getNumbers() --> <!-- iconed-text-->
                        <bdp-details-contact-fax contacts="vm.listing.contacts" class="ng-isolate-scope"><!-- ngIf: vm.getHaveFax()==true --> <!-- iconed-text-->
                        <bdp-details-abn-acn listing="vm.listing" class="ng-isolate-scope"><!-- ngIf: vm.haveAbn() -->
<!-- ngIf: vm.haveAcn() --></bdp-details-abn-acn>
I also tried to reach the destination page (the link used in my scraper above) from the main search page, but that did not return any results either. I am seriously confused about what is going on there.

Sub AusDataFromSearch()   ' placeholder name; the original Sub header was not included in the post
Dim http As New MSXML2.XMLHTTP60
Dim html As New HTMLDocument, htm As New HTMLDocument
Dim topics As Object, topic As Object
Dim post As HTMLHtmlElement, link As HTMLHtmlElement
Dim x As Long, zz As String

With http
    .Open "GET", "https://www.truelocal.com.au/search/bar/australia?rbt=%22Pubs+%26+Bars%22&search.distance=2&search.op=AND", False
    .send   ' the request must be sent before responseText is read
    html.body.innerHTML = .responseText
End With

Set topics = html.getElementsByClassName("media")
    For Each post In topics
    ' page is not defined in the posted snippet; it must be set before this line
    zz = page & Split(post.getElementsByClassName("name")(0).getElementsByTagName("a")(0).href, ":")(1)
    With http
        .Open "GET", zz, False
        .send
        htm.body.innerHTML = .responseText
    End With
    MsgBox http.responseText   ' debugging aid: shows the raw response for each listing
        Set topic = htm.getElementsByClassName("column")
        For Each link In topic
            x = x + 1
            Cells(x, 1) = link.getElementsByClassName("ng-binding ng-scope")(0).innerText
        Next link
    Next post
End Sub
Hi,

The following line of code is returning nothing:

Set topic = htm.getElementsByClassName("column")

Have you checked whether the page source contains this element?
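
One way to run that check from VBA is to search the raw response for the class name before parsing it. This is only a minimal sketch, assuming a reference to Microsoft XML, v6.0 is set. `HasClassName` and `CheckPageSource` are hypothetical names, and the helper is a rough substring test on double-quoted `class` attributes, not a real HTML parser.

```vba
' Hypothetical helper: True if any double-quoted class attribute in the
' markup contains className as a whole, space-separated token.
Function HasClassName(ByVal markup As String, ByVal className As String) As Boolean
    Const ATTR As String = "class="""
    Dim pos As Long, closePos As Long, classes As String

    pos = InStr(1, markup, ATTR, vbTextCompare)
    Do While pos > 0
        pos = pos + Len(ATTR)                      ' first character of the attribute value
        closePos = InStr(pos, markup, """")        ' closing quote of the attribute
        If closePos = 0 Then Exit Do
        classes = " " & Mid$(markup, pos, closePos - pos) & " "
        If InStr(1, classes, " " & className & " ", vbBinaryCompare) > 0 Then
            HasClassName = True
            Exit Function
        End If
        pos = InStr(closePos, markup, ATTR, vbTextCompare)
    Loop
End Function

' Fetches the listing page and reports whether class "column" is in the raw source.
Sub CheckPageSource()
    Dim http As New MSXML2.XMLHTTP60

    With http
        .Open "GET", "https://www.truelocal.com.au/business/strata-report-sydney/sydney", False
        .send
        Debug.Print "Status: " & .Status
        Debug.Print "Source contains class 'column': " & HasClassName(.responseText, "column")
    End With
End Sub
```

If the Immediate window reports False while the element is visible in the browser's inspector, the element is most likely added by script after the page loads, so it will not be present in `responseText`.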

Hi Narayan! Thanks for your answer. No, the element is not there when I inspect the page source. I got redirected, but the item is displayed properly in the browser, which is why I was confused.
I did it using Selenium in combination with VBA. It parses the name and phone number. Here is the working code:
Sub Testing()
Dim driver As New WebDriver
Dim posts As Object, post As Object
Dim i As Long

' Start a headless PhantomJS session with the site as the base URL
driver.Start "Phantomjs", "https://www.truelocal.com.au/find"
driver.get "/clothing-retailers"
Set posts = driver.FindElementsByXPath("//div[@class='media']")
' Skip listings that lack a name or phone element
On Error Resume Next
For Each post In posts
    i = i + 1
    Cells(i, 1) = post.FindElementByXPath(".//span[@class='name']/a").Text
    Cells(i, 2) = post.FindElementByXPath(".//a[contains(@class,'tl-phone-clip')]").Attribute("href")
Next post
On Error GoTo 0
driver.Quit   ' close the browser session
End Sub