
Can't harvest certain fields from a webpage

shahin

Active Member
Hi there! Hope you are all fine. I wrote a scraper that looks correct to me, but when I run it, it neither scrapes anything nor throws an error. So far it only tries to parse the phone number, because I got lost trying to work out the expressions for the website link and the email. Any help will be highly appreciated. Thanks in advance.

I have tried so far:

Code:
Sub AusData()
' Requires references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library"
Dim http As New MSXML2.XMLHTTP60
Dim html As New HTMLDocument
Dim topics As Object, post As Object, phones As Object
Dim x As Long

With http
    .Open "GET", "https://www.truelocal.com.au/business/strata-report-sydney/sydney", False
    .send
    html.body.innerHTML = .responseText
End With

Set topics = html.getElementsByClassName("column")
    For Each post In topics
        Set phones = post.getElementsByClassName("ng-binding ng-scope")
        ' Only write a row when the column actually contains a phone element
        If phones.Length > 0 Then
            x = x + 1
            Cells(x, 1) = phones(0).innerText
        End If
        'Cells(x, 2) = post.getElementsByClassName("")(0)
        'Cells(x, 3) = post.getElementsByClassName("")(0)
    Next post
End Sub

The HTML elements that contain the fields:

Code:
<div class="column" ng-class="vm.getTabletClass()">
                    <bdp-details-contact-website listing="vm.listing" contacts="vm.listing.contacts" class="ng-isolate-scope"><!-- ngIf: vm.getHavePrimaryWebsite()==true --><a class="iconed-text link-color-white-bck ng-scope" ng-if="vm.getHavePrimaryWebsite()==true" rel="nofollow" ng-click="vm.bdpEventTracking();">
  <span class="icon-holder">
    <i class="icon icon-computer-notebook-1"></i>
  </span>
  <span class="text-frame" ng-class="(vm.getHaveSecondaryWebsites()==true) ? 'with-aditional-item':''">
    <span ng-click="vm.openLink(vm.getReadableUrl(vm.getPrimaryWebsite()),'_blank')" role="button" tabindex="0">Visit website</span>
  </span>
</a><!-- end ngIf: vm.getHavePrimaryWebsite()==true --> <!-- iconed-text-->

<!-- ngRepeat: contact in vm.getSecondaryWebsites() --> <!-- iconed-text-->
</bdp-details-contact-website>
                    <a href="" class="iconed-text" ng-show="vm.isContactEmail" aria-hidden="false">
                      <span class="icon-holder">
                        <i class="icon icon-email"></i>
                      </span>
                      <span class="text-frame emailBusiness">
                        <span ng-click="vm.emailABusiness($event);" role="button" tabindex="0">Email this business</span>
                      </span>
                    </a> <!-- iconed-text-->
                    <div>
                        <bdp-details-contact-phone contacts="vm.listing.contacts" priority-number="vm.listing.preferences" class="ng-isolate-scope"><!-- ngRepeat: number in vm.getNumbers() --><!-- ngIf: vm.haveNumbers --><span class="iconed-text ng-scope" ng-if="vm.haveNumbers" ng-repeat="number in vm.getNumbers()">
  <span class="icon-holder">
    <!-- ngIf: $index==0 --><i class="icon-phone-call-2 ng-scope" ng-if="$index==0"></i><!-- end ngIf: $index==0 -->
  </span>
  <span class="text-frame">
    <!-- ngIf: vm.isMobile -->
    <!-- ngIf: !vm.isMobile --><span ng-if="!vm.isMobile" class="ng-binding ng-scope">0421 298 888</span><!-- end ngIf: !vm.isMobile -->
  </span>
</span><!-- end ngIf: vm.haveNumbers --><!-- end ngRepeat: number in vm.getNumbers() --><!-- ngIf: vm.haveNumbers --><span class="iconed-text ng-scope" ng-if="vm.haveNumbers" ng-repeat="number in vm.getNumbers()">
  <span class="icon-holder">
    <!-- ngIf: $index==0 -->
  </span>
  <span class="text-frame">
    <!-- ngIf: vm.isMobile -->
    <!-- ngIf: !vm.isMobile --><span ng-if="!vm.isMobile" class="ng-binding ng-scope">0478 151 999</span><!-- end ngIf: !vm.isMobile -->
  </span>
</span><!-- end ngIf: vm.haveNumbers --><!-- end ngRepeat: number in vm.getNumbers() --> <!-- iconed-text-->
</bdp-details-contact-phone>
                    </div>
                    <div>
                        <bdp-details-contact-fax contacts="vm.listing.contacts" class="ng-isolate-scope"><!-- ngIf: vm.getHaveFax()==true --> <!-- iconed-text-->
</bdp-details-contact-fax>
                    </div>
                    <div>
                        <bdp-details-abn-acn listing="vm.listing" class="ng-isolate-scope"><!-- ngIf: vm.haveAbn() -->
<!-- ngIf: vm.haveAcn() --></bdp-details-abn-acn>
                    </div>
                </div>
 
I also tried to reach the destination page (the link used in my scraper above) by following the links from the search results page, but that didn't return any results either. I'm seriously confused about what is going on there.

Code:
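Sub AusDataFromSearch() ' placeholder name: the Sub header is missing from the original post
' Assumptions: "http" and "page" are not defined in the original snippet;
' the declarations below are a guess based on the first macro.
Const page As String = "https://www.truelocal.com.au"
Dim http As New MSXML2.XMLHTTP60
Dim html As New HTMLDocument, htm As New HTMLDocument
Dim topics As Object, topic As Object
Dim post As Object, link As Object
Dim zz As String, x As Long

With http
    .Open "GET", "https://www.truelocal.com.au/search/bar/australia?rbt=%22Pubs+%26+Bars%22&search.distance=2&search.op=AND", False
    .send
    html.body.innerHTML = .responseText
End With

Set topics = html.getElementsByClassName("media")
    For Each post In topics

    ' Keep the part of the href after the first colon and prepend the site root
    zz = page & Split(post.getElementsByClassName("name")(0).getElementsByTagName("a")(0).href, ":")(1)

    With http
        .Open "GET", zz, False
        .send
        htm.body.innerHTML = .responseText
    End With
    MsgBox http.responseText ' debugging aid: shows the raw response for each listing
        Set topic = htm.getElementsByClassName("column")
        For Each link In topic
            x = x + 1
            Cells(x, 1) = link.getElementsByClassName("ng-binding ng-scope")(0).innerText
        Next link
    Next post
End Sub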
 
Hi,

If you look, the following line of code returns nothing:

Set topic = htm.getElementsByClassName("column")

Have you checked whether the page source contains this element?

Narayan
 
Hi Narayan! Thanks for your answer. No, the element is not there when I inspect the page source. I get redirected, but the item is displayed properly in the browser, which is why I was confused.
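
For anyone who wants to repeat that check from VBA rather than by viewing the source in a browser, here is a minimal sketch. It assumes a reference to "Microsoft XML, v6.0"; the macro name `CheckSourceForClass` is made up for illustration and is not part of the code above. It only tests whether the raw response (before any JavaScript has run) contains the class string the scraper looks for.

```vba
Sub CheckSourceForClass()
    ' Hypothetical helper, not from the thread: reports whether the raw
    ' response text contains the markup the scraper is searching for.
    Dim http As New MSXML2.XMLHTTP60   ' needs a reference to Microsoft XML, v6.0
    Dim src As String

    With http
        .Open "GET", "https://www.truelocal.com.au/business/strata-report-sydney/sydney", False
        .send
        src = .responseText
    End With

    ' vbTextCompare makes the search case-insensitive
    If InStr(1, src, "ng-binding ng-scope", vbTextCompare) > 0 Then
        Debug.Print "Class string found in the raw source"
    Else
        Debug.Print "Class string not found in the raw source"
    End If
End Sub
```

If the second message appears in the Immediate window, the element is most likely added by script after the page loads, so an XMLHTTP request alone will not see it.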
 
I solved it using Selenium in combination with VBA. It parses the name and the phone number. Here is the working code.
Code:
Sub Testing()
' Requires a reference to "Selenium Type Library" (SeleniumBasic)
Dim driver As New WebDriver
Dim posts As Object, post As Object
Dim i As Long

driver.Start "Phantomjs", "https://www.truelocal.com.au/find"
driver.get "/clothing-retailers"
Set posts = driver.FindElementsByXPath("//div[@class='media']")
' Listings without a name or phone link would raise an error; leave the cell blank instead
On Error Resume Next
For Each post In posts
    i = i + 1
    Cells(i, 1) = post.FindElementByXPath(".//span[@class='name']/a").Text
    Cells(i, 2) = post.FindElementByXPath(".//a[contains(@class,'tl-phone-clip')]").Attribute("href")
Next post
On Error GoTo 0
driver.Quit ' close the headless browser session
End Sub
 