Creating a dynamic parser capable of collecting data from a JavaScript-enabled webpage

shahin

I've written a scraper in VBA, combined with Selenium, to parse product names and prices from a JavaScript-enabled webpage. I use Selenium only to get the page source, since it was not possible to get the response text without opening the page through Selenium. As soon as I have the source, I switch back to the usual VBA method to complete the operation.

The hardest part was parsing the prices, because the page uses two different class names for them. If the script targets only one of those class names, some cells in the price column of the spreadsheet are left blank after scraping.

However, my parser now handles both class names, so no prices are missing. It works well and is much faster than using Selenium alone.

Here is what I've written:
Code:
Sub RedmartScraper()
  Dim driver As New ChromeDriver, html As New HTMLDocument
  Dim post As HTMLHtmlElement
  Dim i As Long   ' current output row

  ' Let Selenium render the page, then copy the rendered markup into an HTMLDocument
  With driver
    .Get "https://redmart.com/bakery"
    html.body.innerHTML = .ExecuteScript("return document.body.innerHTML;")
    .Quit
  End With

  For Each post In html.getElementsByClassName("productDescriptionAndPrice")
    ' Product name: move to a new row only when a name link is found
    With post.getElementsByTagName("h4")(0).getElementsByTagName("a")
      If .Length Then i = i + 1: Cells(i, 1) = .item(0).innerText
    End With

    ' Price: it sits under one of two class names, so check both
    With post.getElementsByClassName("ProductPrice__promo_price___3OWY9")
      If .Length Then Cells(i, 2) = .item(0).innerText
    End With

    With post.getElementsByClassName("ProductPrice__price___3BmxE")
      If .Length Then Cells(i, 2) = .item(0).innerText
    End With
  Next post
End Sub
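
The two price lookups above could also be folded into one small helper that tries each class name in turn and returns the first match. This is only a sketch: FirstTextByClass is a hypothetical name, not part of the macro above, and trying the promo price first is an assumption about which value should win if a product ever carries both.
Code:
' Hypothetical helper: returns the innerText of the first element found
' under parent for the given class names, tried in the order supplied.
' Returns an empty string when none of the class names match.
Function FirstTextByClass(ByVal parent As Object, ParamArray classNames() As Variant) As String
  Dim k As Long

  For k = LBound(classNames) To UBound(classNames)
    With parent.getElementsByClassName(CStr(classNames(k)))
      If .Length Then
        FirstTextByClass = .item(0).innerText
        Exit Function
      End If
    End With
  Next k
End Function

Inside the loop, the two With blocks for the price would then shrink to a single line, for example:
Code:
Cells(i, 2) = FirstTextByClass(post, "ProductPrice__promo_price___3OWY9", "ProductPrice__price___3BmxE")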

Btw, references to add via Tools > References in the VBA editor:
  1. Microsoft HTML Object Library
  2. Selenium Type Library
 