shahin
I've written a scraper in VBA combined with Selenium to parse product names and prices from a JavaScript-rendered webpage. I used Selenium only to get the page source (the response text was not available without opening the page in a browser); once I had it, I went back to the usual VBA approach to finish the job.
The hard part was parsing the prices, because the page uses two different class names for them. If the script targets only one of the class names, some cells in the price column of the spreadsheet come out blank after scraping.
I have now handled both class names in my parser, so the price column is filled for every product. It works well and runs much faster than doing everything through Selenium.
Here is what I've written:
Code:
Sub RedmartScraper()
    Dim driver As New ChromeDriver, html As New HTMLDocument
    Dim post As HTMLHtmlElement
    Dim i As Long    ' current output row

    ' Use Selenium only to render the page, then copy the markup into an MSHTML document
    With driver
        .Get "https://redmart.com/bakery"
        html.body.innerHTML = .ExecuteScript("return document.body.innerHTML;")
        .Quit
    End With

    For Each post In html.getElementsByClassName("productDescriptionAndPrice")
        ' Product name: move to a new row only when a name link is found
        With post.getElementsByTagName("h4")(0).getElementsByTagName("a")
            If .Length Then i = i + 1: Cells(i, 1) = .item(0).innerText
        End With
        ' Price: it can sit under either of two class names, so check both
        ' (if both are present, the second check overwrites the first)
        With post.getElementsByClassName("ProductPrice__promo_price___3OWY9")
            If .Length Then Cells(i, 2) = .item(0).innerText
        End With
        With post.getElementsByClassName("ProductPrice__price___3BmxE")
            If .Length Then Cells(i, 2) = .item(0).innerText
        End With
    Next post
End Sub
Btw, references to add (Tools > References in the VBA editor):
- Microsoft HTML Object Library
- Selenium Type Library
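
The two price checks in the Sub could also be folded into one small helper that tries each class name in turn and returns the first match. This is only a sketch: FirstTextByClass is a hypothetical name, not part of either library, and it assumes the element is passed late-bound (As Object) from a document created with New HTMLDocument, as in the Sub above.

```vba
' Hypothetical helper (not part of MSHTML or Selenium).
' Returns the innerText of the first element under `parent` that matches
' any of the class names in `classNames`, tried in the order given.
' Returns an empty string when none of the class names match.
Function FirstTextByClass(parent As Object, classNames As Variant) As String
    Dim cls As Variant
    For Each cls In classNames
        With parent.getElementsByClassName(CStr(cls))
            If .Length Then
                FirstTextByClass = .Item(0).innerText
                Exit Function
            End If
        End With
    Next cls
End Function
```

Inside the loop, the two With blocks for the price would then become a single line: Cells(i, 2) = FirstTextByClass(post, Array("ProductPrice__price___3BmxE", "ProductPrice__promo_price___3OWY9")). Listing the regular-price class first matches the Sub as written, where the second check overwrites the first when both are present; swap the order if the promo price should win instead.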