• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

How can I scrape the title of products from a webpage encrypted within javascript?

shahin

Active Member
Hi there all! Creating a scraper for the purpose of parsing the title of different products from a webpage when I run it, I get nothing. What I could see that the desired contents are within javascript element. As I don't wanna use any converter to harvest those titles, I expect somebody to help me get that using dictionary. What I tried initially was this:

Code:
Sub RmartData()
Dim http As New XMLHTTP60, html As New HTMLDocument
Dim titelem As Object, title As Object, protitle As Object

With http
    .Open "GET", "https://redmart.com/bakery", False
    .send
    html.body.innerHTML = .responseText
End With

Set titelem = html.getElementsByClassName("description")
For Each title In titelem
Set protitle = title.getElementsByTagName("a")
     x = x + 1
    Cells(x, 1) = protitle(0).innerText
 Next title
End Sub
 
Using chrome developer tools, clicking the response button what i could see is in the text file below:
 

Attachments

  • Json stuff.txt
    12 KB · Views: 9
There is always a workaround if external libraries are taken into consideration, as in selenium. To run the code one has to tick/check the "selenium type library" in the VBE/tools/reference. Here is the working code that parses Name and Price from that javascript encrypted certain site.
Code:
Sub Testing()
Dim driver As New WebDriver
Dim posts As Object, post As Object

Set driver = New WebDriver
driver.Start "Phantomjs", "https://redmart.com"
driver.get "/bakery"
Set posts = driver.FindElementsByClass("productDescriptionAndPrice")
On Error Resume Next
For Each post In posts
    i = i + 1
    Cells(i, 1) = post.FindElementByTag("h4").Text
    Cells(i, 2) = post.FindElementByClass("ProductPrice__price___3BmxE").Text
Next post
End Sub

Btw, If Marc L sees this, he is not gonna like it. He is the programmer of clean coding and likes to stick to the mainstream libraries of vba and doesn't like the usage of "On error resume next" either within the code. To him, apology in advance.
 
Last edited:
Using xpath to serve the same purpose. It scrapes Name, Price and Link.
Code:
Sub Testing()
Dim driver As New WebDriver
Dim posts As Object, post As Object

Set driver = New WebDriver
driver.Start "Phantomjs", "https://redmart.com"
driver.get "/bakery"
Set posts = driver.FindElementsByXPath("//div[@class='productDescriptionAndPrice']")
On Error Resume Next
For Each post In posts
    i = i + 1
    Cells(i, 1) = post.FindElementByXPath(".//h4/a").Text
    Cells(i, 2) = post.FindElementByXPath(".//span[@class='ProductPrice__price___3BmxE']").Text
    Cells(i, 3) = post.FindElementByXPath(".//h4/a").Attribute("href")
Next post
End Sub
 
Last edited:
Using string manipulation and at the same time avoid using third party libs. It is smoothly parsing Title, sku, price and description from Redmart:
Code:
Sub Redmart_data()
Const URL = "https://api.redmart.com/v1.5.8/catalog/search?extent=2&pageSize=6&sort=1&category=bakery"
Dim http As New XMLHTTP60
Dim str As Variant

With http
    .Open "GET", URL, False
    .send
    str = Split(.responseText, "category_tags"":")
End With
On Error Resume Next
y = UBound(str)

    For i = 1 To y
        Cells(i, 1) = Split(Split(str(i), "title"":""")(1), """")(0)
        Cells(i, 2) = Split(Split(str(i), "sku"":""")(1), """")(0)
        Cells(i, 3) = Split(Split(str(i), "price"":")(1), ",")(0)
        Cells(i, 4) = Split(Split(str(i), "desc"":""")(1), """")(0)
    Next i
End Sub
 
Back
Top