• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...


VBA - web scraping same class but different inner text

Hi guys, I am using the following code for web scraping. It gives me the image URL in column A and the price in column B. Some products have two prices (original and discounted); others have only the original price. The original and discounted prices use the same class name, so the code brings in both prices, but I need only the discounted price in column B.

HTML for the discounted price:

<span id="product-price-64534" data-price-amount="399" data-price-type="finalPrice" class="price-wrapper "><span class="price">₹ 399</span></span>

HTML for the original price:

<span id="old-price-64534" data-price-amount="798" data-price-type="oldPrice" class="price-wrapper "><span class="price">₹ 798</span></span>

I have attached a sample workbook; the desired output is in column C, highlighted in yellow.

Any help?

Code:
Sub LB1()
    Dim ie As SHDocVw.InternetExplorer
    Dim idoc As MSHTML.HTMLDocument
    Dim doc_ele As MSHTML.IHTMLElement
    Dim doc_eles As MSHTML.IHTMLElementCollection
    Dim doc_ele3 As Object                      ' late-bound so .src is available
    Dim doc_eles3 As MSHTML.IHTMLElementCollection
    Dim rownum As Long, Z As Long
    Dim Prodtitle As String, ImgUrl As String

    Set ie = New SHDocVw.InternetExplorer
    ie.Visible = True
    ie.navigate "https://www.leemboodi.com/catalogsearch/result/?q=1037s50"

    ' Wait for the page to finish loading (DoEvents keeps Excel responsive)
    Do While ie.Busy Or ie.readyState <> READYSTATE_COMPLETE
        Application.StatusBar = "Loading"
        DoEvents
    Loop
    Application.StatusBar = False
    Set idoc = ie.document

    ' Image URLs -> column A
    Set doc_eles3 = idoc.getElementsByClassName("product-image-photo default_image ")
    Z = 1
    For Each doc_ele3 In doc_eles3
        If doc_ele3.className = "product-image-photo default_image " Then
            ImgUrl = doc_ele3.src
            ActiveSheet.Cells(Z, 1).Value = ImgUrl
            Z = Z + 1
        End If
    Next doc_ele3

    ' Prices -> column B (this picks up both the original and the discounted price)
    Set doc_eles = idoc.getElementsByClassName("price")
    rownum = 1
    For Each doc_ele In doc_eles
        If doc_ele.className = "price" Then
            Prodtitle = doc_ele.innerText
            ActiveSheet.Cells(rownum, 2).Value = Prodtitle
            rownum = rownum + 1
        End If
    Next doc_ele

    ActiveSheet.Columns(1).EntireColumn.AutoFit
    ie.Quit
End Sub
 

p45cal

I don't think it's very robust, but try:
Code:
…
For Each doc_ele In doc_eles
  If doc_ele.className = ("price") Then
    If InStr(doc_ele.parentElement.ID, "product-price") > 0 Then
      Prodtitle = doc_ele.innerText
      ActiveSheet.Cells(rownum, 2).Value = Prodtitle
      rownum = rownum + 1
    End If
  End If
Next doc_ele
…
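A further option, offered only as a sketch: the two spans in the question differ in their wrapper's data-price-type attribute (finalPrice vs oldPrice), so a CSS attribute selector can pick out the discounted price directly instead of testing the parent's id. This assumes the price you want always sits in a finalPrice wrapper and that IE renders the page in a document mode that supports querySelectorAll; WriteFinalPrices is a made-up helper name.

```vba
' Sketch: write only the final (discounted) prices to column B.
' Assumes idoc is an already-loaded MSHTML.HTMLDocument (e.g. from LB1) and that
' the price to keep sits in a wrapper with data-price-type="finalPrice",
' as in the HTML from the question. WriteFinalPrices is a hypothetical helper name.
Sub WriteFinalPrices(idoc As MSHTML.HTMLDocument)
    Dim finalPrices As Object
    Dim i As Long

    ' Attribute selector on the wrapper span, then its direct span.price child
    Set finalPrices = idoc.querySelectorAll("span[data-price-type='finalPrice'] > span.price")
    For i = 0 To finalPrices.Length - 1
        ActiveSheet.Cells(i + 1, 2).Value = finalPrices.Item(i).innerText
    Next i
End Sub
```

Call it from LB1 in place of the price loop with WriteFinalPrices idoc. Like that loop, it writes prices in page order, so the rows only line up with the image URLs in column A if every product has exactly one finalPrice span; looping product by product keeps them aligned otherwise.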
 

shahin

Try the following to get the expected output. I used XHR (MSXML2.XMLHTTP) instead of IE to make it more robust. Make sure the workbook in which you run this macro already contains a sheet named "output". The HTMLDocument type also needs a reference to Microsoft HTML Object Library (Tools > References).

Code:
Public Sub GetInfo()
    Const Url = "https://www.leemboodi.com/catalogsearch/result/?q=1037s50"
    Dim Html As New HTMLDocument, Htmldoc As New HTMLDocument
    Dim Wb As Workbook, ws As Worksheet, R&, I&

    Set Wb = ThisWorkbook
    Set ws = Wb.Worksheets("output")

    ' Fetch the search results page without opening a browser
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", Url, False
        .send
        Html.body.innerHTML = .responseText
    End With

    ' One .product-item-info block per product
    With Html.querySelectorAll(".product-item-info")
        For I = 0 To .Length - 1
            ' Parse each product on its own so its image and price stay on the same row
            Htmldoc.body.innerHTML = .Item(I).outerHTML
            R = R + 1: ws.Cells(R, 1) = Htmldoc.querySelector("img.product-image-photo").getAttribute("src")
            ' Only the span.price inside the product-price-... wrapper (the final price)
            ws.Cells(R, 2) = Htmldoc.querySelector("[id*='product-price-'] > .price").innerText
        Next I
    End With
End Sub
 
Many thanks, p45cal, it worked!
 
One more thing: how can I add pagination to the code so that it keeps clicking the "Show More" button? Instead of numbered pages, this website has a "Show More" button that I have to click repeatedly until all the products are loaded. The HTML of the "Show More" element is as follows:

<a href="javascript:void(0)" class="btn-load-more">Show More</a>
 

p45cal

This is not my thing.
Try:
Code:
Set MoreBtns = idoc.getElementsByClassName("btn-load-more")
Set MoreBtn = MoreBtns(0)
MoreBtn.Click
or shorter:
Code:
idoc.getElementsByClassName("btn-load-more")(0).Click
but I doubt this is the way to go about it.
I don't know how to detect when there's no more to load (I don't know how to detect whether there's an event attached to the button).
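One way to detect that there's nothing more to load, sketched under assumptions rather than tested against this site: count the products before and after each click, and stop when the button disappears or a click adds nothing. It assumes each product is wrapped in a .product-item-info element (the class shahin's answer uses), that a 3-second pause is long enough for each batch to arrive, and a cap of 20 clicks as a safety net; LoadAllProducts is a hypothetical helper name.

```vba
' Sketch: click "Show More" until the button is gone or a click adds no products.
' Assumptions: ie/idoc are set up as in LB1, each product is wrapped in
' .product-item-info, 3 seconds is enough for each batch to load, and
' 20 clicks is an arbitrary safety cap. LoadAllProducts is a hypothetical name.
Sub LoadAllProducts(ie As SHDocVw.InternetExplorer, idoc As MSHTML.HTMLDocument)
    Dim moreBtns As MSHTML.IHTMLElementCollection
    Dim prevCount As Long, clicks As Long

    prevCount = idoc.getElementsByClassName("product-item-info").Length
    For clicks = 1 To 20
        Set moreBtns = idoc.getElementsByClassName("btn-load-more")
        If moreBtns.Length = 0 Then Exit For          ' no button left on the page
        moreBtns.Item(0).Click

        ' Give the page time to fetch and render the next batch
        Application.Wait Now + TimeValue("0:00:03")
        Do While ie.Busy Or ie.readyState <> READYSTATE_COMPLETE
            DoEvents
        Loop

        ' Stop when a click no longer adds products
        If idoc.getElementsByClassName("product-item-info").Length = prevCount Then Exit For
        prevCount = idoc.getElementsByClassName("product-item-info").Length
    Next clicks
End Sub
```

In LB1 it would go right after Set idoc = ie.document, as LoadAllProducts ie, idoc, so the image and price loops see the fully expanded list. If you already know how many clicks are needed, lower the cap to that number.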
 
Can you help me loop the following piece of code 11 times?

Code:
Set doc_eles = idoc.getElementsByTagName("a")

rownum = 1

For Each doc_ele In doc_eles
    If doc_ele.className = "btn-load-more" Then
        doc_ele.Click
    End If
Next doc_ele
 

p45cal

As I said, this is not my thing; I don't know what you're doing with that snippet. All I can do is show you how to loop 11 times in VBA:
Code:
For k = 1 To 11
    'do something eg.:
    MsgBox "Hi there!"
Next k
 