• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Web scraping code help

Hi All,

I have following code it:

1. open a webpage (amazon in this case)
2. click on all the products appearing on the page ( and open each product in new tab)

Can you help me to update the code to include a loop:

3. go through each open tab (from step 2) and
4. copy the product title to column A of active sheet (64127

I have included the screenshot of how the result will look like.
Code:
Sub launch_amazon()


Dim IE As SHDocVw.InternetExplorer
Set IE = New SHDocVw.InternetExplorer

IE.Visible = True
IE.Navigate "https://www.amazon.in/s?k=rudra+fashion&rh=p_n_size_two_browse-vebin%3A1975333031&dc&crid=2AKWK100N33Q9&qid=1574534623&rnid=1974754031&sprefix=rudra+fas%2Caps%2C287&ref=sr_nr_p_n_size_two_browse-vebin_8"


Do While IE.ReadyState <> READYSTATE_COMPLETE
Application.StatusBar = "Loading"

Loop

Dim idoc As mshtml.HTMLDocument
Set idoc = IE.Document



Dim doc_ele As mshtml.IHTMLElement

Dim doc_eles As mshtml.IHTMLElementCollection
Set doc_eles = idoc.getElementsByTagName("img")


For Each doc_ele In doc_eles

If doc_ele.className = "s-image" Then
    doc_ele.Click
    End If

    Next doc_ele

' loop required here

End Sub
 

Attachments

  • excel output.png
    excel output.png
    14.8 KB · Views: 4
Last edited:
Hi ,

Try this :
Code:
Sub launch_amazon()
    Dim IE As SHDocVw.InternetExplorer
    Dim idoc As MSHTML.HTMLDocument
    Dim doc_ele As MSHTML.IHTMLElement
    Dim doc_eles As MSHTML.IHTMLElementCollection

    Dim startoftitle As Integer, endoftitle As Integer, rownum As Long
    Dim vouterHTML As String, ProductTitle As String
    
    Set IE = New SHDocVw.InternetExplorer
    IE.Visible = True
    IE.Navigate "https://www.amazon.in/s?k=rudra+fashion&rh=p_n_size_two_browse-vebin%3A1975333031&dc&crid=2AKWK100N33Q9&qid=1574534623&rnid=1974754031&sprefix=rudra+fas%2Caps%2C287&ref=sr_nr_p_n_size_two_browse-vebin_8"

    Do While IE.ReadyState <> READYSTATE_COMPLETE
       Application.StatusBar = "Loading"
    Loop
    Set idoc = IE.Document

    Set doc_eles = idoc.getElementsByTagName("img")
    rownum = 1
    
    For Each doc_ele In doc_eles
        If doc_ele.className = "s-image" Then
           doc_ele.Click
          
           vouterHTML = doc_ele.outerHTML
           startoftitle = InStr(1, vouterHTML, "alt=") + 5
           endoftitle = InStr(startoftitle, vouterHTML, """") - 1
           ProductTitle = Mid(vouterHTML, startoftitle, endoftitle - startoftitle + 1)
           ActiveSheet.Cells(rownum, 1).Value = ProductTitle
           rownum = rownum + 1
        End If
    Next doc_ele
    
    ActiveSheet.Columns(1).EntireColumn.AutoFit
    IE.Quit
End Sub

Narayan
 
Thanks Narayan. It's excellent.

Somehow above code pulls 9 product titles where if you paste the url in the code it only shows 4 products

Why it may be doing it ?
 
Hi ,

I am not able to get 9 entries !

See the attached file ; I copied the code to a sheet module and ran it from Sheet2 ; only 4 entries.

I added a new sheet Sheet3 and ran the code from the code module ; only 4 entries.

Narayan
 

Attachments

  • Sample Test.xlsm
    19.8 KB · Views: 2
Super thank you !

I know I am being greedy here but how can I add the price for same entries in column B. HTML element of the price is "<span class="a-price-whole">649</span>"
 
Hi ,

I am not sure I can do it , because the website seems to not have a standard way of approaching each item that it is displaying.

The order in which the items are displayed in the browser does not match the order in which the items appear in the worksheet.

Let me see if I can do anything ; in the meantime if anyone else wants to pitch in , please feel free to do so.

Narayan
 
Thanks Narayan. Someone suggested following code that scrap the info from the main page without opening each individual product and it gets the job done.

The purpose of going into individual product pages was to enable to get some more detailed info on individual products.
Is there a way to tweak the following code to achieve it e.g. to scrap ASIN number of each product?


Code:
Sub Amaz_Pro()
    'VBE>Tools>References> Microsoft HTML Object Library
    Dim html As MSHTML.HTMLDocument

    Set html = New MSHTML.HTMLDocument

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.amazon.in/s?k=rudra+fashion&rh=p_n_size_two_browse-vebin%3A1975333031&dc&crid=2AKWK100N33Q9&qid=1574534623&rnid=1974754031&sprefix=rudra+fas%2Caps%2C287&ref=sr_nr_p_n_size_two_browse-vebin_8", False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
        html.body.innerHTML = .responseText
    End With

    Dim headers(), titles As Object, prices As Object
    headers = Array("Title", "Price", "Img url")

    With html
        Set titles = .querySelectorAll(".s-image")
        Set prices = .querySelectorAll(".a-price-whole,.a-color-price")
    End With

    Dim results(), r As Long, priceInfo As String

    ReDim results(1 To titles.Length, 1 To UBound(headers) + 1)

    For r = 0 To titles.Length - 1
        results(r + 1, 1) = titles.Item(r).alt
        results(r + 1, 2) = Replace$(prices.Item(r).innerText, ChrW(8377), vbNullString)
        results(r + 1, 3) = titles.Item(r).src
        ' results(r + 1, 4) = titles.Item(r).h5
    Next

    Dim ws As Worksheet

    Set ws = ThisWorkbook.Worksheets("Sheet1")

    With ws
        .Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
        .Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
    End With
End Sub
 
Hi ,

Does the code work with the second link you posted ?

Can you ensure it works and then post the working code ?

If you see my code , I am not using the Click method to click on any image.

Narayan
 
Hi ,

When I run the code in the workbook you uploaded , it aborts on Runtime error 9. Subscript out of range

The offending line of code is :
Code:
ReDim results(1 To titles.Length, 1 To UBound(headers) + 1)
and when I check through Debug , titles.Length shows 0.

I am sorry but there is nothing much I can do.

Narayan
 
Hi !​
Someone suggested following code that scrap the info from the main page without opening each individual product and it gets the job done.
Is there a way to tweak the following code to achieve it e.g. to scrap ASIN number of each product?
Yes again - as this is a duplicate thread ! - directly from the main webpage where ASIN yet exists​
like you can see using the webbrowser inspector tool …​
When creating a duplicate thread if at least the initial post contains all you need and not only a part !​
 
Didn't you ever notice ASIN is yet in each product page URL ?! So two ways to grabb it from the main webpage !​
As scrapping data is nothing but first well observing & reading …​
 
Back
Top