1. Welcome to Chandoo.org Forums. Short message for you

    Hi Guest,

    Thanks for joining Chandoo.org forums. We are here to make you awesome in Excel. Before you post your first question, please read this short introduction guide. When posting or responding to questions please remember our values at Chandoo.org are: Humility, Passion, Fun, Awesomeness, Simplicity, Sharing Remember that we have people here for whom English is not there first language and we need to allow for this in our dealings.

    Yours,
    Chandoo
  2. Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  3. When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Extract product-price from a URL in Excel VB

Discussion in 'VBA Macros' started by dudester, Sep 16, 2018.

  1. dudester

    dudester New Member

    Messages:
    9
    Hi,

    I am new to web scraping with Excel, I want to extract the product price to a variable from a url, the snippet of relevant HTML is below, the line <span class="price">$120.95 AUD</span> is the value I want to extract but I cant get my coding right to pick it up, what is the best way to do this?



    Thank you




    <div class="product-price">

    <span class="product-price__price product-price__sale">
    <span class="price">$120.95 AUD</span>
    <span class="product-price__sale-label"><span class="brackets skeleton-loading" data-trans-key="products.product.on_sale">Sale</span></span>
    </span>

    <span class="visually-hidden"><span class="brackets skeleton-loading" data-trans-key="products.product.regular_price">Regular price</span></span>
    <span class="product-price__price compare_price"><span class="price">

    $130.95 AUD
    </span></span>


    </div>
  2. Stevie

    Stevie Active Member

    Messages:
    115
    Hi dudester,
    I think I have achieved what you are after by loading the source into a string and then finding the number by specifying the surrounding text.
    Obviously this is pretty bespoke to this problem and there may be a more dynamic solution, but I hope it helps.
    Please ensure the appropriate references are enabled, I believe the correct one for this is 'Microsoft Internet Controls' but I may be mistaken.

    Here is the working code:
    Code (vb):
    Sub Tester()
    Dim obj As Object
    Dim source As String
    Dim searchstring As String
    Dim searchstring2 As String
    Dim findFirst As Integer
    Dim findSecond As Integer
    Dim output As String
    Dim outputDbl As Double
    Dim url As String
    Set obj = CreateObject("MSXML2.XMLHTTP")
    url = "C:\Users\redacted\webpages\htmlexample.html" 'put url here
    obj.Open "GET", url, False
    obj.send
    source = obj.responseText ' store source code in string varible
    set obj = nothing
    searchstring = "<span class=""price"">$" ' specify unique string to left of number
    searchstring2 = " AUD</span>" & vbCrLf & "<span class=""product-price" ' specify unique string to right of number
    findFirst = InStr(1, source, searchstring, vbTextCompare)
    findSecond = InStr(1, source, searchstring2, vbTextCompare)
    output = Mid(source, findFirst + Len(searchstring), findSecond - (findFirst + Len(searchstring))) ' extract number as string
    outputDbl = CDbl(output) ' convert number to double
    Debug.Print outputDbl
    End Sub
    If this was helpful, please click 'Like' below!
  3. dudester

    dudester New Member

    Messages:
    9
    Hi Stevie,

    Thanks for your code, however when I run it I get an overflow error on
    findFirst = InStr(1, source, searchstring, vbTextCompare)
  4. Stevie

    Stevie Active Member

    Messages:
    115
    Hi dudester, I used the source snippet from your example, which fits within the string variable.
    Presumably, the actual webpage is much larger, and therefore fails when it tries to load the source into the string.

    I would suggest finding out how long the webpage is, or how long it could be, then declaring the string variable as such, e.g:
    Code (vb):
    Dim source As String * 1024
    which should allow for a string of length 1024.

    Let me know how you get on.

    If this was helpful, please click 'Like' below!
    dudester likes this.
  5. dudester

    dudester New Member

    Messages:
    9
    Hi,

    How can I determine the length ?

    Using Dim source As String * 1024, I get Invalid Procedure or Call on:
    output = Mid(source, findFirst + Len(searchstring), findSecond - (findFirst + Len(searchstring))) ' extract number as string
    Chirag R Raval likes this.
  6. Stevie

    Stevie Active Member

    Messages:
    115
    Hi dudester,
    please replace your code with this and try it:
    Code (vb):
    Sub Tester()
    Dim obj As Object
    Dim source As String * 65526
    Dim searchstring As String
    Dim searchstring2 As String
    Dim findFirst As Long
    Dim findSecond As Long
    Dim output As String
    Dim outputDbl As Double
    Dim url As String
    Set obj = CreateObject("MSXML2.ServerXMLHTTP")
    url = "" 'put url here
    obj.Open "GET", url, False
    obj.send
    source = obj.responseText ' store source code in string varible
    searchstring = "<span class=""price"">$" ' specify unique string to left of number
    searchstring2 = " AUD</span>" & vbCrLf & "<span class=""product-price" ' specify unique string to right of number
    findFirst = InStr(1, source, searchstring, vbTextCompare)
    findSecond = InStr(1, source, searchstring2, vbTextCompare)
    output = Mid(source, findFirst + Len(searchstring), findSecond - (findFirst + Len(searchstring))) ' extract number as string
    outputDbl = CDbl(output) ' convert number to double
    Debug.Print outputDbl
    End Sub
    Don't forget to put your url back in.

    I have changed the variables for find from integer to long and the source string to it's maximum length.
    I have also changed the object to remove some potential permissions issues.
    I had this working on a stackoverflow forum page just now (I would have used chandoo, but couldn't find the post contents in the source at a glance).

    To get the length so you can make your variable more reasonably sized, add:
    Code (vb):
    Debug.Print Len(Replace(source, " ", ""))
     
    in the code, after the line:
    Code (vb):
    source = obj.responseText ' store source code in string varible
    If this was helpful, please click 'Like' below!
  7. dudester

    dudester New Member

    Messages:
    9
  8. dudester

    dudester New Member

    Messages:
    9
    Length is 16337
    Last edited: Sep 17, 2018
  9. Stevie

    Stevie Active Member

    Messages:
    115
    Hi dudester,
    the code works fine for me. Please post the entire source for the webpage you are using and all the vba code exactly as you have it, so I can attempt to recreate your issue.
    You should not be receiving any overflow errors if you have followed my advice.
    Stevie
    dudester likes this.
  10. dudester

    dudester New Member

    Messages:
    9
    Hi,

    I am not sure whats going on now..?
    Now I'm getting a Connection Terminated Abnormally Error on any url i try?

    I would rather not post the address of the site i was originally wanting to scrape so am using a test site which has the same structure, if I can get this working I can adapt the search for anything.
    URL is: https://australia.triangl.com/products/mylos-orange-pop

    Code is below

    Thank you for your help


    Sub Tester()
    Dim obj As Object
    Dim source As String * 65526
    Dim searchstring As String
    Dim searchstring2 As String
    Dim findFirst As Long
    Dim findSecond As Long
    Dim output As String
    Dim outputDbl As Double
    Dim url As String
    Set obj = CreateObject("MSXML2.ServerXMLHTTP")
    url = "https://australia.triangl.com/products/mylos-orange-pop"
    obj.Open "GET", url, False
    obj.send
    source = obj.responseText ' store source code in string varible
    searchstring = "<span class=""price"">$" ' specify unique string to left of number
    searchstring2 = " AUD</span>" & vbCrLf & "<span class=""product-price" ' specify unique string to right of number
    findFirst = InStr(1, source, searchstring, vbTextCompare)
    findSecond = InStr(1, source, searchstring2, vbTextCompare)
    output = Mid(source, findFirst + Len(searchstring), findSecond - (findFirst + Len(searchstring))) ' extract number as string
    outputDbl = CDbl(output) ' convert number to double
    Debug.Print outputDbl
    End Sub
  11. Stevie

    Stevie Active Member

    Messages:
    115
    Hi dudester,
    it seems you did not read my code or comments very carefully.
    The source code of your test site is different to the source code you initially posted if it is the price you are wishing to extract.
    Due to this, you need to modify the searchstring and searchstring2 variables to be able to find the correct characters from the source.
    I have done this below:
    Code (vb):
    Sub Tester()
    Dim obj As Object
    Dim source As String * 65526
    Dim searchstring As String
    Dim searchstring2 As String
    Dim findFirst As Long
    Dim findSecond As Long
    Dim output As String
    Dim outputDbl As Double
    Dim url As String
    Set obj = CreateObject("MSXML2.ServerXMLHTTP")
    url = "https://australia.triangl.com/products/mylos-orange-pop" 'put url here
    obj.Open "GET", url, False
    obj.send
    source = obj.responseText ' store source code in string varible
    Debug.Print Len(Replace(source, " ", ""))
    searchstring = "<span itemprop=""price"" content=""" ' specify unique string to left of number
    searchstring2 = " AUD"" id=""price"">" ' specify unique string to right of number
    findFirst = InStr(1, source, searchstring, vbTextCompare)
    findSecond = InStr(1, source, searchstring2, vbTextCompare)
    output = Mid(source, findFirst + Len(searchstring), findSecond - (findFirst + Len(searchstring))) ' extract number as string
    outputDbl = CDbl(output) ' convert number to double
    Debug.Print outputDbl
    End Sub
    This returns 86 for me, which is the price of the object on the page.

    If this code is not running for you, then there must be some environmental issues on your machine which are non-standard.

    An alternative method would be to search to source for 'AUD' and return the characters prior to it, but this makes the assumption that there is only one instance of 'AUD' on the page.

    As I said previously, this code will only work for the page you linked, or potentially other pages with the same layout as it is specifically searching for text strings in the source code which surround the price you are after.

    If any of this has been helpful, please click 'Like' on the bottom right.

    Stevie
    dudester likes this.
  12. dudester

    dudester New Member

    Messages:
    9
    Hi,

    Yes i know the search string needed tweaking but my bigger issue now is the error i get, your provided code still gives me the same error, i dont understand why? I can connect to random other urls ok via vba
    What could effect this?


    Run-time error '-2147012866 (80072efe)':
    The connection with server was terminated abnormally
  13. dudester

    dudester New Member

    Messages:
    9
    Hi,

    I just tried your code on a windows 10 pc and it works, I have windows 7, any ideas how to fix this? I tried a few supposed patches from a google search but did not fix the issue.

    Thanks.
  14. Stevie

    Stevie Active Member

    Messages:
    115
    Hi dudester,
    as I have said, it is working for me, so I can't recreate your issue.
    I am running it on a windows 7 machine. Have you enabled the references I mentioned a few days ago? (tools -> references)
    try replacing MSXML2.ServerXMLHTTP with MSXML2.XMLHTTP aswell to see if that helps.
    dudester likes this.
  15. dudester

    dudester New Member

    Messages:
    9
    Yes! Changing the obj definition solves the issue, thank you.
    Thank you very much for your help with this.


    Regards,
    Phil.
    Stevie likes this.
  16. QHarr

    QHarr New Member

    Messages:
    2
    You haven't shown any code or provided an URL or even described the page. If the page is not slow loading (JS heavy), and your element is not within a parent frame/iframe then you can use a descendant combinator CSS selector to combine the 2 class attributes shown to select the element in question. This assumes you want the first match on the page for this selector i.e. that the price you mention is the first element retrieved by the selector. Not enough HTML shown to be sure.

    Code (vb):

    Option Explicit
    'VBE > Tools > References > Microsoft HTML Object Library
    Public Sub GetPrice()
        Dim sResponse As String, html As HTMLDocument
        Set html = New HTMLDocument
        With CreateObject("MSXML2.XMLHTTP")
            .Open "GET", "yourURLgoesHere", False
            .setRequestHeadoer "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
            .send
            sResponse = StrConv(.responseBody, vbUnicode)
        End With
        With html
            .body.innerHTML = sResponse
            Debug.Print html.querySelector(".product-price__price.product-price__sale .price").innerText
        End With
    End Sub
     
    shahin likes this.
  17. QHarr

    QHarr New Member

    Messages:
    2
    No reference is required. You are using a late binding with XMLHTTP object.
  18. Chihiro

    Chihiro Excel Ninja

    Messages:
    5,053
    @QHarr

    Stevie used early binding for HTML object, reference should be added for Microsoft HTML Object Library.

    Though yes, xmlhttp is late bound and does not need reference.
    Stevie likes this.

Share This Page