• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Extract product-price from a URL in Excel VB

dudester

New Member
Hi,

I am new to web scraping with Excel, I want to extract the product price to a variable from a url, the snippet of relevant HTML is below, the line <span class="price">$120.95 AUD</span> is the value I want to extract but I cant get my coding right to pick it up, what is the best way to do this?



Thank you




<div class="product-price">

<span class="product-price__price product-price__sale">
<span class="price">$120.95 AUD</span>
<span class="product-price__sale-label"><span class="brackets skeleton-loading" data-trans-key="products.product.on_sale">Sale</span></span>
</span>

<span class="visually-hidden"><span class="brackets skeleton-loading" data-trans-key="products.product.regular_price">Regular price</span></span>
<span class="product-price__price compare_price"><span class="price">

$130.95 AUD
</span></span>


</div>
 

Stevie

Active Member
Hi dudester,
I think I have achieved what you are after by loading the source into a string and then finding the number by specifying the surrounding text.
Obviously this is pretty bespoke to this problem and there may be a more dynamic solution, but I hope it helps.
Please ensure the appropriate references are enabled, I believe the correct one for this is 'Microsoft Internet Controls' but I may be mistaken.

Here is the working code:
Code:
Sub Tester()
Dim obj As Object
Dim source As String
Dim searchstring As String
Dim searchstring2 As String
Dim findFirst As Integer
Dim findSecond As Integer
Dim output As String
Dim outputDbl As Double
Dim url As String
Set obj = CreateObject("MSXML2.XMLHTTP")
url = "C:\Users\redacted\webpages\htmlexample.html" 'put url here
obj.Open "GET", url, False
obj.send
source = obj.responseText ' store source code in string varible
set obj = nothing
searchstring = "<span class=""price"">$" ' specify unique string to left of number
searchstring2 = " AUD</span>" & vbCrLf & "<span class=""product-price" ' specify unique string to right of number
findFirst = InStr(1, source, searchstring, vbTextCompare)
findSecond = InStr(1, source, searchstring2, vbTextCompare)
output = Mid(source, findFirst + Len(searchstring), findSecond - (findFirst + Len(searchstring))) ' extract number as string
outputDbl = CDbl(output) ' convert number to double
Debug.Print outputDbl
End Sub
If this was helpful, please click 'Like' below!
 

dudester

New Member
Hi Stevie,

Thanks for your code, however when I run it I get an overflow error on
findFirst = InStr(1, source, searchstring, vbTextCompare)
 

Stevie

Active Member
Thanks for your code, however when I run it I get an overflow error on
findFirst = InStr(1, source, searchstring, vbTextCompare)
Hi dudester, I used the source snippet from your example, which fits within the string variable.
Presumably, the actual webpage is much larger, and therefore fails when it tries to load the source into the string.

I would suggest finding out how long the webpage is, or how long it could be, then declaring the string variable as such, e.g:
Code:
Dim source As String * 1024
which should allow for a string of length 1024.

Let me know how you get on.

If this was helpful, please click 'Like' below!
 

dudester

New Member
Hi,

How can I determine the length ?

Using Dim source As String * 1024, I get Invalid Procedure or Call on:
output = Mid(source, findFirst + Len(searchstring), findSecond - (findFirst + Len(searchstring))) ' extract number as string
 

Stevie

Active Member
Hi dudester,
please replace your code with this and try it:
Code:
Sub Tester()
Dim obj As Object
Dim source As String * 65526
Dim searchstring As String
Dim searchstring2 As String
Dim findFirst As Long
Dim findSecond As Long
Dim output As String
Dim outputDbl As Double
Dim url As String
Set obj = CreateObject("MSXML2.ServerXMLHTTP")
url = "" 'put url here
obj.Open "GET", url, False
obj.send
source = obj.responseText ' store source code in string varible
searchstring = "<span class=""price"">$" ' specify unique string to left of number
searchstring2 = " AUD</span>" & vbCrLf & "<span class=""product-price" ' specify unique string to right of number
findFirst = InStr(1, source, searchstring, vbTextCompare)
findSecond = InStr(1, source, searchstring2, vbTextCompare)
output = Mid(source, findFirst + Len(searchstring), findSecond - (findFirst + Len(searchstring))) ' extract number as string
outputDbl = CDbl(output) ' convert number to double
Debug.Print outputDbl
End Sub
Don't forget to put your url back in.

I have changed the variables for find from integer to long and the source string to it's maximum length.
I have also changed the object to remove some potential permissions issues.
I had this working on a stackoverflow forum page just now (I would have used chandoo, but couldn't find the post contents in the source at a glance).

To get the length so you can make your variable more reasonably sized, add:
Code:
Debug.Print Len(Replace(source, " ", ""))
in the code, after the line:
Code:
source = obj.responseText ' store source code in string varible
If this was helpful, please click 'Like' below!
 

Stevie

Active Member
I just tried testing on chandoo too and also same error so I dont think its related to page content.
url = "https://chandoo.org/forum/threads/extract-product-price-from-a-url-in-excel-vb.39758/#post-237970"
Hi dudester,
the code works fine for me. Please post the entire source for the webpage you are using and all the vba code exactly as you have it, so I can attempt to recreate your issue.
You should not be receiving any overflow errors if you have followed my advice.
Stevie
 

dudester

New Member
Hi,

I am not sure whats going on now..?
Now I'm getting a Connection Terminated Abnormally Error on any url i try?

I would rather not post the address of the site i was originally wanting to scrape so am using a test site which has the same structure, if I can get this working I can adapt the search for anything.
URL is: https://australia.triangl.com/products/mylos-orange-pop

Code is below

Thank you for your help


Sub Tester()
Dim obj As Object
Dim source As String * 65526
Dim searchstring As String
Dim searchstring2 As String
Dim findFirst As Long
Dim findSecond As Long
Dim output As String
Dim outputDbl As Double
Dim url As String
Set obj = CreateObject("MSXML2.ServerXMLHTTP")
url = "https://australia.triangl.com/products/mylos-orange-pop"
obj.Open "GET", url, False
obj.send
source = obj.responseText ' store source code in string varible
searchstring = "<span class=""price"">$" ' specify unique string to left of number
searchstring2 = " AUD</span>" & vbCrLf & "<span class=""product-price" ' specify unique string to right of number
findFirst = InStr(1, source, searchstring, vbTextCompare)
findSecond = InStr(1, source, searchstring2, vbTextCompare)
output = Mid(source, findFirst + Len(searchstring), findSecond - (findFirst + Len(searchstring))) ' extract number as string
outputDbl = CDbl(output) ' convert number to double
Debug.Print outputDbl
End Sub
 

Stevie

Active Member
Hi dudester,
it seems you did not read my code or comments very carefully.
The source code of your test site is different to the source code you initially posted if it is the price you are wishing to extract.
Due to this, you need to modify the searchstring and searchstring2 variables to be able to find the correct characters from the source.
I have done this below:
Code:
Sub Tester()
Dim obj As Object
Dim source As String * 65526
Dim searchstring As String
Dim searchstring2 As String
Dim findFirst As Long
Dim findSecond As Long
Dim output As String
Dim outputDbl As Double
Dim url As String
Set obj = CreateObject("MSXML2.ServerXMLHTTP")
url = "https://australia.triangl.com/products/mylos-orange-pop" 'put url here
obj.Open "GET", url, False
obj.send
source = obj.responseText ' store source code in string varible
Debug.Print Len(Replace(source, " ", ""))
searchstring = "<span itemprop=""price"" content=""" ' specify unique string to left of number
searchstring2 = " AUD"" id=""price"">" ' specify unique string to right of number
findFirst = InStr(1, source, searchstring, vbTextCompare)
findSecond = InStr(1, source, searchstring2, vbTextCompare)
output = Mid(source, findFirst + Len(searchstring), findSecond - (findFirst + Len(searchstring))) ' extract number as string
outputDbl = CDbl(output) ' convert number to double
Debug.Print outputDbl
End Sub
This returns 86 for me, which is the price of the object on the page.

If this code is not running for you, then there must be some environmental issues on your machine which are non-standard.

An alternative method would be to search to source for 'AUD' and return the characters prior to it, but this makes the assumption that there is only one instance of 'AUD' on the page.

As I said previously, this code will only work for the page you linked, or potentially other pages with the same layout as it is specifically searching for text strings in the source code which surround the price you are after.

If any of this has been helpful, please click 'Like' on the bottom right.

Stevie
 

dudester

New Member
Hi,

Yes i know the search string needed tweaking but my bigger issue now is the error i get, your provided code still gives me the same error, i dont understand why? I can connect to random other urls ok via vba
What could effect this?


Run-time error '-2147012866 (80072efe)':
The connection with server was terminated abnormally
 

dudester

New Member
Hi,

I just tried your code on a windows 10 pc and it works, I have windows 7, any ideas how to fix this? I tried a few supposed patches from a google search but did not fix the issue.

Thanks.
 

Stevie

Active Member
Hi dudester,
as I have said, it is working for me, so I can't recreate your issue.
I am running it on a windows 7 machine. Have you enabled the references I mentioned a few days ago? (tools -> references)
try replacing MSXML2.ServerXMLHTTP with MSXML2.XMLHTTP aswell to see if that helps.
 

QHarr

New Member
You haven't shown any code or provided an URL or even described the page. If the page is not slow loading (JS heavy), and your element is not within a parent frame/iframe then you can use a descendant combinator CSS selector to combine the 2 class attributes shown to select the element in question. This assumes you want the first match on the page for this selector i.e. that the price you mention is the first element retrieved by the selector. Not enough HTML shown to be sure.

Code:
Option Explicit
'VBE > Tools > References > Microsoft HTML Object Library
Public Sub GetPrice()
    Dim sResponse As String, html As HTMLDocument
    Set html = New HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "yourURLgoesHere", False
        .setRequestHeadoer "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With
    With html
        .body.innerHTML = sResponse
        Debug.Print html.querySelector(".product-price__price.product-price__sale .price").innerText
    End With
End Sub
 

QHarr

New Member
Hi dudester,
I think I have achieved what you are after by loading the source into a string and then finding the number by specifying the surrounding text.
Obviously this is pretty bespoke to this problem and there may be a more dynamic solution, but I hope it helps.
Please ensure the appropriate references are enabled, I believe the correct one for this is 'Microsoft Internet Controls' but I may be mistaken.

Here is the working code:
Code:
Sub Tester()
Dim obj As Object
Dim source As String
Dim searchstring As String
Dim searchstring2 As String
Dim findFirst As Integer
Dim findSecond As Integer
Dim output As String
Dim outputDbl As Double
Dim url As String
Set obj = CreateObject("MSXML2.XMLHTTP")
url = "C:\Users\redacted\webpages\htmlexample.html" 'put url here
obj.Open "GET", url, False
obj.send
source = obj.responseText ' store source code in string varible
set obj = nothing
searchstring = "<span class=""price"">$" ' specify unique string to left of number
searchstring2 = " AUD</span>" & vbCrLf & "<span class=""product-price" ' specify unique string to right of number
findFirst = InStr(1, source, searchstring, vbTextCompare)
findSecond = InStr(1, source, searchstring2, vbTextCompare)
output = Mid(source, findFirst + Len(searchstring), findSecond - (findFirst + Len(searchstring))) ' extract number as string
outputDbl = CDbl(output) ' convert number to double
Debug.Print outputDbl
End Sub
If this was helpful, please click 'Like' below!
No reference is required. You are using a late binding with XMLHTTP object.
 

Chihiro

Excel Ninja
@QHarr

Stevie used early binding for HTML object, reference should be added for Microsoft HTML Object Library.

Though yes, xmlhttp is late bound and does not need reference.
 
Top