
Can't pull data from a stubborn webpage using vba

shahin

Hi there all! Hope you are doing well. The site I tried to scrape category names from looks very simple when you inspect its elements, but when I write a parser I can't pull the data. I want to scrape only the 7 category names from that page. I tried every angle I could think of but failed. If anybody can point out what I'm doing wrong, I would be very grateful. Thanks in advance. FYC, I'm pasting the code I tried below.

Code:
Sub ItemName()

Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
Dim topics As Object, topic As Object, posts As Object, post As Object, ele As Object
Dim x As Long

x = 2

http.Open "GET", "http://www.bjs.com/tv--electronics.category.3000000000000144985.2002193", False
http.send
html.body.innerHTML = http.responseText

Set topics = html.getElementsByClassName("categories")

For Each topic In topics
    For Each posts In topic.getElementsByTagName("li")
        For Each post In posts.getElementsByTagName("a")
            Set ele = post.getElementsByTagName("h4")(0)
            Cells(x, 1) = ele.innerText
            x = x + 1
        Next post
    Next posts
Next topic

End Sub
 
So what exactly are you trying to pull?

If it's the category listing, like "TV & Electronics" etc., it actually belongs to another class name.

To narrow it down to only the 7 categories that are displayed, first find each <ul class="brick">, then find the <strong></strong> element within it (those hold the category names used as display headers/buttons).

Note that there are multiple <ul class="brick"> elements, so you will need to loop through them.
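The approach described above can be sketched roughly as follows. This is only a minimal sketch, not necessarily the code the original poster ended up with: it reuses the URL and objects from the question, and it assumes the page source really contains `<ul class="brick">` elements with the category name inside a `<strong>` tag, as stated in this reply. The procedure name `CategoryNames` is made up for illustration.

```vba
Sub CategoryNames()
    ' Assumes references to "Microsoft XML, v6.0" and
    ' "Microsoft HTML Object Library" are set, as in the question's code.
    Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
    Dim bricks As Object, brick As Object
    Dim names As Object, nameTag As Object
    Dim x As Long

    x = 2

    http.Open "GET", "http://www.bjs.com/tv--electronics.category.3000000000000144985.2002193", False
    http.send
    html.body.innerHTML = http.responseText

    ' Assumption: every displayed category lives in an element with class "brick".
    Set bricks = html.getElementsByClassName("brick")

    For Each brick In bricks
        ' Only look at <ul> elements, in case other tags share the class name.
        If LCase(brick.tagName) = "ul" Then
            ' Assumption: the category name is the text of a <strong> tag inside the list.
            Set names = brick.getElementsByTagName("strong")
            For Each nameTag In names
                Cells(x, 1) = nameTag.innerText
                x = x + 1
            Next nameTag
        End If
    Next brick
End Sub
```

Looping over the collection returned for each tag, rather than indexing item (0) directly, avoids a run-time error when a list happens to contain no `<strong>` element.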
 
Thanks, sir, for your prompt reply. This site is driving me crazy. If I inspect an element by pointing my cursor at any category name, it leads directly to the class name "categories", and when I hover over that "categories" element, it shades the area containing the 7 category names I want to scrape. This is why it is so confusing to me. The other day, when I dug deeper, I found the class "brick" you mentioned, but today I can't even find it. Finally, to answer your question: yes, I was trying to scrape the 7 category names only.
 
Found the source code. Now trying to work with it as you advised. I have never worked with the page source before, though.
 
Every time you respond, my code starts working instantly. It is now working. Thanks, sir. Btw, could you please tell me why I didn't see the "brick" class name in the inspected-element view, and whether I should routinely look for element names in the page source instead of the inspected elements when scraping different sites?
 
Inspection is a useful tool for looking at "containers". But websites often use JavaScript or other means to fill containers from an XML response, etc. The source code will give you the overall picture of what is actually read in to generate the site (along with using debug tools).

In general, inspection is a good tool for checking <div> elements. But HTML tags within those <div> containers may not appear in the inspected elements (or may appear elsewhere).
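One way to see this difference from VBA is to check whether a class name occurs in the raw response text before writing any parsing loops. The sketch below is only illustrative: the procedure name `CheckRawSource` is hypothetical, the two class names are the ones discussed in this thread, and the search is a crude text match that assumes the class attribute is written with double quotes and starts with that name.

```vba
Sub CheckRawSource()
    ' Assumes a reference to "Microsoft XML, v6.0", as in the question's code.
    Dim http As New MSXML2.XMLHTTP60
    Dim src As String
    Dim marker As Variant

    http.Open "GET", "http://www.bjs.com/tv--electronics.category.3000000000000144985.2002193", False
    http.send
    src = http.responseText   ' raw source as served, before any script runs

    ' Crude check: look for class="<name> anywhere in the raw text.
    For Each marker In Array("categories", "brick")
        If InStr(1, src, "class=""" & marker, vbTextCompare) > 0 Then
            Debug.Print marker & ": present in the raw source"
        Else
            Debug.Print marker & ": not found in the raw source"
        End If
    Next marker
End Sub
```

The results appear in the Immediate window of the VBA editor. A class name that shows up in the browser's inspector but not here is probably added after the page loads, so an XMLHTTP request alone will not see it.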
 
FYI - Web page scraping depends more on your skill at reading and deciphering HTML and web source code than on your VBA skill, once you know the basics.

I'd strongly recommend studying the W3Schools documentation.
http://www.w3schools.com/
 