
Unable to parse a table

shahin

Active Member
Hi there! I tried to parse a table but unfortunately couldn't. So far, whenever I scraped a table I usually used three tag names: "table", "tr" and "td". But in this case I found another tag name, "th", which is confusing me. I'm pasting the code I have written here. Any help would be greatly appreciated.

Code:
Sub TableData()

Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
Dim topics As Object, topic As Object, posts As Object, post As Object
Dim x As Long, y As Long

x = 2
y = 1

With http
    .Open "GET", "http://www.espncricinfo.com/rankings/content/page/211270.html", False
    .send
    html.body.innerHTML = .responseText
End With

Set topics = html.getElementsByTagName("table")


For Each topic In topics
    For Each posts In topic.getElementsByTagName("tr")
        For Each post In posts.getElementsByTagName("th")
            Cells(x, y) = post.innerText
            y = y + 1
        Next post
        y = 1
        x = x + 1
    Next posts
Next topic
End Sub
 
I tried another method, but with no improvement. Here is the code:

Code:
Sub TableData()

Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
Dim topics As Object, topic As Object, posts As Object, post As Object
Dim x As Long, y As Long

x = 2
y = 1

With http
    .Open "GET", "http://www.espncricinfo.com/rankings/content/page/211270.html", False
    .send
    html.body.innerHTML = .responseText
End With

Set topics = html.getElementsByTagName("table")(0)

For Each posts In topics.Rows
    For Each post In posts.Cells
        Cells(x, y) = post.innerText
        y = y + 1
    Next post
    y = 1
    x = x + 1
Next posts
End Sub
 
Thanks, sir Chihiro. I tested my code with the link you provided and noticed that it works perfectly. What I actually want to know is this: I have come across table elements on different sites where this "th" tag comes up more or less often. What should I do to parse a table in which both "td" and "th" elements are present, or in which only "th" elements are present? You taught me how to parse a table with the "td" tag, but this "th" tag is beyond my knowledge. Perhaps the link I worked with is not the ideal one to be parsed the way I have in mind, but there are several others with the elements I tried to describe. So if you have already given anyone a solution to the query I raised in this thread, please redirect me to that page. Thanks once again, and I hope I have made clear what I am after.
 
Thanks for helping me again, sir. I will be back shortly with a site holding tables with those features. I am combing through different sites now.
 
Actually, I tried to parse the table with the "th" tag in a Pythonic way. The code for that is:
Code:
Sub TableData()

Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
Dim topics As Object, topic As Object, posts As Object, post As Object, hd As Object
Dim x As Long, y As Long, z As Long

x = 2
y = 2
z = 2
With http
    .Open "GET", "http://www.basketball-reference.com/players/a/", False
    .send
    html.body.innerHTML = .responseText
End With

Set topics = html.getElementsByTagName("table")

For Each topic In topics
    For Each posts In topic.getElementsByTagName("tr")
        ' Look up the row header once per row; it is Nothing when the row has no "th"
        Set hd = posts.getElementsByTagName("th")(0)
        For Each post In posts.getElementsByTagName("td")
            If Not hd Is Nothing Then Cells(z, 1) = hd.innerText
            Cells(x, y) = post.innerText
            y = y + 1
        Next post
        y = 2
        x = x + 1
        z = z + 1
    Next posts
Next topic
End Sub
 
@Marc L

Header tags (H3), yes. Or iframe.src for each table.

But I don't think you can scrape the contents of an "iframe" without navigating to its source? All tables on the site use an "iframe" to load the table, and the responseText lacks the table data itself.
 
when i scraped any table i used three tag names usually:
"table", "tr" and "td".
If you don't want the easy clipboardData method,
you just need to find the Table object and, within this object
(you can see its structure in the Locals window),
loop through its Rows collection,
then for each row loop through each cell of its Cells collection …
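
To illustrate the rows and cells loop described above, here is a minimal sketch. It assumes a small hypothetical table in a string (the SAMPLE markup and the DemoRowsCells name are made up for illustration, not taken from any of the sites discussed) and prints every cell to the Immediate window. The point is that the Cells collection of a row should return its "th" and "td" cells together, so no separate handling per tag name is needed.

```vba
Sub DemoRowsCells()
    ' Hypothetical markup: a header row of "th", then rows mixing "th" and "td"
    Const SAMPLE As String = _
        "<table>" & _
        "<tr><th>Player</th><th>From</th><th>To</th></tr>" & _
        "<tr><th>Player A</th><td>1991</td><td>1995</td></tr>" & _
        "<tr><th>Player B</th><td>1969</td><td>1978</td></tr>" & _
        "</table>"
    Dim doc As Object, oRow As Object, oCell As Object
    Dim r As Long, c As Long

    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = SAMPLE

    r = 1
    For Each oRow In doc.getElementsByTagName("table")(0).Rows
        c = 1
        ' The Cells collection of a row holds its "th" and "td" cells alike
        For Each oCell In oRow.Cells
            Debug.Print r, c, oCell.tagName, oCell.innerText
            c = c + 1
        Next oCell
        r = r + 1
    Next oRow
End Sub
```

To write to the worksheet instead, the Debug.Print line could be swapped for Cells(r, c) = oCell.innerText, as in the code from the first post.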
 
Yes, my previous post is exactly the same as post #2 …

clipboardData also works when piloting IE; see samples within this forum.

So whether by IE or by request, the OP has solutions …
 
Paste this demonstration into the worksheet module:
Code:
Sub DemoReq()
    Dim iDoc As Object, oDoc As Object, oElt As Object
    With CreateObject("Msxml2.XMLHTTP")
        ' Request the main page
        .Open "GET", "http://www.espncricinfo.com/rankings/content/page/211270.html", False
        .setRequestHeader "DNT", "1"
        On Error Resume Next
        .send
        On Error GoTo 0
        If .Status <> 200 Then Beep: Debug.Print .Status; " " & .StatusText: Exit Sub
        Me.UsedRange.Clear
        Application.ScreenUpdating = False
        Set iDoc = CreateObject("htmlfile")
        Set oDoc = CreateObject("htmlfile")
        oDoc.write .responseText
        ' Each table is loaded through an IFRAME, so request every IFRAME source in turn
        For Each oElt In oDoc.all.ciHomeContentlhs.getElementsByTagName("IFRAME")
            .Open "GET", oElt.src, False
            .setRequestHeader "DNT", "1"
            .send
            If .Status = 200 Then
                iDoc.body.innerHTML = .responseText
                ' Copy the title (element before the IFRAME) and the table HTML to the clipboard
                If iDoc.frames.clipboardData.setData("Text", "<strong>" & oElt.previousSibling.innerText & "</strong>" _
                                                           & iDoc.getElementsByTagName("TABLE")(0).outerHTML) Then
                    ' Paste below the last used cell of column C
                    With Cells(Rows.Count, 3).End(xlUp)(4, 0)
                        Me.Paste .Cells(1 + (.Row = 4))
'                        With .CurrentRegion.Rows
'                            With .Item("3:" & .Count).Columns
'                                With Union(.Item(1), .Item(4))
'                                    .HorizontalAlignment = xlRight
'                                    .IndentLevel = 2
'                                End With
'                                .Item(2).WrapText = False
'                                .Item(2).IndentLevel = 1
'                                .Item(3).HorizontalAlignment = xlCenter
'                            End With
'                            If .Cells(.Count, 1).Value = "" Then .Item(.Count).Clear
'                        End With
                    End With
                End If
            End If
        Next
    End With
    iDoc.frames.clipboardData.clearData "Text"
    Set iDoc = Nothing:  Set oDoc = Nothing
'    With Me.UsedRange.Columns(2)
'        .AutoFit
'        .ColumnWidth = .ColumnWidth + 1
'    End With
    Application.ScreenUpdating = True
End Sub
Do you like it? Then please click Like at the bottom right!
 
@Marc L
OMG!!!!! You are just impossible!! It works perfectly irrespective of redirection. Thanks a lot. Btw, I am unable to follow the way you write code because you prefer to use VBScript (I suppose) and it is Greek to me. I'm not that advanced in it. Anyway, could you please give me a hint or a link from which I can at least learn the usage of [#, @@,freefile(),$ and so on]? Thanks again.
 
VBA demonstration only, no VBScript!
Same way as in many threads on this forum …

As for FreeFile and the characters used to declare variables,
it is all in the VBA built-in help! Read the help for Double, String, …

You can also check variable types during execution
in the Locals window!
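
As a quick illustration of the points above, here is a minimal sketch of the type-declaration characters and of FreeFile. The procedure and variable names are made up for the example. TypeName reports the same types the Locals window would show, so the two Debug.Print lines should list Integer, Long, Single, Double, Currency and String.

```vba
Sub DemoTypeCharacters()
    ' Type-declaration characters: % Integer, & Long, ! Single,
    ' # Double, @ Currency, $ String
    Dim i%, n&, s!, d#, c@, t$
    Dim f As Integer

    ' TypeName shows the type each suffix declared
    Debug.Print TypeName(i), TypeName(n), TypeName(s)
    Debug.Print TypeName(d), TypeName(c), TypeName(t)

    ' FreeFile returns the next file number available to the Open statement
    f = FreeFile
    Debug.Print "Next free file number: "; f
End Sub
```

Writing Dim d# is equivalent to Dim d As Double; the suffix form is just shorter, which is why it looks unfamiliar at first.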

A must see : 10.21-10=0.1999999999999999999
 