
Can't scrape title from webpages

Is what I did here right? I mean, should it look something like this:
Code:
Sub Demo1()
    With New XMLHTTP60
        On Error Resume Next
        For r& = 1 To Cells(Rows.Count, 1).End(xlUp).Row
            .Open "GET", Cells(r, 1).Value, False
            .send
            If InStr(.responseText, "<title>") > 0 Then
                Cells(r, 2).Value = Split(Split(.responseText, "<title>")(1), "</")(0)
            ElseIf InStr(.responseText, "<TITLE>") > 0 Then
                Cells(r, 2).Value = Split(Split(.responseText, "<TITLE>")(1), "</")(0)
            Else: Cells(r, 2).Value = "Not Exists"
            End If
        Next
    End With
End Sub
 
Hmm? Why are you using the SPLIT function in two separate branches? Like I said, with vbTextCompare it will ignore case.

Using Marc's code as an example... Note the ", , 1" added to the inner SPLIT.

Code:
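Sub Demo1()
    With New XMLHTTP60    ' requires a reference to Microsoft XML, v6.0
        On Error Resume Next    ' skip blank cells and unreachable sites
        For R& = 1 To Cells(Rows.Count, 1).End(xlUp).Row
            .Open "GET", Cells(R, 1).Value, False
            .send
            ' The 4th Split argument, 1 (vbTextCompare), matches <title> in any letter case
            If .Status = 200 Then Cells(R, 2).Value = Split(Split(.responseText, "<title>", , 1)(1), "</")(0)
        Next
    End With
End Sub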
 
Dear sir CHIHIRO, I wrote a tiny HTML page to check what happens when I use <TITLE> for one line, and found that it automatically turns into lowercase when inspected. Is there a reason behind this that I should note? I tried with:
Code:
<!DOCTYPE html>
<html>
<head>
  <TITLE>Page Title</TITLE>
</head>
<body>

<p>The content of the body element is displayed in the browser window.</p>
<p>The content of the title element is displayed in the browser tab, in favorites and in search engine results.</p>

</body>
</html>
 
Likely because tag names (whatever sits inside <>) in HTML are case-insensitive, meaning HTML interprets both <TITLE> and <title> as the same thing.
 
@stefanoste78

Like I wrote, it's just a demo of using a different variable. His code uses the row index as the variable; mine uses a cell (Range) object.

@shahin

You can add a compare argument to the Split function: "vbTextCompare", or 1. By default it uses the binary (case-sensitive) compare method.
See the link for details on the Split function.
https://msdn.microsoft.com/en-us/library/6x627e5f(v=vs.90).aspx
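
To see the difference the compare argument makes, here is a minimal sketch. ExtractTitle and DemoExtractTitle are made-up names for illustration, not something from Marc's code, and the "Not Exists" fallback is just borrowed from the earlier post.

```vba
' Hypothetical helper (not from the thread): returns the text between
' <title> and the next "</", matching the tag in any letter case.
Function ExtractTitle(ByVal html As String) As String
    Dim parts() As String
    ' Limit -1 means "no limit"; vbTextCompare (= 1) ignores letter case
    parts = Split(html, "<title>", -1, vbTextCompare)
    If UBound(parts) >= 1 Then
        ExtractTitle = Split(parts(1), "</")(0)
    Else
        ExtractTitle = "Not Exists"    ' no <title> tag found
    End If
End Function

Sub DemoExtractTitle()
    Debug.Print ExtractTitle("<head><TITLE>Page Title</TITLE></head>")  ' Page Title
    Debug.Print ExtractTitle("<head><title>Page Title</title></head>")  ' Page Title
    Debug.Print ExtractTitle("<p>no title here</p>")                    ' Not Exists
End Sub
```

With the default binary compare, the first call would find no "<title>" delimiter in the upper-case input and fall through to "Not Exists".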

As for "On Error Resume Next": it's used to skip over blank cells or sites that aren't there. Other than that, there really isn't much that will cause an error, and it isn't hard to debug / fix if needed.

In more complex code, you will want to pinpoint the cause of an error, and thus trap it using methods other than "Resume Next".
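
As a rough sketch of one such method, the version below uses "On Error GoTo" with a labelled handler so the error number and description get recorded instead of silently skipped. The sub name, the label names, the use of column 3 as an error log, and the late-bound CreateObject call are all my assumptions for illustration, not part of the code posted above.

```vba
' Sketch only: column 3 as an error log and the label names are assumptions.
Sub DemoTrapErrors()
    Dim http As Object, r As Long, html As String, httpStatus As Long
    ' Late binding, so no reference to Microsoft XML, v6.0 is needed
    Set http = CreateObject("MSXML2.XMLHTTP.6.0")
    For r = 1 To Cells(Rows.Count, 1).End(xlUp).Row
        On Error GoTo RequestFailed      ' trap errors raised by the request
        http.Open "GET", Cells(r, 1).Value, False
        http.send
        httpStatus = http.Status
        html = http.responseText
        On Error GoTo 0                  ' stop trapping once the request is done
        If httpStatus <> 200 Then
            Cells(r, 3).Value = "HTTP status " & httpStatus
        ElseIf InStr(1, html, "<title>", vbTextCompare) = 0 Then
            Cells(r, 2).Value = "Not Exists"
        Else
            Cells(r, 2).Value = Split(Split(html, "<title>", , vbTextCompare)(1), "</")(0)
        End If
NextRow:
    Next r
    Exit Sub
RequestFailed:
    ' Record which error occurred for this row, then carry on with the next one
    Cells(r, 3).Value = "Error " & Err.Number & ": " & Err.Description
    Resume NextRow
End Sub
```

A blank cell or an unreachable site then leaves a message in column 3 for that row rather than an empty result, which makes it easier to see what actually went wrong.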


ok
 
Thanks, sir. I have decided to learn JavaScript. These days, the use of JavaScript in combination with HTML elements is mushrooming, so it is hard to play around with different sites without a basic knowledge of JavaScript. A bit late, though!
 