Get web page title using UDF

YasserKhalil

Well-Known Member
Hello everyone
I am trying to get the title of a web page and this is working for some links
Code:
Sub Test()
    Dim xmlPage As New MSXML2.XMLHTTP60

    xmlPage.Open "Get", "https://youtu.be/znVmbjFEXds", False
    xmlPage.send

    GetPageTitle xmlPage.responseText
End Sub

Function GetPageTitle(html As String)
    GetPageTitle = Mid(html, InStr(html, "<title>") + Len("<title>"), InStr(html, "</title>") - InStr(html, "<title>") - Len("</title>") + 1)
    Debug.Print GetPageTitle
End Function

But for YouTube I think it requires login details, and I don't know how to provide them,
so I get an "access denied" error.

Thanks in advance for any help
 
Hi !

Check the web page's base HTML code: the title is just "YouTube".

Use the full youtube.com URL instead of youtu.be;
just look at the redirected address when browsing manually …
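
As a minimal sketch of that URL swap: a youtu.be short link redirects to a youtube.com/watch?v=… address carrying the same video ID, so the long form can be rebuilt with plain string functions. The function name WatchUrlFromShortLink is hypothetical and not part of any library; checking the redirected address in the browser remains the reliable way to confirm it.

```vba
' Hypothetical helper: rebuilds the long youtube.com address from a youtu.be link.
' Assumes the video ID is everything after "youtu.be/" up to an optional "?".
Function WatchUrlFromShortLink(ByVal shortUrl As String) As String
    Const SHORT_HOST As String = "youtu.be/"
    Dim pos As Long
    Dim videoId As String

    pos = InStr(1, shortUrl, SHORT_HOST, vbTextCompare)
    If pos = 0 Then
        ' Not a short link: return the address unchanged
        WatchUrlFromShortLink = shortUrl
        Exit Function
    End If

    ' Keep what follows the host, then drop any query string
    videoId = Mid$(shortUrl, pos + Len(SHORT_HOST))
    videoId = Split(videoId & "?", "?")(0)

    WatchUrlFromShortLink = "https://www.youtube.com/watch?v=" & videoId
End Function
```

Usage: pass the result to .Open or .Navigate instead of the short link.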
 
Thanks for the clarification.
In fact I am looking for a way to get the title of the video itself. Is that possible?
 


First check by driving Internet Explorer, as your request seems
to return only the base code and not the dynamic data …
 
Thanks a lot
For navigation using IE I used these lines
Code:
Sub Test()
    Dim ie As Object

    Set ie = CreateObject("InternetExplorer.Application")

    ie.Visible = True
    ie.Navigate "https://youtu.be/znVmbjFEXds"
   
   
End Sub

And I have checked Inspect Element for the title of the video and here's a snapshot
Untitled.png

Can you please help me extract the title of the video from that part?
 
DOM & HTML basics …
Code:
Sub DemoIE()
    With CreateObject("InternetExplorer.Application")
        On Error GoTo Fin
        .Navigate "https://youtu.be/znVmbjFEXds"
        While .ReadyState < 4: DoEvents: Wend
        [A1].Value = .Document.Title
Fin:
        .Quit
    End With
End Sub
Do you like it? Then please click Like at the bottom right!
 
Code:
Sub DemoReq()
    Dim T As String
    With CreateObject("WinHttp.WinHttpRequest.5.1")
        .Open "GET", "https://youtu.be/znVmbjFEXds", False
        .setRequestHeader "DNT", "1"
        On Error Resume Next
        .send
        If .Status = 200 Then T = .responseText
        On Error GoTo 0
    End With
    If T > "" Then
        ' Load the response text into a late-bound HTML document
        With CreateObject("htmlfile")
            .write T
            [A1].Value = .Title
        End With
    End If
End Sub
You may Like it !
 
That's really amazing and wonderful

In this line
Code:
.setRequestHeader "DNT", "1"
What does "DNT" refer to?
 
Hi Marc L, your second demo, which uses CreateObject("WinHttp.WinHttpRequest.5.1"), is just awesome. I wanted to get the "view count" by class name, like "[A1].Value = .getElementsByClassName("view-count style-scope yt-view-count-renderer")(0).innerText", but it doesn't seem to work. Since this method can handle JavaScript-encrypted items, I tried it that way. Could you point out where I'm going wrong? Thanks in advance.
 
What does "DNT" refer to?
Do Not Track … (optional)

I wanted to get the "view count" by using ClassName like "[A1].Value = .getElementsByClassName …
As already and often written on this forum, even under IE
getElementsByClassName is often a mess! It may work on one computer
while the same code fails on another …
Maybe here under the htmlfile object it is not implemented, or
it just fails, which is common and not a surprise!
As always, the easy way is scraping from an element ID … (its own or a parent's)

Edit: here it seems to be possible only under IE
 
@Marc L, I tried with an ID but that also failed. As I'm not familiar with this ".write" approach, I'd like some insight into how I can get information by ID, TAG, CLASS etc. from that page. My erroneous attempt was "[A1].Value = .getElementById("owner-name").getElementsByTagName("a")(0).innerText", but it threw the error "object variable----with block--".
 
Following the Edit in my previous post, for the views count:

• It works under IE and, miracle, this time using getElementsByClassName,
though not with the class name found under an up-to-date Firefox version
but with the class name found under an older IE, which is a very different name!
So my code works on my test computer but should fail on your different
computer configurations (IE & Windows). [¤]

• It also works after a request using the htmlfile object
with the element ID found under IE's built-in inspector tool,
as again the ID is not the same under Firefox!

So if your usual browser is not Internet Explorer,
you must check the elements under Internet Explorer!

Again it is just about reading, but well …


[¤]: for example, I've written code that works under IE9 and Windows 7.
The same code under Windows 7 and IE10 fails!
But the same code under IE10 and Windows 8 works ‼

Another example: under Windows 8 / 8.1 and IE11 a simple piece of code fails
on some computers but works on others, even with the same configuration.
So I created a workaround that some pros called dirty code!
One day I asked one of these pros to code the simple process
his clean way. After hours he gave up:
« I don't understand why it can't work! » (Same approach as my original try.)
My answer was: « Remember my "dirty" code and now try it! »
Of course it worked, but we don't know why the original clean code fails
on some computers but works under the same setup on others …
 
@ shahin
Refer to the post below that I made on .write in one of your threads.
It's pretty much required in order to fill the <Head> section of the HTML from responseText without using IE automation. But as noted, it should not have a reference to the HTML Object Library.

https://chandoo.org/forum/threads/h...ommand-in-xmlhttp-requests.36449/#post-218694

And when late bound, or when no reference to the HTML Object Library is added, many of the "getElementsByxxxx" functions fail or behave unexpectedly (the only reliable ones are ByTagName/ID), as Marc has written here and I wrote elsewhere.
 
@sir chihiro, that is what I tried to do here. I used the ID and TAG name to get the view count, but it doesn't work.
Code:
Sub DemoReq()
    Dim http As New MSXML2.ServerXMLHTTP60
    With http
        .Open "GET", "https://youtu.be/znVmbjFEXds", False
        .send
        T$ = .responseText
    End With
    If T > "" Then
        With CreateObject("htmlfile")
                         .write T
            [A1].Value = .getElementById("count").getElementsByTagName("span")(0).innerText
        End With
    End If
End Sub
 

Chihiro, again you rightly pointed out the issue with using a class name!

In late binding, getElementsByClassName fails under the htmlfile object.

I just tested in early binding, simply adding the reference, and
this time - not every - getElementsByClassName works well!

As there is an ID on my side, I prefer late binding using this ID …

shahin: it's all in post #13, read it again … And again! …
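
For comparison, a minimal sketch of the early-bound variant discussed above. It assumes a reference to "Microsoft HTML Object Library" has been added under Tools > References; the class name "some-class" is a placeholder, not the real one on that page, and as reported in this thread the result can differ from one IE/Windows setup to another.

```vba
' Early binding: requires Tools > References > "Microsoft HTML Object Library".
Sub DemoEarlyBinding()
    Dim doc As MSHTML.HTMLDocument
    Dim nodes As Object
    Dim html As String

    With CreateObject("WinHttp.WinHttpRequest.5.1")
        .Open "GET", "https://youtu.be/znVmbjFEXds", False
        .send
        If .Status = 200 Then html = .responseText
    End With
    If Len(html) = 0 Then Exit Sub

    ' Parse the response text in an early-bound document
    Set doc = New MSHTML.HTMLDocument
    doc.body.innerHTML = html

    ' "some-class" is hypothetical: use the class name seen under IE's inspector
    Set nodes = doc.getElementsByClassName("some-class")
    If nodes.Length > 0 Then Debug.Print nodes.Item(0).innerText
End Sub
```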
 
@shahin
As I've written many times, you should first study the HTML source code before you proceed to coding...

The information you are after isn't in...
Code:
.getElementById("count").getElementsByTagName("span")(0).innerText

But is in...
Code:
.getElementById("watch7-views-info").getElementsByTagName("div")(0).innerText
 
@sir chihiro, you are absolutely right. I'm still unable to dig out what you have already found. This time I'm seriously confused, sir. I searched through the source code and the HTML elements but was unable to locate it. Could you give me a hint about where exactly I should search? I was misled by the floating elements, on which I based my earlier expression. Thanks a lot once again, sir.
 
@ sir chihiro, I really can't believe that this ".write" method can handle JavaScript-encrypted items so smoothly and help parse them. I tried it with a few other sites and found it working like a charm. The only problem with it is that it can't handle class names.
 
The only problem with this is it can't handle class names.
No ‼ As already written in posts #13 & #16,
a class name works this time, whether under IE, even in late binding, or
by a request & an HTMLDocument (early binding of the htmlfile object) …

And a fourth way, as already stated in Chihiro's post #19:
as the views number is hardcoded in the web page's base HTML code
(so here it has nothing to do with JavaScript!),
again it's just about reading; you can extract it with just VBA's basic text
functions like Split, as demonstrated several times in your own threads! …

A class name is used in 3 out of 4 of my codes scraping the views count!
This web page is not difficult to handle, at beginner level …
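
A minimal sketch of that Split approach, shown here on the page title rather than the views count, since the exact markers around the views number are not given in this thread. The helper name TextBetween and its parameters are hypothetical; it simply returns the text found between two markers in the response text.

```vba
' Hypothetical helper: returns the text between openTag and closeTag,
' or an empty string when openTag is not found.
' If closeTag is missing, everything after openTag is returned.
Function TextBetween(ByVal source As String, ByVal openTag As String, _
                     ByVal closeTag As String) As String
    Dim parts() As String

    ' Cut the source in two at the first opening marker
    parts = Split(source, openTag, 2, vbTextCompare)
    If UBound(parts) < 1 Then Exit Function
    If Len(parts(1)) = 0 Then Exit Function

    ' Keep what comes before the first closing marker
    TextBetween = Split(parts(1), closeTag, 2, vbTextCompare)(0)
End Function
```

Usage, assuming T holds the responseText of a request: [A1].Value = TextBetween(T, "<title>", "</title>")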
 