• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

How to extract First 5 URL to a spreadsheet from a Google search result?

Marc L

Excel Ninja

It was the fisrt time I did not "see" the link in first child of an element …

I wanna know your IE version and some properties of Item 1
like its innerText, nodeName / tagName … Thanks
 

Nebu

Excel Ninja
Hi Marc:

My Internet explorer is 11
Verison:11.0.9600.18499

Item1:
Tagname: Value:"SPAN":Type:String
InnerText:"[PDF]":Type:String


Note: this is what the values were when it threw the error.

Thanks
 
Last edited:

Marc L

Excel Ninja

I have not the same under IE9, I'll check later on another computer
with IE11 if I get an item before the one with link …

Delete oRef unused variable in Dim codeline … (post #11 updated)
 

Marc L

Excel Ninja
Under IE11 I have same span as on your side …
Some issue may come using different IE or Windows version.

I had a working code under Seven & IE9 which was failing
under Seven and IE10 but with no issue under Windows 8 and IE10 !

Without using IE, try this :​
Code:
Sub DemoRequest1()
        Static B As Boolean
            If B Then Exit Sub
      Const URL = "https://www.google.co.in/search?q="
        Dim oJS As Object, oReq As Object, C%, F&, L&, R&, SL$(), SP$(), T$, V
        Set oJS = CreateObject("MSScriptControl.ScriptControl")
            oJS.Language = "JScript"
            oJS.AddCode "function decode(txt) {return decodeURIComponent(txt);}"
        Set oReq = CreateObject("Msxml2.XMLHTTP")
       V = Application.Caller
       B = Not IsError(V)
    If B Then
       F = Sheet1.Shapes(V).Fill.ForeColor.RGB
      With Sheet1.Shapes(V).TextFrame.Characters:  T = .Text:  .Text = "Downloading  links …":  End With
    End If
    With Sheet1.[A4].CurrentRegion
        .Columns("C:G").Offset(1).ClearContents
        R = .Rows(.Rows.Count).Row
    End With
    For R = 5 To R
             If B Then Sheet1.Shapes(V).Fill.ForeColor.SchemeColor = 24 + (R And 1): DoEvents
        With oReq
            .Open "GET", URL & Sheet1.Cells(R, 2).Text, False
            .setRequestHeader "DNT", "1"
             On Error Resume Next
            .send
             On Error GoTo 0
            If .Status <> 200 Then Beep: Debug.Print "Row #" & R; " :"; .Status; " " & .StatusText: GoTo Fin
            SP = Split(.responseText, "<h3 class=""r"">")
        End With
        If UBound(SP) > 0 Then
                C = 2
            For L = 1 To UBound(SP)
               SL = Split(SP(L), "<a href=""")
                If UBound(SL) > 0 Then
                    SL = Split(Split(SL(1), """>")(0), "/url?url=")
                    If UBound(SL) > 0 Then SL(0) = Split(SL(1), "&amp;rct=")(0)
                    C = C + 1
                    Sheet1.Hyperlinks.Add Sheet1.Cells(R, C), oJS.Run("decode", SL(0)), , , Split(SL(0), "/")(2)
                    If C = 7 Then Exit For
                End If
            Next
        End If
    Next
Fin:
        Set oJS = Nothing:  Set oReq = Nothing
    If B Then
        With Sheet1.Shapes(V):  .TextFrame.Characters.Text = T:  .Fill.ForeColor.RGB = F:  End With
        B = False
    End If
End Sub
 

Marc L

Excel Ninja
Or this one using same DOM structure like piloting IE
(codelines extracting links use the same way as my original Demo !) :​
Code:
Sub DemoRequest2()
        Static B As Boolean
            If B Then Exit Sub
      Const URL = "https://www.google.co.in/search?q="
        Dim oJS As Object, oDoc As Object, oReq As Object, oElt As Object, C%, F&, R&, S$(), T$, V
        Set oJS = CreateObject("MSScriptControl.ScriptControl")
            oJS.Language = "JScript"
            oJS.AddCode "function decode(txt) {return decodeURIComponent(txt);}"
        Set oDoc = CreateObject("htmlfile")
        Set oReq = CreateObject("Msxml2.XMLHTTP")
       V = Application.Caller
       B = Not IsError(V)
    If B Then
       F = Sheet1.Shapes(V).Fill.ForeColor.RGB
      With Sheet1.Shapes(V).TextFrame.Characters:  T = .Text:  .Text = "Downloading  links …":  End With
    End If
    With Sheet1.[A4].CurrentRegion
        .Columns("C:G").Offset(1).ClearContents
        R = .Rows(.Rows.Count).Row
    End With
    For R = 5 To R
             If B Then Sheet1.Shapes(V).Fill.ForeColor.SchemeColor = 24 + (R And 1): DoEvents
        With oReq
            .Open "GET", URL & Sheet1.Cells(R, 2).Text, False
            .setRequestHeader "DNT", "1"
             On Error Resume Next
            .send
             On Error GoTo 0
            If .Status <> 200 Then Beep: Debug.Print "Row #" & R; " :"; .Status; " " & .StatusText: GoTo Fin
            oDoc.body.innerHTML = .responseText
        End With
                C = 2
        For Each oElt In oDoc.all.ires.getElementsByTagName("H3")
            With oElt.getElementsByTagName("A")
              If .Length Then
                C = C + 1
                S = Split(.Item(0).href, "/url?url=")
                If UBound(S) > 0 Then S(0) = Split(S(1), "&rct=")(0)
                Sheet1.Hyperlinks.Add Sheet1.Cells(R, C), oJS.Run("decode", S(0)), , , Split(S(0), "/")(2)
                If C = 7 Then Exit For
              End If
            End With
        Next
    Next
Fin:
        Set oJS = Nothing:  Set oDoc = Nothing:  Set oReq = Nothing:  Set oElt = Nothing
    If B Then
        With Sheet1.Shapes(V):  .TextFrame.Characters.Text = T:  .Fill.ForeColor.RGB = F:  End With
        B = False
    End If
End Sub
 

Nebu

Excel Ninja
Hi Marc:

Both your codes are working fine, but I wanted a small change to the code instead of displaying actual URL in the worksheets.

For eg:

When we are searching for “ABC Industries” AND protest, we grab the link (www.revenue.state.il.us) on to the spreadsheet, but instead of this can we grab the heading displayed on the search results in this example it will be
[PDF]IT 02-17 - Illinois Department of Revenue. In short, I want the header to be displayed and hyper linked instead of the URL.

This is what I understood the change should be:

header=Replace(link.innerHTML,"<EM>","")
header=Replace(header,"</EM>","")

where link is oElt.getElementsByTagName("A")


Thanks
 

Marc L

Excel Ninja
Hi !

Or via innerText property at parent level (h3) or link level (a),
in red mod for Demo and DemoRequest2 procedures :

Sheet1.Hyperlinks.Add Sheet1.Cells(R, C), oJS.Run("decode", S(0)), , , oElt.innerText
 

shahin

Active Member
How about this one. It will fetch 5 results from google including links to the results whatever your input may be.
Code:
Sub GoogleSearch()
Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
Dim posts As Object, Myval As Variant
Dim url As String

Myval = InputBox("Give me any input")
url = "https://www.google.co.in/search?q=" & Myval
With http
    .Open "GET", url, False
    .setRequestHeader "Content-Type", "text/xml"
    .send
    html.body.innerHTML = .responseText
End With
    Worksheets.Add
    ActiveSheet.Name = Myval
    Range("A1").Select
    Range("A1") = Myval
Set posts = html.getElementById("rso").getElementsByTagName("H3")
    For i = 0 To 5
        ActiveCell.Offset(0, 1) = posts(i).getElementsByTagName("a")(0).innerText
        ActiveCell.Offset(0, 2) = posts(i).getElementsByTagName("a")(0).href
        ActiveCell.Offset(1, 0).Select
    Next i
End Sub
 
Last edited:

shahin

Active Member
If the code breaks for some reason, make sure to put "On error resume next" before starting the for loop. That's it. Now it will definitely rock.
 
Hello.
I tried the macro but it did not go. Failure to fill out.
Since I'm not expert you might attach an excel file with your prepared macro?
Thank you
See you soon
 
Hi, I tried the macro. Only extract the first page results.
Is it also possible to extract the results of all other pages?

Then can not the mask appear, pointing to all the search keys in the excel sheet column (column a)?
Thank you
 

shahin

Active Member
It parses the first ten results from google search. See the title of this thread. It was meant to parse 5 results when I have created it. Thanks.
 

shahin

Active Member
@stefanoste78,
It is very close to what you are expecting. It gets the input and parses the results displayed across pages. A file with this macro has been attached.

Code:
Sub GoogleSearch()
Const link = "https://www.google.co.in"
Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
Dim htm As New HTMLDocument
Dim topics As Object, topic As Object, item As Object
Dim posts As Object, pagination As Object, Myval As Variant
Dim url As String

Myval = InputBox("Give me any input")
url = "https://www.google.co.in/search?q=" & Myval
    With http
        .Open "GET", url, False
        .setRequestHeader "Content-Type", "text/xml"
        .send
        html.body.innerHTML = .responseText
    End With
    ActiveSheet.Name = Myval
    Range("A1").Select
    On Error Resume Next
    Set topics = html.getElementById("rso").getElementsByTagName("H3")
    For Each topic In topics
        ActiveCell = topic.getElementsByTagName("a")(0).innerText
        ActiveCell.Offset(0, 1) = topic.getElementsByTagName("a")(0).href
        ActiveCell.Offset(1, 0).Select
    Next topic

    Set pagination = html.getElementsByClassName("fl")
    For v = 0 To pagination.Length - IIf(pagination.Length > 0, 1, 0)
    If InStr(pagination(v).href, "about:") > 0 Then
    zz = link & Split(pagination(v).href, "about:")(1)
   
    End If
    With http
        .Open "GET", zz, False
        .setRequestHeader "Content-Type", "text/xml"
        .send
        htm.body.innerHTML = .responseText
    End With
    Set posts = htm.getElementById("rso").getElementsByTagName("H3")
    For Each post In posts
        ActiveCell = post.getElementsByTagName("a")(0).innerText
        ActiveCell.Offset(0, 1) = post.getElementsByTagName("a")(0).href
        ActiveCell.Offset(1, 0).Select
    Next post
Next v
End Sub
Don't forget to give it a thumbs up, if it satiates your need. Thanks.
 

Attachments

Hello.
How would the macro become if I want to search more and more key search (Obtained from the macro in column c) and enter the results in the results sheet. When the column "a" of sheet results1 is filled, the macro must continue to paste the data on the result sheet 2 and so on until the job is finished.
I attach the file
 

Attachments

Diego Pumarino

New Member
Hi there! I just saw this code and it's awesome, works great!
I was wondering if there's a way I can get number of results in google.com, and the number of result from a determined location. Is it possible?
Thank you!
 

keebellah

New Member
Hi guys, sorry ti but in but I'm trying to help someone with the sample file Nebu Google 5 first .xlsb
When I run the search I get www.Google.co.in instead of the results that were already present.
Could somebody please explain what this means and why this could happen?
Thanks in advance
 

Marc L

Excel Ninja
As you can see this code dates from two years ago and since, Google has change the coding of its webpage,​
this kind of code lasts until the webpage changes, the reason why people needing such process must learn​
how works a particular webpage - as from a webpage to another one it's very not the same - and​
foresee / search within its html code which elements a procedure may grab.​
As always Google is not friendly to use …​
 

Marc L

Excel Ninja
A point to remember : an element existing in the webpage code (directly from a webbrowser) may not exist​
in the responseText of a request via MSXML2.XMLHTTP, WinHttp, ServerXMLHTTP60 (last both better), whatever …​
So once an element is located in the source webpage code, it's smart to check if the same element exists​
in the request response via Instr VBA function for example before to code forward avoiding to waste time.​
When it's not possible under request, it's time to pilot IE …​
 

keebellah

New Member
Thank you Marc,
I'll see, I'm not that much into web scraping but hope the friend can solve it.
I would really have to dig into it , but I'll head your tip and see if I can do something with the existing 'old' code :)
 
Top