• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

How to extract First 5 URL to a spreadsheet from a Google search result?

Status
Not open for further replies.

It was the fisrt time I did not "see" the link in first child of an element …

I wanna know your IE version and some properties of Item 1
like its innerText, nodeName / tagName … Thanks
 
Hi Marc:

My Internet explorer is 11
Verison:11.0.9600.18499

Item1:
Tagname: Value:"SPAN":Type:String
InnerText:"[PDF]":Type:String


Note: this is what the values were when it threw the error.

Thanks
 
Last edited:

I have not the same under IE9, I'll check later on another computer
with IE11 if I get an item before the one with link …

Delete oRef unused variable in Dim codeline … (post #11 updated)
 
Under IE11 I have same span as on your side …
Some issue may come using different IE or Windows version.

I had a working code under Seven & IE9 which was failing
under Seven and IE10 but with no issue under Windows 8 and IE10 !

Without using IE, try this :​
Code:
Sub DemoRequest1()
        Static B As Boolean
            If B Then Exit Sub
      Const URL = "https://www.google.co.in/search?q="
        Dim oJS As Object, oReq As Object, C%, F&, L&, R&, SL$(), SP$(), T$, V
        Set oJS = CreateObject("MSScriptControl.ScriptControl")
            oJS.Language = "JScript"
            oJS.AddCode "function decode(txt) {return decodeURIComponent(txt);}"
        Set oReq = CreateObject("Msxml2.XMLHTTP")
       V = Application.Caller
       B = Not IsError(V)
    If B Then
       F = Sheet1.Shapes(V).Fill.ForeColor.RGB
      With Sheet1.Shapes(V).TextFrame.Characters:  T = .Text:  .Text = "Downloading  links …":  End With
    End If
    With Sheet1.[A4].CurrentRegion
        .Columns("C:G").Offset(1).ClearContents
        R = .Rows(.Rows.Count).Row
    End With
    For R = 5 To R
             If B Then Sheet1.Shapes(V).Fill.ForeColor.SchemeColor = 24 + (R And 1): DoEvents
        With oReq
            .Open "GET", URL & Sheet1.Cells(R, 2).Text, False
            .setRequestHeader "DNT", "1"
             On Error Resume Next
            .send
             On Error GoTo 0
            If .Status <> 200 Then Beep: Debug.Print "Row #" & R; " :"; .Status; " " & .StatusText: GoTo Fin
            SP = Split(.responseText, "<h3 class=""r"">")
        End With
        If UBound(SP) > 0 Then
                C = 2
            For L = 1 To UBound(SP)
               SL = Split(SP(L), "<a href=""")
                If UBound(SL) > 0 Then
                    SL = Split(Split(SL(1), """>")(0), "/url?url=")
                    If UBound(SL) > 0 Then SL(0) = Split(SL(1), "&amp;rct=")(0)
                    C = C + 1
                    Sheet1.Hyperlinks.Add Sheet1.Cells(R, C), oJS.Run("decode", SL(0)), , , Split(SL(0), "/")(2)
                    If C = 7 Then Exit For
                End If
            Next
        End If
    Next
Fin:
        Set oJS = Nothing:  Set oReq = Nothing
    If B Then
        With Sheet1.Shapes(V):  .TextFrame.Characters.Text = T:  .Fill.ForeColor.RGB = F:  End With
        B = False
    End If
End Sub
 
Or this one using same DOM structure like piloting IE
(codelines extracting links use the same way as my original Demo !) :​
Code:
Sub DemoRequest2()
        Static B As Boolean
            If B Then Exit Sub
      Const URL = "https://www.google.co.in/search?q="
        Dim oJS As Object, oDoc As Object, oReq As Object, oElt As Object, C%, F&, R&, S$(), T$, V
        Set oJS = CreateObject("MSScriptControl.ScriptControl")
            oJS.Language = "JScript"
            oJS.AddCode "function decode(txt) {return decodeURIComponent(txt);}"
        Set oDoc = CreateObject("htmlfile")
        Set oReq = CreateObject("Msxml2.XMLHTTP")
       V = Application.Caller
       B = Not IsError(V)
    If B Then
       F = Sheet1.Shapes(V).Fill.ForeColor.RGB
      With Sheet1.Shapes(V).TextFrame.Characters:  T = .Text:  .Text = "Downloading  links …":  End With
    End If
    With Sheet1.[A4].CurrentRegion
        .Columns("C:G").Offset(1).ClearContents
        R = .Rows(.Rows.Count).Row
    End With
    For R = 5 To R
             If B Then Sheet1.Shapes(V).Fill.ForeColor.SchemeColor = 24 + (R And 1): DoEvents
        With oReq
            .Open "GET", URL & Sheet1.Cells(R, 2).Text, False
            .setRequestHeader "DNT", "1"
             On Error Resume Next
            .send
             On Error GoTo 0
            If .Status <> 200 Then Beep: Debug.Print "Row #" & R; " :"; .Status; " " & .StatusText: GoTo Fin
            oDoc.body.innerHTML = .responseText
        End With
                C = 2
        For Each oElt In oDoc.all.ires.getElementsByTagName("H3")
            With oElt.getElementsByTagName("A")
              If .Length Then
                C = C + 1
                S = Split(.Item(0).href, "/url?url=")
                If UBound(S) > 0 Then S(0) = Split(S(1), "&rct=")(0)
                Sheet1.Hyperlinks.Add Sheet1.Cells(R, C), oJS.Run("decode", S(0)), , , Split(S(0), "/")(2)
                If C = 7 Then Exit For
              End If
            End With
        Next
    Next
Fin:
        Set oJS = Nothing:  Set oDoc = Nothing:  Set oReq = Nothing:  Set oElt = Nothing
    If B Then
        With Sheet1.Shapes(V):  .TextFrame.Characters.Text = T:  .Fill.ForeColor.RGB = F:  End With
        B = False
    End If
End Sub
 
Hi Marc:

Both your codes are working fine, but I wanted a small change to the code instead of displaying actual URL in the worksheets.

For eg:

When we are searching for “ABC Industries” AND protest, we grab the link (www.revenue.state.il.us) on to the spreadsheet, but instead of this can we grab the heading displayed on the search results in this example it will be
[PDF]IT 02-17 - Illinois Department of Revenue. In short, I want the header to be displayed and hyper linked instead of the URL.

This is what I understood the change should be:

header=Replace(link.innerHTML,"<EM>","")
header=Replace(header,"</EM>","")

where link is oElt.getElementsByTagName("A")


Thanks
 
Hi !

Or via innerText property at parent level (h3) or link level (a),
in red mod for Demo and DemoRequest2 procedures :

Sheet1.Hyperlinks.Add Sheet1.Cells(R, C), oJS.Run("decode", S(0)), , , oElt.innerText
 
How about this one. It will fetch 5 results from google including links to the results whatever your input may be.
Code:
Sub GoogleSearch()
Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
Dim posts As Object, Myval As Variant
Dim url As String

Myval = InputBox("Give me any input")
url = "https://www.google.co.in/search?q=" & Myval
With http
    .Open "GET", url, False
    .setRequestHeader "Content-Type", "text/xml"
    .send
    html.body.innerHTML = .responseText
End With
    Worksheets.Add
    ActiveSheet.Name = Myval
    Range("A1").Select
    Range("A1") = Myval
Set posts = html.getElementById("rso").getElementsByTagName("H3")
    For i = 0 To 5
        ActiveCell.Offset(0, 1) = posts(i).getElementsByTagName("a")(0).innerText
        ActiveCell.Offset(0, 2) = posts(i).getElementsByTagName("a")(0).href
        ActiveCell.Offset(1, 0).Select
    Next i
End Sub
 
Last edited:
If the code breaks for some reason, make sure to put "On error resume next" before starting the for loop. That's it. Now it will definitely rock.
 
Hello.
I tried the macro but it did not go. Failure to fill out.
Since I'm not expert you might attach an excel file with your prepared macro?
Thank you
See you soon
 
A file has been attached to examine the results on basis of google search.
 

Attachments

  • Google Search.xlsm
    19 KB · Views: 76
Hi, I tried the macro. Only extract the first page results.
Is it also possible to extract the results of all other pages?

Then can not the mask appear, pointing to all the search keys in the excel sheet column (column a)?
Thank you
 
It parses the first ten results from google search. See the title of this thread. It was meant to parse 5 results when I have created it. Thanks.
 
@stefanoste78,
It is very close to what you are expecting. It gets the input and parses the results displayed across pages. A file with this macro has been attached.

Code:
Sub GoogleSearch()
Const link = "https://www.google.co.in"
Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
Dim htm As New HTMLDocument
Dim topics As Object, topic As Object, item As Object
Dim posts As Object, pagination As Object, Myval As Variant
Dim url As String

Myval = InputBox("Give me any input")
url = "https://www.google.co.in/search?q=" & Myval
    With http
        .Open "GET", url, False
        .setRequestHeader "Content-Type", "text/xml"
        .send
        html.body.innerHTML = .responseText
    End With
    ActiveSheet.Name = Myval
    Range("A1").Select
    On Error Resume Next
    Set topics = html.getElementById("rso").getElementsByTagName("H3")
    For Each topic In topics
        ActiveCell = topic.getElementsByTagName("a")(0).innerText
        ActiveCell.Offset(0, 1) = topic.getElementsByTagName("a")(0).href
        ActiveCell.Offset(1, 0).Select
    Next topic

    Set pagination = html.getElementsByClassName("fl")
    For v = 0 To pagination.Length - IIf(pagination.Length > 0, 1, 0)
    If InStr(pagination(v).href, "about:") > 0 Then
    zz = link & Split(pagination(v).href, "about:")(1)
   
    End If
    With http
        .Open "GET", zz, False
        .setRequestHeader "Content-Type", "text/xml"
        .send
        htm.body.innerHTML = .responseText
    End With
    Set posts = htm.getElementById("rso").getElementsByTagName("H3")
    For Each post In posts
        ActiveCell = post.getElementsByTagName("a")(0).innerText
        ActiveCell.Offset(0, 1) = post.getElementsByTagName("a")(0).href
        ActiveCell.Offset(1, 0).Select
    Next post
Next v
End Sub

Don't forget to give it a thumbs up, if it satiates your need. Thanks.
 

Attachments

  • Google Search.xlsm
    20.4 KB · Views: 74
Hello.
How would the macro become if I want to search more and more key search (Obtained from the macro in column c) and enter the results in the results sheet. When the column "a" of sheet results1 is filled, the macro must continue to paste the data on the result sheet 2 and so on until the job is finished.
I attach the file
 

Attachments

  • research google.xlsm
    17.3 KB · Views: 66
Hi there! I just saw this code and it's awesome, works great!
I was wondering if there's a way I can get number of results in google.com, and the number of result from a determined location. Is it possible?
Thank you!
 
Hi guys, sorry ti but in but I'm trying to help someone with the sample file Nebu Google 5 first .xlsb
When I run the search I get www.Google.co.in instead of the results that were already present.
Could somebody please explain what this means and why this could happen?
Thanks in advance
 
As you can see this code dates from two years ago and since, Google has changed the coding of its webpage,​
this kind of code lasts until the webpage changes, the reason why people needing such process must learn​
how works a particular webpage - as from a webpage to another one it's very not the same - and​
foresee / search within its html code which elements a procedure may grab.​
As always Google is not friendly to use …​
 
A point to remember : an element existing in the webpage code (directly from a webbrowser) may not exist​
in the responseText of a request via MSXML2.XMLHTTP, WinHttp, ServerXMLHTTP60 (last both better), whatever …​
So once an element is located in the source webpage code, it's smart to check if the same element exists​
in the request response via Instr VBA function for example before to code forward avoiding to waste time.​
When it's not possible under request, it's time to pilot IE …​
 
Thank you Marc,
I'll see, I'm not that much into web scraping but hope the friend can solve it.
I would really have to dig into it , but I'll head your tip and see if I can do something with the existing 'old' code :)
 
Status
Not open for further replies.
Back
Top