• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

How to extract First 5 URL to a spreadsheet from a Google search result?

Status
Not open for further replies.

Nebu

Excel Ninja
Hi:

The following macro will search for strings on the column B of the attached workbook in Google and will display the search results in separate explorer tabs for each search string.

Code:
Sub test()
Application.ScreenUpdating = False

Dim IE As Object

Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True

For i& = 2 To Sheet1.Cells(Rows.Count, 2).End(xlUp).Row

While Sheet1.Range("B" & i) <> vbNullString
    IE.Navigate "https://www.google.co.in/search?q=" & Sheet1.Range("B" & i), CLng(2048)

    While IE.Busy
        DoEvents
    Wend

    i = i + 1
Wend

Next
Set IE = Nothing

Application.ScreenUpdating = True
End Sub

Now I want to extract the URLs for the first 5 search results to a spreadsheet from each tab in the explorer. I am sure there is a way to achieve this, but not quite sure how. I am attaching a sample file with the above macro.

Thanks
Nebu
 

Attachments

  • Book2.xlsm
    20.1 KB · Views: 94
Hi Marc:

For example:

The first search string is “ABC Industries” AND fraud" will give the first 5 Google search result as attached. I am looking for a code to capture the URLs for all the search strings in the format given as attached.

Thanks

upload_2016-11-18_13-43-46.png
upload_2016-11-18_13-37-50.png
 
A first attempt with results in Immediate window (OK under Seven & IE9) :​
Code:
Sub Demo0()
    With CreateObject("InternetExplorer.Application")
        .Visible = True
    For R& = 2 To Sheet1.Cells(Rows.Count, 2).End(xlUp).Row
        .Navigate "https://www.google.co.in/search?q=" & Sheet1.Cells(R, 2).Text
        While .Busy Or .ReadyState < 4:  DoEvents:  Wend
         With .Document.all.ires.getElementsByTagName("A")
             For U& = 0 To Application.Min(8, .Length - 1) Step 2
                 Debug.Print "' "; Format(R, "@@"); Tab(7); .Item(U).href
             Next
         End With
    Next
        .Quit
    End With
End Sub
Better is to not pilot any webbrowser but often faster
is to reproduce requests used by webbrowsers. I'll be back later …
 
Hi Marc:

This looks really good. Thanks a lot for the quick turnaround,I modified your code a bit to take the URLs to the spread sheet.

Code:
Sub Demo0()
Application.ScreenUpdating = False
    With CreateObject("InternetExplorer.Application")
        .Visible = True
    For R = 5 To Sheet1.Cells(Rows.Count, 2).End(xlUp).Row
        .Navigate "https://www.google.co.in/search?q=" & Sheet1.Cells(R, 2).Text
        While .Busy Or .ReadyState < 4:  DoEvents:  Wend
        With .Document.all.ires.getElementsByTagName("A")
                c = 3
            For U = 0 To Application.Min(8, .Length - 1) Step 2
                Sheet1.Cells(R, c) = .Item(U).href
                c = c + 1
            Next
        End With
    Next
        .Quit
    End With
Application.ScreenUpdating = True
End Sub

Is there any way to keep the hyperlink for the copied URLs , so that if we click on it from the spreadsheet it will take you to the respective web pages ( or Am I becoming too greedy?). Here is how the file looks like if I run the modified code.

Thanks
 

Attachments

  • Book2.xlsm
    22.6 KB · Views: 159
For links, it's the same as manually using Macro Recorder …

You can start to play with this :​
Code:
Private Sub Demo()
    Static B As Boolean
        If B Then Exit Sub
  Const PREF = "://www.google.co.in/url?url=", URL = "https://www.google.co.in/search?q="
    Dim oJS As Object, oElt As Object, C&, F&, R&, S$(), T$, V
    Set oJS = CreateObject("MSScriptControl.ScriptControl")
        oJS.Language = "JScript"
        oJS.AddCode "function decode(txt) {return decodeURIComponent(txt);}"
       V = Application.Caller
       B = Not IsError(V)
    If B Then
      With Sheet1.Shapes(V).Fill.ForeColor:  F = .RGB:  .SchemeColor = 24:  End With
      With Sheet1.Shapes(V).TextFrame.Characters:  T = .Text:  .Text = "Downloading  links …":  End With
    End If
    With Sheet1.[A4].CurrentRegion
        .Columns("C:G").Offset(1).ClearContents
        R = .Rows(.Rows.Count).Row
    End With
    With CreateObject("InternetExplorer.Application")
        On Error GoTo Fin
'        .Visible = True
     For R = 5 To R
        If B Then Sheet1.Shapes(V).Fill.ForeColor.SchemeColor = 24 + (R And 1)
        .Navigate URL & Sheet1.Cells(R, 2).Text
        While .Busy Or .ReadyState < 4:  DoEvents:  Wend
            C = 2
        For Each oElt In .Document.all.ires.getElementsByTagName("H3")
            C = C + 1
            S = Split(oElt.Children(0).href, PREF)
            If UBound(S) > 0 Then S(0) = Split(S(1), "&rct=")(0)
            Sheet1.Hyperlinks.Add Sheet1.Cells(R, C), oJS.Run("decode", S(0)), , , Split(S(0), "/")(2)
            If C = 7 Then Exit For
        Next
    Next
Fin:
        If Err.Number Then Beep
        .Quit
    End With
        Set oJS = Nothing:  Set oElt = Nothing
    If B Then
        With Sheet1.Shapes(V):  .TextFrame.Characters.Text = T:  .Fill.ForeColor.RGB = F:  End With
        B = False
    End If
End Sub
You may Like it ! ;)

Edit for tweak … (v3 !)​
 
Last edited:
@Marc L

Better is to not pilot any webbrowser but often faster
is to reproduce requests used by webbrowsers. I'll be back later …

Most of the time I agree with you on this. But for Google Search specifically, it's safer to use IE.

Google has set limit on number of times xmlhttp connections can be made in a period of time. And it can generate unexpected results for multiple queries.

Though you can cycle through ".ServerXMLHTTP" and ".3.0" to ".6.0" to circumvent limit up to a point.
 
I've just done a last tweak but nothing dramatic !

As I do not use (avoid) Google as search engine …

Yes you're right Chihiro and
it could be the same for financial sites for example.
 

A new mod in post #6 including a JScript function to decode links.

Without, some hyperlinks may not work …​
 
Hi Marc:

Thanks for this its working fine barring the search string “ABC Industries” AND protest", after giving 2 URL it is exiting the macro giving the error number 438. I guess the error is because the third link in the search result for this particular string is a pdf file.

Thanks
 
Hi !

No issue on my side : see results in this attachment :​
 

Attachments

  • Nebu Google 5 first .xlsb
    23.8 KB · Views: 174
Hi:

I am not sure why it is happening at my end? May be it is something to do with my settings here. I am using my office computer and do have any admin rights. The file you have attached is also not working at my end. I guess its fine I can live with this. Thanks again for your support.

Thanks
 

As we do not have same links result (no pdf link on my side),
give me complete original link and on which codeline error occurs …

Under Windows 10 ?
 

Yes. Sorry, I had it in my attachment but in link #2
but on my side with a google prefix within original link …

So as I have no issue, on which codeline error occurs ?
 
Hi:

When it come to the line

S = Split(oElt.Children(0).href, PREF)

it jumps to the error handler
If Err.Number Then Beep

and execute rest of the code and exit the subroutine.

Thanks
 
So it seems on your side an element "H3" has no children
or no property href in its fisrt children …

Put in comment codeline On Error Goto Fin, run procedure
and once error occurs, explore in Locals window oElt variable
to check children and href property.

If I'm right, maybe I could do a bypass but in this case
the link would not be grabbed …
 

Under which Windows and IE version ?
href belongs directly to oElt or to oElt.Children ?

On my side : oElt - children - Item 1 - href …
 
Hi:

On my side : oElt - children - Item 2 - href …
and my operating system is windows 7 professional

Thanks
 
There is a newer attachment in post #11.

Now you must see first link is the pdf file and link #4 is upon this thread !
Google 'bot was here …

Click the Search button to see if my mod works on your side …
 
Status
Not open for further replies.
Back
Top