shahin
Active Member
I have written some code that uses IE to parse email addresses from a Google Maps page (a seriously complicated one). A hardcoded delay will not get you through this page reliably. I'm really pleased with how it works. Even with Python + Selenium this page is quite difficult to handle, but IE turned out to be very helpful, because an "Explicit Wait" (looping until the required element actually exists in the document) works incredibly well with it. I thought I'd share this piece of code to show fellow programmers that IE is no less capable than any scripting language when it comes to web scraping.
References to add to the library (Tools > References in the VBA editor):
1. Microsoft Internet Controls
2. Microsoft HTML Object Library
Just run the code and see the magic (try to overlook the way I named the variables):
Code:
Sub Fetch_Google_Data()
    Dim URL As String, i As Long, r As Long
    Dim IE As New InternetExplorer, HTML As HTMLDocument
    Dim posts As Object, elem As Object
    Dim topics As Object, topic As Object

    URL = "https://www.google.com/maps/d/u/0/embed?mid=1PCGcGIVnNMtDQtEcWDqBx-G-iT8&ll=32.444813638299706%2C-87.71010306562499&z=8"

    With IE
        .Visible = True
        .navigate URL
        ' Wait until the initial page load has finished
        While .Busy = True Or .ReadyState < 4: DoEvents: Wend
        Set HTML = .Document
    End With

    ' Explicit wait: keep polling until the element exists, then click it
    Do: Set topics = HTML.getElementsByClassName("i4ewOd-pzNkMb-ornU0b-b0t70b-Bz112c")(0): DoEvents: Loop While topics Is Nothing
    topics.Click

    ' Wait for the second checkbox to appear, then click it
    Do: Set topic = HTML.querySelectorAll("div[role='checkbox']")(1): DoEvents: Loop While topic Is Nothing
    topic.Click

    With HTML.querySelectorAll(".HzV7m-pbTTYe-JNdkSc .suEOdc")
        For i = 1 To .Length - 1
            ' Open the entry
            .Item(i).Click

            ' Write the email address to column A if the entry has a mailto link
            If Not HTML.querySelector("a[href^='mailto:']") Is Nothing Then
                r = r + 1: Cells(r, 1) = HTML.querySelector("a[href^='mailto:']").innerText
            End If

            ' Wait for the button that leads back to the list, then click it
            Do: Set posts = HTML.querySelector(".HzV7m-tJHJj-LgbsSe-Bz112c.qqvbed-a4fUwd-LgbsSe-Bz112c"): DoEvents: Loop While posts Is Nothing
            posts.Click

            ' Wait until this element is present again before moving on to the next entry
            Do: Set elem = HTML.querySelector(".qqvbed-p83tee"): DoEvents: Loop While elem Is Nothing
        Next i
    End With
End Sub
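The wait loops above keep running forever if an element never shows up (for example, if Google changes a class name). A minimal sketch of the same explicit wait with a timeout could look like the one below. WaitForElement and timeoutSec are hypothetical names of my own, not something built into IE, and the 10 second default is just an assumption. It needs the same two references as the main code.
Code:
' Hypothetical helper: polls for a CSS selector until it matches
' or until timeoutSec seconds have passed (the default of 10 is an assumption).
' Returns Nothing if the element never appeared.
Function WaitForElement(doc As HTMLDocument, selector As String, _
                        Optional timeoutSec As Double = 10) As Object
    Dim started As Single, elapsed As Single
    started = Timer
    Do
        Set WaitForElement = doc.querySelector(selector)
        If Not WaitForElement Is Nothing Then Exit Function
        DoEvents
        elapsed = Timer - started
        ' Timer resets to 0 at midnight, so correct a negative difference
        If elapsed < 0 Then elapsed = elapsed + 86400
    Loop While elapsed < timeoutSec
End Function
Inside the Sub it would be used something like this: Set elem = WaitForElement(HTML, ".qqvbed-p83tee"), followed by If elem Is Nothing Then Exit Sub, so that the macro stops cleanly instead of hanging.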