shahin
Active Member
I've written a script to scrape some information from a webpage. My script does fine if I don't bother about duplicate leads. It can go down a certain level of that page, handling the lazy load, and parse the information (company name and industry) from there. The only issue I'd like to deal with is weeding out duplicate leads. Is there any way I can shake off duplicate entries on the fly? Thanks in advance.
Here is my attempt so far:
Code:
Sub Handle_SlowLoad()
    Dim URL As String, post As Object, container As Object
    Dim scroll As Long, r As Long

    URL = "https://www.inc.com/profile/sumup-payments-limited"

    With CreateObject("InternetExplorer.Application")
        .Visible = True
        .navigate URL
        While .Busy = True Or .readyState < 4: DoEvents: Wend

        'Scroll down repeatedly so the lazy-loaded content gets rendered
        For scroll = 1 To 10
            .document.parentWindow.scrollBy 0, 99999
            Application.Wait Now + TimeValue("00:00:03")
        Next scroll

        'Grab the container once, after all the scrolling is done
        Set container = .document.getElementsByClassName("profile")

        For Each post In container
            With post.getElementsByTagName("h1")
                If .Length Then r = r + 1: Cells(r, 1) = .Item(0).innerText
            End With
            With post.getElementsByClassName("ifi_industry")(0).getElementsByTagName("dd")
                If .Length Then Cells(r, 2) = .Item(0).innerText
            End With
        Next post
        .Quit
    End With
End Sub
This is the partial output I'm getting at the moment (though the page itself does not contain any duplicate leads):
Code:
Sumup Payments Limited IT Services
Sumup Payments Limited IT Services
Sumup Payments Limited IT Services
Restel Fast Food Oy Travel & Hospitality
Restel Fast Food Oy Travel & Hospitality
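One idea I've been toying with, as a minimal sketch: keep a Scripting.Dictionary of company names already written and skip a post when its name has been seen before. This assumes the duplicates are exact repeats of the h1 text; the loop below would replace the `For Each post In container` loop in the sub above.
Code:
Dim seen As Object, key As String
Set seen = CreateObject("Scripting.Dictionary")  'acts as a seen-set of company names

For Each post In container
    key = ""
    With post.getElementsByTagName("h1")
        If .Length Then key = .Item(0).innerText
    End With
    'Write the lead only if the name is non-empty and not seen yet
    If Len(key) > 0 Then
        If Not seen.Exists(key) Then
            seen.Add key, True
            r = r + 1
            Cells(r, 1) = key
            With post.getElementsByClassName("ifi_industry")(0).getElementsByTagName("dd")
                If .Length Then Cells(r, 2) = .Item(0).innerText
            End With
        End If
    End If
Next post
Note that Scripting.Dictionary keys are case-sensitive by default, so "SumUp" and "Sumup" would count as different leads unless `seen.CompareMode` is set to vbTextCompare before any keys are added.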