• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Alternative way instead of piloting IE

YasserKhalil

Well-Known Member
Hello everyone
I am still trying to learn about web scraping .. I have written the following code
Code:
Sub SearchBot()
    Dim ie          As InternetExplorer
    Dim ele        As HTMLLinkElement
    Dim ws          As Worksheet
    Dim str        As String
    Dim c          As Long
    Dim r          As Long

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set ie = New InternetExplorer
    ie.Visible = False

    For r = 2 To ws.Cells(Rows.Count, 1).End(xlUp).Row
        ie.navigate "http://www.tsbpa.state.tx.us/php/fpl/indlookup.php"
        Do While ie.Busy = True Or ie.readyState <> 4: DoEvents: Loop

        ie.document.getElementById("idlname").Value = ws.Range("A" & r).Value
        ie.document.getElementById("idfname").Value = ws.Range("B" & r).Value

        ie.document.form1.submit.Click
        c = 3
        Do While ie.Busy = True Or ie.readyState <> 4: DoEvents: Loop

        On Error Resume Next
            For Each ele In ie.document.getElementById("results").getElementsByTagName("tr")
                ws.Cells(r, c).Value = ele.Children(0).textContent
                ws.Cells(r, c + 1).Value = ele.Children(1).textContent
                ws.Cells(r, c + 2).Value = ele.Children(2).textContent
                ws.Cells(r, c + 3).Value = ele.Children(3).textContent
                ws.Cells(r, c + 4).Value = ele.Children(4).textContent
   
                c = c + 5
            Next ele
        On Error GoTo 0
    Next r
   
    ws.Columns("C:AZ").AutoFit
    MsgBox "Done...", 64
End Sub

But of course took too much time. Is there alternative way to do that faster using MSXML method for example?
 
Use developer tool in Chrome or other browser.

Trace what request is sent to get result.

upload_2018-1-11_10-40-5.png

Use that string to construct Post request.

You'll likely need some of the headers to be set for request as well.

There's plenty of examples found in the forum.
 
Since I was bored... here's sample.
Note postdata string, and as well, headers set in request.

Code:
Sub Demo()
Dim xml As New XMLHTTP60, html As New HTMLDocument
Dim cel As Range
Dim lname As String, fname As String

For Each cel In Range("A2:A" & Cells(Rows.Count, 1).End(xlUp).Row)
    lname = UCase(cel.Value)
    fname = UCase(cel.Offset(, 1).Value)
    postdata = "LICID=&LNAME=" & lname & "&FNAME=" & fname & "&CLNAME=&CITY=&CNTY=&STATE=&ZIP=&submit=Submit+Search&tsbpa5a57b1ff7cf4c=tsbpa5a57b1ff7cf4c&list=fromsel"
    With xml
        .Open "POST", "http://www.tsbpa.state.tx.us/php/fpl/indlookup.php", False
        .setRequestHeader "Content-type", "application/x-www-form-urlencoded"
        .setRequestHeader "Cookie", "ZDEDebuggerPresent=php,phtml,php3; PHPSESSID=n13humaldghjea226msdmnuclt1aavfl"
        .send postdata
        html.body.innerHTML = .responseText
        Call ExportText("C:\Test\Texas" & lname & ".txt", .responseText)
    End With
Next
End Sub

Sub ExportText(fPath As String, xStr As String)
Dim intFF As Integer: intFF = FreeFile()
Open fPath For Output As #intFF
Print #intFF, xStr
Close #intFF
End Sub
This will export response to text file (for studying structure).

Other than that, you don't need to call ExportText sub.
You can just traverse html as you did ie.document in your code.
 
Last edited:
@sir Chihiro, best demo by far. I wanted to make any script in vba using "cookie". You have already taught me. However, ain't it a better idea to add "User-Agent" as well in the header parameter? I may ask you a question concerning cookies If i don't find it myself. Thanks a lot sir, for this superb script.
 
You can and probably should use user agent. Though it really depends on site. Some sites don't care, but many do. It's just playing nice if you do send it.

Call ExportText is sub for exporting result to text file. Just the standard method using freefile().
 
Thanks a lot for all great replies and forgive me for being late in reply
I tried to make use of the code in that way (but it seems that it is no use)
Code:
Sub Demo()
Dim xml As New XMLHTTP60, html As New HTMLDocument
Dim cel As Range
Dim lname As String, fname As String
Dim ws As Worksheet

Set ws = ThisWorkbook.Worksheets("Sheet1")

'For Each cel In Range("A2:A" & Cells(Rows.Count, 1).End(xlUp).Row)
For r = 2 To ws.Cells(Rows.Count, 1).End(xlUp).Row

    lname = UCase(ws.Cells(r, 1).Value)
    fname = UCase(ws.Cells(r, 2).Value)
    'LICID=&LNAME=NEWBERRY&FNAME=KAYE&CLNAME=&CITY=&CNTY=&STATE=&ZIP=&submit=Submit+Search&tsbpa5a5a6b3f7ec0d=tsbpa5a5a6b3f7ec0d&list=fromsel
    postdata = "LICID=&LNAME=" & lname & "&FNAME=" & fname & "&CLNAME=&CITY=&CNTY=&STATE=&ZIP=&submit=Submit+Search&tsbpa5a57b1ff7cf4c=tsbpa5a57b1ff7cf4c&list=fromsel"
    With xml
        .Open "POST", "http://www.tsbpa.state.tx.us/php/fpl/indlookup.php", False
        .setRequestHeader "Content-type", "application/x-www-form-urlencoded"
                                    'ZDEDebuggerPresent=php,phtml,php3; PHPSESSID=mtjbn13akcrsr8it62q7hjem1s1tm44t
        .setRequestHeader "Cookie", "ZDEDebuggerPresent=php,phtml,php3; PHPSESSID=n13humaldghjea226msdmnuclt1aavfl"
        .send postdata
        html.body.innerHTML = .responseText
        c = 3
       
        'On Error Resume Next
            For Each ele In .responseText.getElementById("results").getElementsByTagName("tr")
                ws.Cells(r, c).Value = ele.Children(0).textContent
                ws.Cells(r, c + 1).Value = ele.Children(1).textContent
                ws.Cells(r, c + 2).Value = ele.Children(2).textContent
                ws.Cells(r, c + 3).Value = ele.Children(3).textContent
                ws.Cells(r, c + 4).Value = ele.Children(4).textContent
   
                c = c + 5
            Next ele
        'On Error GoTo 0
        'Call ExportText(ThisWorkbook.Path & "\" & lname & ".txt", .responseText)
    End With
Next
End Sub

As an example Shahin : first name "Newman" and last name "Michael"
 
Yes, right your are. This time I found the valid response. Let's see how we can deal with this.
 
Here is your working code. "User-Agent" is playing a very important role to fetch the result. If you pass it in your request header then it looks more like a human being doing the search.

Code:
Sub Web_Data()
    Dim http As New XMLHTTP60, html As New HTMLDocument
    Dim posts As Object, elem As Object, trow As Object
    Dim poststr As String
  
    poststr = "LICID=&LNAME=NEWMAN&FNAME=MICHAEL&CLNAME=&CITY=&CNTY=&STATE=&ZIP=&submit=Submit+Search&tsbpa5a5a713dd8e59=tsbpa5a5a713dd8e59&list=fromsel"
  
    With http
        .Open "POST", "http://www.tsbpa.state.tx.us/php/fpl/indlookup.php", False
        .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send poststr
        html.body.innerHTML = .responseText
    End With
  
    Set posts = html.getElementById("results")
  
    For Each elem In posts.Rows
        For Each trow In elem.Cells
            y = y + 1: Cells(x + 1, y) = trow.innerText
        Next trow
        y = 0
        x = x + 1
    Next elem
End Sub

I think the rest of the housekeeping stuff you can manage. However, if you get stuck, let me know.
 
Thank you very very much Shahin for this wonderful and great code
I tried to adapt it like that but I got nothing but error and can't spot why
Code:
Sub TSBPA_Search_Bot_Using_Request_Header()
    Dim http As New XMLHTTP60
    Dim html As New HTMLDocument
    Dim ws As Worksheet
    Dim posts As Object
    Dim elem As Object
    Dim trow As Object
    Dim poststr As String
    Dim r As Long
    Dim c As Long

    Set ws = ThisWorkbook.Worksheets("Sheet1")

    For r = 2 To ws.Cells(Rows.Count, 1).End(xlUp).Row
        poststr = "LICID=&LNAME=" & UCase(ws.Cells(r, 1).Value) & "&FNAME=" & UCase(ws.Cells(r, 2).Value) & "&CLNAME=&CITY=&CNTY=&STATE=&ZIP=&submit=Submit+Search&tsbpa5a5a713dd8e59=tsbpa5a5a713dd8e59&list=fromsel"

        With http
            .Open "POST", "http://www.tsbpa.state.tx.us/php/fpl/indlookup.php", False
            .setRequestHeader "Content-Type", "application/r-www-form-urlencoded"
            .setRequestHeader "User-Agent", "Mozilla/5.0"
            .send poststr
            html.body.innerHTML = .responseText
        End With

        Set posts = html.getElementById("results")
        c = 2

        'If Not elem Is Nothing Then
        For Each elem In posts.Rows
            For Each trow In elem.Cells
                c = c + 1
                ws.Cells(r, c) = trow.innerText
            Next trow
        Next elem
        'End If
    Next r
End Sub
 
Hi there!! Why do you think it's not working? I have tried to create a loop using the same documents several times to give a demo. Find the attachment. If it misses to fetch the results for the first time don't get upset, try few more times. Perhaps you remember that I was not having the result for the first time when I manually searched for it yesterday. I had to try three or more times to get the result.
 

Attachments

  • Demo.xlsm
    27.6 KB · Views: 5
I've tried several times and found no issues. After opening the workbook when you execute the macro, It breaks for the first time or at best for the second time but then it runs smoothly.
 
Thank you very much for great efforts Shahin
I like your demo a lot .. and it is working as you said after several tries
But after some modifications I didn't get any results .. Please can you have a look at this attachment and spot the cause of not working for me .. I have tried a lot and didn't bring any results at all
 

Attachments

  • TSBPA Search Bot.xlsm
    26.1 KB · Views: 1
I checked your macro. It's happening because of the way you defined the loop. I always get frightened to work with loops. When it comes to work with core vba functionality, you never see me there. However, I'm attaching the demo with a slight modification I have brought about. You only need to fix the loop for the commented out portion. It's fetching the results, though. Once again, to get the most out of it, you need to run it few times. Hope it helps.
 

Attachments

  • Demo.xlsm
    28.2 KB · Views: 4
If you press the "HIT" button for the first time, it brings the result for a single search. However, if you press it second or third time you will get all the results in the sheet.
 
To see the results you can try using the names replacing the existing names in the macro I have uploaded.
Code:
Newman Michael
Newberry Kaye
Newberry Kaye
Aijaz Mohammad

Btw, I could not find any right match "Aijaz Mohammad" for this result even when I tried manually.
 
As for the name "Aijaz" doesn't exist but put to test the code ... I can't get results .. The first demo is working to some extent but when trying to actual data it fails
 
This is the result I'm getting against those search.
 

Attachments

  • Untitled.jpg
    Untitled.jpg
    60.6 KB · Views: 8
Thanks a lot for your great efforts. May be the problem of the slow internet connection
I will try later .. may be will be solved if the connection is well
 
Back
Top