Problem grabbing address from a webpage stored within an envelope

shahin

Active Member
Hi there! Today I ran into a problem that I can't resolve myself. At this link, "https://ucf.uscourts.gov/search?Cou...tName=&DebtorFirstName=&Amount=500&EnteredOn=", you will find a page where the address for each name is stored behind the envelope icon next to it. The address only appears when the envelope is clicked. I would like to parse it but can't. I hope somebody out there can help me grab it. Thanks. Here is what I tried:
Code:
Sub BankruptcyAddress()
' Requires references: Microsoft XML, v6.0 and Microsoft HTML Object Library
Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
Dim posts As Object, link As Object
Dim x As Long

' Fetch the search page synchronously and load it into an HTML document
With http
    .Open "GET", "https://ucf.uscourts.gov/search?Court=insb&CreditorLastName=&CreditorFirstName=&CaseNumber=&DebtorLastName=&DebtorFirstName=&Amount=500&EnteredOn=", False
    .send
    html.body.innerHTML = .responseText
End With
' Look for the address inside the modal markup of this initial response
Set posts = html.getElementsByClassName("modal-body")
    For Each link In posts
        x = x + 1
        Cells(x, 1) = link.getElementsByTagName("span")(1).innerText
    Next link
    Set http = Nothing: Set html = Nothing: Set posts = Nothing
End Sub
 
You need to read the id below from the initial response.
Code:
 <td><button type='button' id='4e76df02-e140-11e5-b893-8300785958ac' class='btn btn-default btn-sm address' data-toggle='modal'  data-target='#modalAddress'><span class='glyphicon glyphicon-envelope'></span></button></td>

Then send a GET request with the details below.
[Screenshot: upload_2017-3-31_12-9-19.png]

Then parse the JSON response, or do string manipulation on it.
Code:
{"odata.metadata":"https://ucf.uscourts.gov/OData.svc/$metadata#Creditors/@Element","Key":"4e76df02-e140-11e5-b893-8300785958ac","LastName":"1ST NATIONAL BANK OF CHICAGO","FirstName":"","MiddleName":"","Street1":"18 E WILSON","Street2":"","Street3":"","City":"BATAVIA","State":"IL","ZipCode":"60510","Country":null,"CmecfId":36689,"CmecfStartDate":null,"CmecfEndDate":null,"AutoUpdate":false,"UniqueId":null}
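The two parsing steps above (reading the button id, then pulling fields out of the JSON) could be sketched with plain VBA text functions as below. This is only a sketch: EnvelopeIds and JsonField are hypothetical helper names, the marker string assumes the button markup looks exactly like the snippet above, and the GET step is left out because the request URL is only shown in the screenshot.

```vba
Option Explicit

' Collect the id of every envelope button found in the search-page HTML.
' Assumption: each button starts with <button type='button' id='...'
Function EnvelopeIds(ByVal pageHtml As String) As Collection
    Const MARKER As String = "<button type='button' id='"
    Dim ids As New Collection
    Dim startPos As Long, endPos As Long

    startPos = InStr(1, pageHtml, MARKER, vbTextCompare)
    Do While startPos > 0
        startPos = startPos + Len(MARKER)           ' first character of the id
        endPos = InStr(startPos, pageHtml, "'")     ' closing quote of the id
        If endPos = 0 Then Exit Do
        ids.Add Mid$(pageHtml, startPos, endPos - startPos)
        startPos = InStr(endPos, pageHtml, MARKER, vbTextCompare)
    Loop
    Set EnvelopeIds = ids
End Function

' Return a quoted string field from a flat JSON object, or "" when the
' field is missing or not a quoted string (null, number, true/false).
' Naive on purpose: escaped quotes inside a value are not handled.
Function JsonField(ByVal json As String, ByVal fieldName As String) As String
    Dim key As String, startPos As Long, endPos As Long

    key = """" & fieldName & """:"""                ' e.g. "City":"
    startPos = InStr(1, json, key, vbBinaryCompare)
    If startPos = 0 Then Exit Function
    startPos = startPos + Len(key)
    endPos = InStr(startPos, json, """")
    If endPos > 0 Then JsonField = Mid$(json, startPos, endPos - startPos)
End Function

' Offline demo using the sample data posted in this thread
Sub DemoParse()
    Dim sampleHtml As String, sampleJson As String

    sampleHtml = "<td><button type='button' id='4e76df02-e140-11e5-b893-8300785958ac' " & _
                 "class='btn btn-default btn-sm address'></button></td>"
    sampleJson = "{""Street1"":""18 E WILSON"",""City"":""BATAVIA""," & _
                 """State"":""IL"",""ZipCode"":""60510"",""Country"":null}"

    Debug.Print EnvelopeIds(sampleHtml).Item(1)     ' 4e76df02-e140-11e5-b893-8300785958ac
    Debug.Print JsonField(sampleJson, "Street1") & ", " & JsonField(sampleJson, "City")   ' 18 E WILSON, BATAVIA
End Sub
```

In a real run, pageHtml would be the .responseText of the search request, and each id would go into the request URL seen in the browser's network tab.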
 
Thanks, sir Chihiro, for your response. Perhaps I should pay more attention to how the initial response behaves. However, it seems tedious and slightly complicated to understand what happens when the response comes in and how the developer tools capture and display it. By the way, is it that difficult to parse JSON data? Thanks again.
 
Hi!

A tip: if you send this envelope request without a JSON-specific request header,
you can get the response as XML, if you know how to handle XML data within Excel …

As Chihiro wrote, whatever the format, JSON or XML,
it is just a matter of parsing text data!

So the same classic ways apply:
• via VBA text functions
• via a JScript object (or XML)
Using the "ScriptControl" ActiveX directly is lighter than the VBA-JSON tool …
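For the JScript route, a minimal sketch could look like the one below. JsonValueViaJScript and the small getProp script function are hypothetical names. It assumes 32-bit Office, since the ScriptControl is not available in 64-bit Office. Note also that Eval executes whatever text it is given, so it should only be fed responses from a source you trust.

```vba
Option Explicit

' Read one top-level field of a JSON object through the JScript engine.
' Returns a Variant: JSON null is expected to arrive as Null, and a
' missing field as Empty. Intended for scalar fields only.
Function JsonValueViaJScript(ByVal json As String, ByVal fieldName As String) As Variant
    Dim sc As Object, parsed As Object

    Set sc = CreateObject("MSScriptControl.ScriptControl")
    sc.Language = "JScript"

    ' Helper defined on the script side so that keys containing a dot,
    ' such as "odata.metadata", can be read with bracket access
    sc.AddCode "function getProp(obj, key) { return obj[key]; }"

    ' The parentheses make JScript evaluate the text as an object literal
    Set parsed = sc.Eval("(" & json & ")")
    JsonValueViaJScript = sc.Run("getProp", parsed, fieldName)
End Function

Sub DemoJScript()
    Dim sampleJson As String
    sampleJson = "{""City"":""BATAVIA"",""State"":""IL"",""CmecfId"":36689}"
    Debug.Print JsonValueViaJScript(sampleJson, "City")
    Debug.Print JsonValueViaJScript(sampleJson, "CmecfId")
End Sub
```

Compared with text functions, this copes with any field order and with numbers or booleans, at the cost of depending on an ActiveX control.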
 
One more question: is it possible to scrape e-mail addresses from a webpage using VBA, especially when they are hidden?
 
It is worth mentioning that although the page has 104 pages in its pagination, parsing it always returns only 6 hrefs.
Code:
Sub Bankruptcy()
' Requires references: Microsoft XML, v6.0 and Microsoft HTML Object Library
Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
Dim links As Object, link As Object, post As Object
Dim x As Long

With http
    .Open "GET", "https://ucf.uscourts.gov/search?Court=insb&CreditorLastName=&CreditorFirstName=&CaseNumber=&DebtorLastName=&DebtorFirstName=&Amount=500&EnteredOn=", False
    .send
    html.body.innerHTML = .responseText
End With
' Take the first pagination list, then write the href of each item's link
Set links = html.getElementsByClassName("pagination pagination-md pull-right")(0)
Set link = links.getElementsByTagName("li")
    For Each post In link
    x = x + 1
    Cells(x, 1) = post.getElementsByTagName("a")(0).href
    Next post
Set http = Nothing: Set html = Nothing: Set links = Nothing: Set link = Nothing
End Sub
 

Attachment: Untitled.jpg (141.2 KB)
"One more question: is it possible to scrape e-mail addresses from a webpage using VBA, especially when they are hidden?"
Unclear, but anything you can reach manually can be scraped …
"It is worth mentioning that although the page has 104 pages in its pagination, parsing it always returns only 6 hrefs."
What matters is how to scrape one page and
how to detect how many pages there are, or at least the last page …
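One possible way to detect the last page is to scan the page source for every "page=" value and keep the largest. This is a sketch under assumptions: HighestPageNumber is a hypothetical helper, it assumes the pagination hrefs carry a page= parameter (as the page-by-page URL later in this thread does), and it only helps if the pagination actually includes a link to the last page.

```vba
Option Explicit

' Return the largest number that follows "page=" anywhere in the HTML,
' or 0 when no such parameter is found.
' Naive: it would also match names ending in "page=", e.g. "perpage=".
Function HighestPageNumber(ByVal pageHtml As String) As Long
    Const KEY As String = "page="
    Dim pos As Long, digits As String, ch As String

    pos = InStr(1, pageHtml, KEY, vbTextCompare)
    Do While pos > 0
        pos = pos + Len(KEY)
        digits = ""
        ' Collect the run of digits right after "page="
        Do While pos <= Len(pageHtml)
            ch = Mid$(pageHtml, pos, 1)
            If Not (ch Like "[0-9]") Then Exit Do
            digits = digits & ch
            pos = pos + 1
        Loop
        ' Ignore empty runs and runs too long to fit in a Long
        If Len(digits) > 0 And Len(digits) < 10 Then
            If CLng(digits) > HighestPageNumber Then HighestPageNumber = CLng(digits)
        End If
        pos = InStr(pos, pageHtml, KEY, vbTextCompare)
    Loop
End Function

Sub DemoLastPage()
    Dim sampleHtml As String
    sampleHtml = "<li><a href='/search?Court=insb&amp;page=2'>2</a></li>" & _
                 "<li><a href='/search?Court=insb&amp;page=3'>3</a></li>" & _
                 "<li><a href='/search?Court=insb&amp;page=104'>Last</a></li>"
    Debug.Print HighestPageNumber(sampleHtml)   ' 104
End Sub
```

If the pagination has no link to the last page, another option is to request pages one by one and stop when the results table comes back empty.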
 
Hi Marc L, thanks for your response. Perhaps you meant what I've already written. But I want to start from the first page and then, by selecting any item from the dropdown menu, go to the page where the pagination options are. That is why I tried to collect all the hrefs of that pagination, so that it can be automated; otherwise, what I wrote below serves the purpose.
Code:
Sub USBankruptcy()
Dim html As Object, topics As Object, topic As Object, post As Object
Dim x As Long, y As Long, z As Long
x = 3   ' first output row
For z = 1 To 266   ' the page count is hard-coded
With CreateObject("MSXML2.ServerXMLHTTP")
    .Open "GET", "https://ucf.uscourts.gov/search?Court=alnb&CreditorLastName=&CreditorFirstName=&CaseNumber=&DebtorLastName=&DebtorFirstName=&Amount=0&EnteredOn=&page=" & z, False
    .send
    Set html = CreateObject("htmlfile")
    html.body.innerHTML = .responseText
End With
' Copy every cell of the first table body into the sheet, one row per table row
Set topics = html.getElementsByTagName("tbody")(0)
    For Each topic In topics.Rows
        For Each post In topic.Cells
        y = y + 1
        Cells(x, y) = post.innerText
        Next post
        y = 0
        x = x + 1
    Next topic
Next z
Set topics = Nothing
Range("B3").CurrentRegion.Borders.LineStyle = xlContinuous
End Sub

By the way, I get frightened pasting any code in front of you, because I use meaningless names like topics, topic, etc. as variables to save time. Don't get me wrong.
 
To make things clear about scraping e-mail addresses: all I want to say is that when I get to that specific website, I can only see the Send E-mail button, but when inspecting the element I see a whole lot of junk and no e-mail address. However, the way I searched was perhaps not the most efficient one. It might be stored somewhere else that I failed to notice.
 
If the purpose is still to grab the address, the answer is already in Chihiro's post #2!
You must send a specific request for each envelope button ID,
as you can see in your web browser's inspector tool,
in the network area, when you click on an envelope:
just compare the request URL and the envelope button ID …
 
"I use topics, topic etc meaningless stuffs as variable to save time."
On a local forum, I saw the same code with topics, posts, topic, …
with exactly the same issues as yours in all your threads in this forum.
I don't know where you got it from, but I wonder whether its purpose
was well understood or whether it is just a bad copy / paste, as it is not really coding …
 