• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

How can I parse phone numbers using string manipulation from a webpage?

Status
Not open for further replies.

shahin

Active Member
As I was told that If i manipulate string on the javascript encrypted html stuff, then i can parse certain things i would like to have. This site i'm dealing with now contains javascript wrapped html elements. I would like to parse the phone numbers from that page as it is easy to prepare the ground comparatively. I didn't use query string as I noticed that without using it the desired page can be located with the url. I was trying like this:

Btw, FYC the search city is: Saskatoon

Code:
Sub test()
    Dim http As New XMLHTTP60
    Dim str As Variant
 
With http
    .Open "GET", "https://www.saskatoonrealtors.ca/web/SRAR/Find_A_REALTOR/SRAR/Find_an_Agent/Find_A_REALTOR_2.aspx?hkey=96d9da44-1d30-4c12-8c70-b3c495fe0610", False
    .setRequestHeader "Content-Type", "text/html; charset=utf-8"
    .send
End With

   On Error Resume Next
   str = Split(http.responseText, "<td class=""officeCell"">")
   y = UBound(str)
       For i = 1 To y
           Cells(x, 1) = Split(Split(str(i), "<div class=""ng-binding"">Phone:")(1), "</")(0)
       Next i
End Sub
The visible elements within which phone numbers are stored:
Code:
<td class="officeCell">
                    <h4 ng-click="selectItem(element)" class="ng-binding">2 Percent Realty Inc.</h4>
                    <div ng-bind-html="element.FullAddress | newlines" class="ng-binding">700 2010 11th Ave Regina, SK  S4P 0J3 CANADA</div>
                    <div class="ng-binding">Phone: (639) 739-0909</div>
                    <div ng-show="element.Fax" class="ng-binding">Fax: (888) 123-4567</div>
                    <div ng-show="element.BrokerName" class="ng-binding">Broker: Morris Tkachuk</div>
                    <div ng-show="element.EmailAddress"><a ng-href="mailto:info@2percentrealty.ca" class="ng-binding" href="mailto:info@2percentrealty.ca">info@2percentrealty.ca</a></div>
                    <a ng-show="element.Website" ng-href="http://" target="_blank" href="http://" class="ng-hide">Website</a>
                </td>
 
Last edited:

Hui

Excel Ninja
Staff member
Paul Kelly from Macro Mastery is doing a live Youtube Demo of a technique to achieve this tonight.
You can watch it for free at:

I'd suggest logging on and ask him questions as you go or send him some data/link prior to the presentation so he can consider it.
Email Paul at: paul@excelmacromastery.com
 

Chihiro

Excel Ninja
@shahin

You should output responseText to text file and inspect contents. You'll notice the information you are looking for isn't contained there.

You need following:
Code:
https://oresapp.asicanada.net/ores.imis.services/api/member/?address=&callback=angular.callbacks._0&city=&companyName=&personName=
This will return response in format of JSON. You just need to parse it using string manipulation or other methods.
 

shahin

Active Member
@sir Chihiro,
Your every single bit of information on anything never fails to amaze me. I wonder, how did you get that url or what is the actual process to get the target url. Every time i work with a url it seemed to me so long that i was right but it turned out i was very wrong.
 

Chihiro

Excel Ninja
First launch Developer Tool.

Then check XHR (i.e. XMLHttpRequest) section, the site returns nothing there. So, you can surmise that info is likely from script.

Now, check "JS" section.

You can skip over anything that does not contain "?" in Name section. Since without it, query can't be made.

If you hover over each name, you can eliminate any that starts with Google API url. Since those are all used to fill the map.

When you skim through Name list, you will notice one that starts with "?" and also contains "address=...". This is likely candidate as it has parameters similar to search conditions on the site.

When you check the "Preview" of that entry. You can see there are 193 elements returned (0-192). This matches results returned by search on the site. Now just Ctrl-F in "Response" section for some name or phone# to verify.
 

shahin

Active Member
Thanks sir for the clarity. Ctrl+F is such a keyword I have been looking for since I started to learn the web element inspection stuff. Thanks a ton. One more thing sir: is it at all possible to scrape json data using string manipulation?
 

Chihiro

Excel Ninja
Yes. Json is like any other text/string. Just a bit more structured.
Ex.
":" denotes Category:Data pair. Each pair is delimited by ",".

You will need to study and test with actual response to build logic to extract information that you want.

Personally, I'd leverage VBScript.RegExp for this one. Using pattern to extract matches. You can read about using RegEx in VBA in link below.

http://analystcave.com/excel-regex-tutorial/
 

shahin

Active Member
Finally, I have been able to do it. Very close to what I wanted. Here is what I've tried with:
Code:
Sub Api_stuff()
Dim http As New XMLHTTP60, str As Variant
    With http
        .Open "GET", "https://oresapp.asicanada.net/ores.imis.services/api/member/?address=&callback=angular.callbacks._0&city=&companyName=&personName=", False
        .send
        str = Split(.responseText, "{""Id"":")
    End With
    x = UBound(str)
    On Error Resume Next
    For v = 1 To x
        Cells(v, 1) = Split(Split(str(v), "FullName"":")(1), ",""FirstName")(0)
        Cells(v, 2) = Split(Split(str(v), "Phone"":")(1), ",""Fax")(0)
    Next v
End Sub
Image of the partial result:
 

Attachments

Marc L

Excel Ninja
Hi !

As again it is nothing but text just needing an easy logic !
For example see this demonstration :​
Code:
Sub Demo1()
        Dim V, SP$(), N$, P$, R&
    With New XMLHTTP60
        .Open "GET", "https://oresapp.asicanada.net/ores.imis.services/api/member/?address=&callback=angular.callbacks._0", False
        .send
        For Each V In Split(.responseText, "{""Id"":")
                      SP = Split(V, "FullName"":""")
            If UBound(SP) Then
                   N = Split(SP(1), """")(0)
                If N > "" Then
                              SP = Split(V, """Phone"":""")
                    If UBound(SP) Then
                           P = Split(SP(1), """")(0)
                        If P > "" Then R = R + 1: Cells(R, 1).Resize(, 2).Value = Array(N, P)
                    End If
                End If
            End If
        Next
    End With
End Sub
Do you like it ? So thanks to click on bottom right Like !
 

shahin

Active Member
Thanks Marc L, for your robust and perfect solution. This is undoubtedly the best demo using string manipulation on json data I've ever come across. You are really the gem of a guy. So far, this type of stuff (I meant, json response) was a great barrier for me to overcome while scraping web data. Now, I can move one step ahead. I still find it difficult to understand thoroughly your code. I wish I could code like you. Thanks a ton. Btw, I heavily liked your answer.:)
 

Marc L

Excel Ninja
Split VBA inner function is the way to extract partial text
from a global text at "child level" …

For a JSon structure you can also use an object like in this thread
(see also post #47 using RegExp)
but whatever the way you must analyse text structure,
what is before and after text to extract, …
 

shahin

Active Member
@Marc L,
I was not talking about split function cause I can handle it now rather I was talking about your unconventional way of coding which brings a lot more prettier appearance, moreover you like to declare variable on the fly. Thanks again for the link you have provided above.
 

Marc L

Excel Ninja
My coding way just tries to respect object model and Logic
and here in post #10 variables are not declared on the fly !

The more analyzing before, the more easier is coding …
 

shahin

Active Member
That's right what you said here. My bad! But, In few instances I saw you write variables like [SP$(), N$, P$, R&] just before applying them. Merci pour tout.
 

Marc L

Excel Ninja

It's like As String or As Long as you can read in VBA inner help.
(a legacy from BASIC grandfather !)

As during execution you can check variables types in Locals window …
 
You could enter a file with the inserted form that extracts the url phone number placed on a column (as you have just done last time).
thank you
 

shahin

Active Member
And the complete parser should look like this if I ignore "On Error Resume Next":
Code:
Sub grabbing_jsondata()
Dim http As New XMLHTTP60, str As Variant
Range("A1:E1") = Array("Name", "Phone Number", "Fax", "Email Address", "Web Address")
With http
    .Open "GET", "https://oresapp.asicanada.net/ores.imis.services/api/member/?address=&callback=angular.callbacks._0&city=&companyName=&personName=", False
    .send
    str = Split(.responseText, "{""Id"":")
End With
x = UBound(str)
On Error Resume Next
For V = 1 To x
    Cells(V, 1) = Split(Split(str(V), "FullName"":""")(1), """")(0)
    Cells(V, 2) = Split(Split(str(V), "Phone"":""")(1), """")(0)
    Cells(V, 3) = Split(Split(str(V), "Fax"":""")(1), """")(0)
    Cells(V, 4) = Split(Split(str(V), "Email"":""")(1), """")(0)
    Cells(V, 5) = Split(Split(str(V), "WebSite"":""")(1), """")(0)
Next V
Columns("A:E").AutoFit
Set http = Nothing: Set str = Nothing
End Sub
 

Marc L

Excel Ninja
As Variant str variable is not an object so no needs to set it to Nothing

As the Json data has multi levels (companies, members of companies)
you could had some columns to avoid "blank" rows or phone # alone or
separating data in two sheets : one for companies and the other for members.
 

shahin

Active Member
@Marc L,
One last question on this: In several cases in your script or any script of someone else I noticed that the line always goes like the below code in case of scraping web data, I meant shouldn't it be ideal to use the line like [For v = 0 to x - 1]? I'm little weak in playing with arrays. Using the expression I mentioned here, I found that It misses the lastmost value to scrape. I knew that counting of an array always starts from 0 then what is wrong here? Thanks in advance.
Code:
x = UBound(str)
OnErrorResumeNext
For V = 1 To x
    Cells(V, 1) = Split(Split(str(V), "FullName"":""")(1), """")(0)
Next V
 
Last edited:

Chihiro

Excel Ninja
V represents row index in this case.
Minimum row index is 1. Since Cells(1,1) = [A1].

Also, since you want the info after "ID:", 1st array element from Split function isn't needed for scraping purpose (which is part that's before 1st "ID:"). Therefore, loop should start from 2nd array element (i.e. 1).

You can do something like below to observe which element contains what string.

FYI - It's bad idea to use "str" as variable. Since it's used in VBA function.

Code:
Sub Demo()
mystr$ = "Blah, Blah, Blah, ID: 1, Name:NameA, ID: 2, Name:NameB, ID: 3, Name:NameC"

V = Split(mystr, "ID:")

For i% = LBound(V) To UBound(V)
Debug.Print "Array Element " & i & "; is "; V(i)
Next
End Sub
 

shahin

Active Member
Thanks sir Chihiro, for you answer. It's an epic demo indeed. It helped me a lot. Thanks a trillion.
 

shahin

Active Member
@Marc L,
Here in post #16 I just came across the line you mentioned (a legacy from BASIC grandfather). It's true but the way you deliver I got it very humorous. How could I fail to see!!! Thanks.
 

Marc L

Excel Ninja
Add Microsoft Script Control reference to try this second demonstration :​
Code:
Sub Demo2()
    Const A = "angular.callbacks._0"
    Dim R&, S$, oC As Object, oM As Object
    ActiveSheet.UsedRange.Clear
    [A1:G1].Value = [{"Id","FullName","Phone","Fax","Email","WebSite","LocationId"}]
       [B2].Value = "  Download in progress …"
                R = 1
With New XMLHTTP60
    .Open "GET", "https://oresapp.asicanada.net/ores.imis.services/api/member/?callback=" & A, False
    .send
     S = Replace(.responseText, A, "")
End With
With New ScriptControl
                   .Language = "JScript"
    For Each oC In .Eval(S)
            R = R + 1
            Cells(R, 1).Resize(, 6).Font.Bold = True
            Cells(R, 1).Resize(, 6).Value = Array(CallByName(oC, "Id", VbGet), oC.Company, oC.WorkPhone, oC.Fax, oC.EmailAddress, oC.Website)
        For Each oM In oC.Members
            R = R + 1
            Cells(R, 1).Resize(, 7).Value = Array(CallByName(oM, "Id", VbGet), oM.FullName, oM.Phone, oM.Fax, oM.Email, _
                                                  CallByName(oM, "WebSite", VbGet), oM.LocationId)
        Next
    Next
End With
    ActiveSheet.UsedRange.Columns.AutoFit
End Sub
Companies rows are bold within worksheet …

Replace function in codeline #11 just to fit JScript Eval in codeline #15.

CallByName function as JScript fields are case sensitive :
if oC.Id is written in codeline #18 it will be changed to oC.ID
as ID already exists in VBA …

Tip to avoid this function : add Id$ in Dim codeline !
So you may use this variable (for example in place of S$)
or just remove Id$ from Dim codeline : Id stays as the new spelling !

Special case here for Company.Website and Member.WebSite :
so the last spelling written in code is the one that VBE keeps !
So CallByName function must be used for one spelling or
replace in source text all "WebSite" by "Website" …
 

shahin

Active Member
The best and fastest method by far. I've got to study on it seriously. Thanks a lot, Marc L.
 
Status
Not open for further replies.
Top