1. Welcome to Chandoo.org Forums. Short message for you

    Hi Guest,

    Thanks for joining Chandoo.org forums. We are here to make you awesome in Excel. Before you post your first question, please read this short introduction guide. When posting or responding to questions please remember our values at Chandoo.org are: Humility, Passion, Fun, Awesomeness, Simplicity, Sharing Remember that we have people here for whom English is not there first language and we need to allow for this in our dealings.

    Yours,
    Chandoo
  2. Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  3. When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

How can I parse phone numbers using string manipulation from a webpage?

Discussion in 'VBA Macros' started by shahin, Jun 6, 2017.

  1. shahin

    shahin Active Member

    Messages:
    428
    As I was told that If i manipulate string on the javascript encrypted html stuff, then i can parse certain things i would like to have. This site i'm dealing with now contains javascript wrapped html elements. I would like to parse the phone numbers from that page as it is easy to prepare the ground comparatively. I didn't use query string as I noticed that without using it the desired page can be located with the url. I was trying like this:

    Btw, FYC the search city is: Saskatoon

    Code (vb):

    Sub test()
        Dim http As New XMLHTTP60
        Dim str As Variant
     
    With http
        .Open "GET", "https://www.saskatoonrealtors.ca/web/SRAR/Find_A_REALTOR/SRAR/Find_an_Agent/Find_A_REALTOR_2.aspx?hkey=96d9da44-1d30-4c12-8c70-b3c495fe0610", False
        .setRequestHeader "Content-Type", "text/html; charset=utf-8"
        .send
    End With

       On Error Resume Next
       str = Split(http.responseText, "<td class=""officeCell"">")
       y = UBound(str)
           For i = 1 To y
               Cells(x, 1) = Split(Split(str(i), "<div class=""ng-binding"">Phone:")(1), "</")(0)
           Next i
    End Sub
     
    The visible elements within which phone numbers are stored:
    Code (vb):

    <td class="officeCell">
                        <h4 ng-click="selectItem(element)" class="ng-binding">2 Percent Realty Inc.</h4>
                        <div ng-bind-html="element.FullAddress | newlines" class="ng-binding">700 2010 11th Ave Regina, SK  S4P 0J3 CANADA</div>
                        <div class="ng-binding">Phone: (639) 739-0909</div>
                        <div ng-show="element.Fax" class="ng-binding">Fax: (888) 123-4567</div>
                        <div ng-show="element.BrokerName" class="ng-binding">Broker: Morris Tkachuk</div>
                        <div ng-show="element.EmailAddress"><a ng-href="mailto:info@2percentrealty.ca" class="ng-binding" href="mailto:info@2percentrealty.ca">info@2percentrealty.ca</a></div>
                        <a ng-show="element.Website" ng-href="http://" target="_blank" href="http://" class="ng-hide">Website</a>
                    </td>
     
    Last edited: Jun 6, 2017
  2. Hui

    Hui Excel Ninja Staff Member

    Messages:
    10,604
    Paul Kelly from Macro Mastery is doing a live Youtube Demo of a technique to achieve this tonight.
    You can watch it for free at:

    I'd suggest logging on and ask him questions as you go or send him some data/link prior to the presentation so he can consider it.
    Email Paul at: paul@excelmacromastery.com
    ThrottleWorks and shahin like this.
  3. Chihiro

    Chihiro Well-Known Member

    Messages:
    3,577
    @shahin

    You should output responseText to text file and inspect contents. You'll notice the information you are looking for isn't contained there.

    You need following:
    Code (vb):
    https://oresapp.asicanada.net/ores.imis.services/api/member/?address=&callback=angular.callbacks._0&city=&companyName=&personName=
    This will return response in format of JSON. You just need to parse it using string manipulation or other methods.
    ThrottleWorks and shahin like this.
  4. shahin

    shahin Active Member

    Messages:
    428
    @sir Chihiro,
    Your every single bit of information on anything never fails to amaze me. I wonder, how did you get that url or what is the actual process to get the target url. Every time i work with a url it seemed to me so long that i was right but it turned out i was very wrong.
    ThrottleWorks likes this.
  5. Chihiro

    Chihiro Well-Known Member

    Messages:
    3,577
    First launch Developer Tool.

    Then check XHR (i.e. XMLHttpRequest) section, the site returns nothing there. So, you can surmise that info is likely from script.

    Now, check "JS" section.

    You can skip over anything that does not contain "?" in Name section. Since without it, query can't be made.

    If you hover over each name, you can eliminate any that starts with Google API url. Since those are all used to fill the map.

    When you skim through Name list, you will notice one that starts with "?" and also contains "address=...". This is likely candidate as it has parameters similar to search conditions on the site.

    When you check the "Preview" of that entry. You can see there are 193 elements returned (0-192). This matches results returned by search on the site. Now just Ctrl-F in "Response" section for some name or phone# to verify.
    ThrottleWorks and shahin like this.
  6. shahin

    shahin Active Member

    Messages:
    428
    Thanks sir for the clarity. Ctrl+F is such a keyword I have been looking for since I started to learn the web element inspection stuff. Thanks a ton. One more thing sir: is it at all possible to scrape json data using string manipulation?
    ThrottleWorks likes this.
  7. Chihiro

    Chihiro Well-Known Member

    Messages:
    3,577
    Yes. Json is like any other text/string. Just a bit more structured.
    Ex.
    ":" denotes Category:Data pair. Each pair is delimited by ",".

    You will need to study and test with actual response to build logic to extract information that you want.

    Personally, I'd leverage VBScript.RegExp for this one. Using pattern to extract matches. You can read about using RegEx in VBA in link below.

    http://analystcave.com/excel-regex-tutorial/
    ThrottleWorks and shahin like this.
  8. shahin

    shahin Active Member

    Messages:
    428
    Thanks sir. I'll roger that. Now, I'm gonna try and see the result if I manipulate string on json response.
    ThrottleWorks likes this.
  9. shahin

    shahin Active Member

    Messages:
    428
    Finally, I have been able to do it. Very close to what I wanted. Here is what I've tried with:
    Code (vb):

    Sub Api_stuff()
    Dim http As New XMLHTTP60, str As Variant
        With http
            .Open "GET", "https://oresapp.asicanada.net/ores.imis.services/api/member/?address=&callback=angular.callbacks._0&city=&companyName=&personName=", False
            .send
            str = Split(.responseText, "{""Id"":")
        End With
        x = UBound(str)
        On Error Resume Next
        For v = 1 To x
            Cells(v, 1) = Split(Split(str(v), "FullName"":")(1), ",""FirstName")(0)
            Cells(v, 2) = Split(Split(str(v), "Phone"":")(1), ",""Fax")(0)
        Next v
    End Sub
     
    Image of the partial result:

    Attached Files:

    ThrottleWorks likes this.
  10. Marc L

    Marc L Excel Ninja

    Messages:
    3,141
    Hi !

    As again it is nothing but text just needing an easy logic !
    For example see this demonstration :​
    Code (vb):
    Sub Demo1()
            Dim V, SP$(), N$, P$, R&
        With New XMLHTTP60
            .Open "GET", "https://oresapp.asicanada.net/ores.imis.services/api/member/?address=&callback=angular.callbacks._0", False
            .send
            For Each V In Split(.responseText, "{""Id"":")
                          SP = Split(V, "FullName"":""")
                If UBound(SP) Then
                       N = Split(SP(1), """")(0)
                    If N > "" Then
                                  SP = Split(V, """Phone"":""")
                        If UBound(SP) Then
                               P = Split(SP(1), """")(0)
                            If P > "" Then R = R + 1: Cells(R, 1).Resize(, 2).Value = Array(N, P)
                        End If
                    End If
                End If
            Next
        End With
    End Sub
    Do you like it ? So thanks to click on bottom right Like !
    ThrottleWorks and shahin like this.
  11. shahin

    shahin Active Member

    Messages:
    428
    Thanks Marc L, for your robust and perfect solution. This is undoubtedly the best demo using string manipulation on json data I've ever come across. You are really the gem of a guy. So far, this type of stuff (I meant, json response) was a great barrier for me to overcome while scraping web data. Now, I can move one step ahead. I still find it difficult to understand thoroughly your code. I wish I could code like you. Thanks a ton. Btw, I heavily liked your answer.:)
    ThrottleWorks likes this.
  12. Marc L

    Marc L Excel Ninja

    Messages:
    3,141
    Split VBA inner function is the way to extract partial text
    from a global text at "child level" …

    For a JSon structure you can also use an object like in this thread
    (see also post #47 using RegExp)
    but whatever the way you must analyse text structure,
    what is before and after text to extract, …
    shahin and ThrottleWorks like this.
  13. shahin

    shahin Active Member

    Messages:
    428
    @Marc L,
    I was not talking about split function cause I can handle it now rather I was talking about your unconventional way of coding which brings a lot more prettier appearance, moreover you like to declare variable on the fly. Thanks again for the link you have provided above.
  14. Marc L

    Marc L Excel Ninja

    Messages:
    3,141
    My coding way just tries to respect object model and Logic
    and here in post #10 variables are not declared on the fly !

    The more analyzing before, the more easier is coding …
    ThrottleWorks and shahin like this.
  15. shahin

    shahin Active Member

    Messages:
    428
    That's right what you said here. My bad! But, In few instances I saw you write variables like [SP$(), N$, P$, R&] just before applying them. Merci pour tout.
  16. Marc L

    Marc L Excel Ninja

    Messages:
    3,141

    It's like As String or As Long as you can read in VBA inner help.
    (a legacy from BASIC grandfather !)

    As during execution you can check variables types in Locals window …
    shahin likes this.
  17. stefanoste78

    stefanoste78 Member

    Messages:
    69
    You could enter a file with the inserted form that extracts the url phone number placed on a column (as you have just done last time).
    thank you
  18. shahin

    shahin Active Member

    Messages:
    428
    And the complete parser should look like this if I ignore "On Error Resume Next":
    Code (vb):

    Sub grabbing_jsondata()
    Dim http As New XMLHTTP60, str As Variant
    Range("A1:E1") = Array("Name", "Phone Number", "Fax", "Email Address", "Web Address")
    With http
        .Open "GET", "https://oresapp.asicanada.net/ores.imis.services/api/member/?address=&callback=angular.callbacks._0&city=&companyName=&personName=", False
        .send
        str = Split(.responseText, "{""Id"":")
    End With
    x = UBound(str)
    On Error Resume Next
    For V = 1 To x
        Cells(V, 1) = Split(Split(str(V), "FullName"":""")(1), """")(0)
        Cells(V, 2) = Split(Split(str(V), "Phone"":""")(1), """")(0)
        Cells(V, 3) = Split(Split(str(V), "Fax"":""")(1), """")(0)
        Cells(V, 4) = Split(Split(str(V), "Email"":""")(1), """")(0)
        Cells(V, 5) = Split(Split(str(V), "WebSite"":""")(1), """")(0)
    Next V
    Columns("A:E").AutoFit
    Set http = Nothing: Set str = Nothing
    End Sub
     
  19. Marc L

    Marc L Excel Ninja

    Messages:
    3,141
    As Variant str variable is not an object so no needs to set it to Nothing

    As the Json data has multi levels (companies, members of companies)
    you could had some columns to avoid "blank" rows or phone # alone or
    separating data in two sheets : one for companies and the other for members.
    shahin likes this.
  20. shahin

    shahin Active Member

    Messages:
    428
    @Marc L,
    One last question on this: In several cases in your script or any script of someone else I noticed that the line always goes like the below code in case of scraping web data, I meant shouldn't it be ideal to use the line like [For v = 0 to x - 1]? I'm little weak in playing with arrays. Using the expression I mentioned here, I found that It misses the lastmost value to scrape. I knew that counting of an array always starts from 0 then what is wrong here? Thanks in advance.
    Code (vb):

    x = UBound(str)
    OnErrorResumeNext
    For V = 1 To x
        Cells(V, 1) = Split(Split(str(V), "FullName"":""")(1), """")(0)
    Next V
     
    Last edited: Jun 9, 2017
  21. Chihiro

    Chihiro Well-Known Member

    Messages:
    3,577
    V represents row index in this case.
    Minimum row index is 1. Since Cells(1,1) = [A1].

    Also, since you want the info after "ID:", 1st array element from Split function isn't needed for scraping purpose (which is part that's before 1st "ID:"). Therefore, loop should start from 2nd array element (i.e. 1).

    You can do something like below to observe which element contains what string.

    FYI - It's bad idea to use "str" as variable. Since it's used in VBA function.

    Code (vb):
    Sub Demo()
    mystr$ = "Blah, Blah, Blah, ID: 1, Name:NameA, ID: 2, Name:NameB, ID: 3, Name:NameC"

    V = Split(mystr, "ID:")

    For i% = LBound(V) To UBound(V)
    Debug.Print "Array Element " & i & "; is "; V(i)
    Next
    End Sub
    Marc L and shahin like this.
  22. shahin

    shahin Active Member

    Messages:
    428
    Thanks sir Chihiro, for you answer. It's an epic demo indeed. It helped me a lot. Thanks a trillion.
  23. shahin

    shahin Active Member

    Messages:
    428
    @Marc L,
    Here in post #16 I just came across the line you mentioned (a legacy from BASIC grandfather). It's true but the way you deliver I got it very humorous. How could I fail to see!!! Thanks.
  24. Marc L

    Marc L Excel Ninja

    Messages:
    3,141
    Add Microsoft Script Control reference to try this second demonstration :​
    Code (vb):
    Sub Demo2()
        Const A = "angular.callbacks._0"
        Dim R&, S$, oC As Object, oM As Object
        ActiveSheet.UsedRange.Clear
        [A1:G1].Value = [{"Id","FullName","Phone","Fax","Email","WebSite","LocationId"}]
           [B2].Value = "  Download in progress …"
                    R = 1
    With New XMLHTTP60
        .Open "GET", "https://oresapp.asicanada.net/ores.imis.services/api/member/?callback=" & A, False
        .send
         S = Replace(.responseText, A, "")
    End With
    With New ScriptControl
                       .Language = "JScript"
        For Each oC In .Eval(S)
                R = R + 1
                Cells(R, 1).Resize(, 6).Font.Bold = True
                Cells(R, 1).Resize(, 6).Value = Array(CallByName(oC, "Id", VbGet), oC.Company, oC.WorkPhone, oC.Fax, oC.EmailAddress, oC.Website)
            For Each oM In oC.Members
                R = R + 1
                Cells(R, 1).Resize(, 7).Value = Array(CallByName(oM, "Id", VbGet), oM.FullName, oM.Phone, oM.Fax, oM.Email, _
                                                      CallByName(oM, "WebSite", VbGet), oM.LocationId)
            Next
        Next
    End With
        ActiveSheet.UsedRange.Columns.AutoFit
    End Sub
    Companies rows are bold within worksheet …

    Replace function in codeline #11 just to fit JScript Eval in codeline #15.

    CallByName function as JScript fields are case sensitive :
    if oC.Id is written in codeline #18 it will be changed to oC.ID
    as ID already exists in VBA …

    Tip to avoid this function : add Id$ in Dim codeline !
    So you may use this variable (for example in place of S$)
    or just remove Id$ from Dim codeline : Id stays as the new spelling !

    Special case here for Company.Website and Member.WebSite :
    so the last spelling written in code is the one that VBE keeps !
    So CallByName function must be used for one spelling or
    replace in source text all "WebSite" by "Website" …
    shahin likes this.
  25. shahin

    shahin Active Member

    Messages:
    428
    The best and fastest method by far. I've got to study on it seriously. Thanks a lot, Marc L.

Share This Page