• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Can't parse two fields from all the containers out of some json response in the right way

shahin

Active Member
I'm trying to fetch two fields from each container from some json response using regex. When I execute the script that I've written so far can produce the two fields from all the containers. However, the way I've defined the last loop doesn't seem to be an ideal one. To be clearer, I used the count of name and created a loop to parse the required fields. If the count of `names` and `changeAmount` are different the results will be real messy. How can I rectify the loop to scrape the two fields in the right way?

I've tried with (working one):

Code:
Sub FetchContent()
    Const Url$ = "https://api-global.morningstar.com/sal-service/v1/stock/ownership/v1/0P000000GY/OwnershipData/mutualfund/20/data?locale=en&clientId=MDC&benchmarkId=category&version=3.21.1"
    Dim elem As Object, oelem As Object, I&, R&, S$
    Dim Http As Object, Rgxp As Object, wb As Workbook, ws As Worksheet
 
    Set wb = ThisWorkbook
    Set ws = wb.Worksheets("Sheet1")
    Set Http = CreateObject("MSXML2.XMLHTTP")
    Set Rgxp = CreateObject("VBScript.RegExp")

    With Http
        .Open "GET", Url, False
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36"
        .setRequestHeader "ApiKey", "lstzFDEOhfFNMLikKa0am9mgEKLBl49T"
        .send
        S = .responseText
    End With
 
    With Rgxp
        .Global = True
        .MultiLine = True
     
        .Pattern = "name"":""(.*?)"""
        Set elem = .Execute(S)
     
        .Pattern = "changeAmount"":(.*?),"
        Set oelem = .Execute(S)
    End With
 
    For I = 0 To elem.Count - 1
        R = R + 1: ws.Cells(R, 1) = elem(I).SubMatches(0)
        ws.Cells(R, 2) = oelem(I).SubMatches(0)
    Next I
End Sub
The following are the type of containers I'm scraping those two fields from:
Code:
0: {secId: "FOUSA00FQU", name: "Vanguard Total Stock Mkt Idx Inv", totalSharesHeld: 2.564925871507663,…}
changeAmount: -1331374
changePercentage: -1.1355487246359206
currentShares: 115913617
date: "2020-04-30T00:00:00.000"
name: "Vanguard Total Stock Mkt Idx Inv"
secId: "FOUSA00FQU"
starRating: "4"
totalAssets: 4.16033
totalSharesHeld: 2.564925871507663
trend: "_PO_"

1: {secId: "FOUSA00FS1", name: "Vanguard 500 Index Investor", totalSharesHeld: 1.8912105957275436,…}
changeAmount: -487891
changePercentage: -0.5676114490562759
currentShares: 85467211
date: "2020-04-30T00:00:00.000"
name: "Vanguard 500 Index Investor"
secId: "FOUSA00FS1"
starRating: "4"
totalAssets: 5.08629
totalSharesHeld: 1.8912105957275436
trend: "_PO_"
PS I would like to stick to the way I've already tried and I'm not after any solution related to any json converter.

I've posted the same problem in here https://stackoverflow.com/questions/61907873/unable-to-parse-two-fields-from-all-the-containers-out-of-some-json-response-in as well.
 
Last edited:

shrivallabha

Excel Ninja
One way might be to capture the blocks first using a pattern like
Code:
^\d+:[\s\S]+?trend:
and then extracting the relevant values if they exist. Not elegant (just like my RegExp knowledge ;) )
 

shahin

Active Member
This is the very way I'm trying to achieve the results but I'm the last person who can claim to have any good knowledge on regex. However, I would mange if I could know how to do the iteration in such cases. Thanks @shrivallabha, it's always a pleasure.
 

shrivallabha

Excel Ninja
If you mean the internal loop then one way could be (with posted sample at least):
Code:
    With Rgxp
        .Global = True
        .MultiLine = True
        
        '\\ Load Blocks Here
        .Pattern = "^\d+:[\s\S]+?trend:"
        Set elem = .Execute(S)
        '\\ Loop through them
        r = 2
        For i = 0 To elem.Count - 1
            .Pattern = "^name:(|\s+)""(.+?)""" '\\ Pick items from the block
            Set subelem = .Execute(elem(i).Value): If subelem.Count > 0 Then Cells(r, 1).Value = subelem(0).SubMatches(1)
            .Pattern = "^changeAmount:(|\s+)(.+)" 'Pick second item from the block
            Set subelem = .Execute(elem(i).Value): If subelem.Count > 0 Then Cells(r, 2).Value = subelem(0).SubMatches(1)
            r = r + 1
        Next i
    End With
 

shahin

Active Member
It looks ideal. This is the logic I probably should stick with. However, the only problem is that `elem.count` is `0` even when I get the data ridden response when I print `S`.
 

shrivallabha

Excel Ninja
It looks ideal. This is the logic I probably should stick with. However, the only problem is that `elem.count` is `0` even when I get the data ridden response when I print `S`.
I tested with the sample of data you posted. If it is not picking up then you should check the text that is getting processed.
 

shahin

Active Member
Okay, I'm pasting below some valid json for your consideration. As I copied the earlier sample directly from dev tools, they are botched up in syntax @shrivallabha. Thanks.

Rectified sample:

Code:
{
    "secId": "FOUSA00FQU",
    "name": "Vanguard Total Stock Mkt Idx Inv",
    "totalSharesHeld": 2.564925871507663,
    "totalAssets": 4.16033,
    "currentShares": 115913617,
    "changeAmount": -1331374,
    "changePercentage": -1.1355487246359206,
    "date": "2020-04-30T00:00:00.000",
    "trend": "_PO_",
    "starRating": "4"
},
{
    "secId": "FOUSA00FS1",
    "name": "Vanguard 500 Index Investor",
    "totalSharesHeld": 1.8912105957275436,
    "totalAssets": 5.08629,
    "currentShares": 85467211,
    "changeAmount": -487891,
    "changePercentage": -0.5676114490562759,
    "date": "2020-04-30T00:00:00.000",
    "trend": "_PO_",
    "starRating": "4"
},
{
    "secId": "FEUSA00001",
    "name": "SPDR\u00ae S&P 500 ETF Trust",
    "totalSharesHeld": 0.994538610986949,
    "totalAssets": 5.07929,
    "currentShares": 44944990,
    "changeAmount": -436740,
    "changePercentage": -0.9623696584506585,
    "date": "2020-04-30T00:00:00.000",
    "trend": "_PO_",
    "starRating": "5"
}
 

shahin

Active Member
It seems I've been able to make the following script work coping with the logic provided by @shrivallabha.

Code:
Sub FetchContent()
    Const Url$ = "https://api-global.morningstar.com/sal-service/v1/stock/ownership/v1/0P000000GY/OwnershipData/mutualfund/20/data?locale=en&clientId=MDC&benchmarkId=category&version=3.21.1"
    Dim elem As Object, oelem As Object, I&, R&, S$
    Dim Http As Object, Rgxp As Object, subElem As Object, subElemAno As Object
    
    Set Http = CreateObject("MSXML2.XMLHTTP")
    Set Rgxp = CreateObject("VBScript.RegExp")

    With Http
        .Open "GET", Url, False
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36"
        .setRequestHeader "ApiKey", "lstzFDEOhfFNMLikKa0am9mgEKLBl49T"
        .send
        S = .responseText
    End With

    With Rgxp
        .Global = True
        .MultiLine = True

        .Pattern = "secId[\s\S]+?starRating"
        Set elem = .Execute(S)

        For I = 0 To elem.Count - 1
            .Pattern = "name"":""(.*?)"""
            Set subElem = .Execute(elem(I).Value)
            If subElem.Count > 0 Then
                R = R + 1: Cells(R, 1) = subElem(0).SubMatches(0)
            End If
            
            .Pattern = "changeAmount"":(.*?),"
            Set subElemAno = .Execute(elem(I).Value)
            If subElemAno.Count > 0 Then
                Cells(R, 2) = subElemAno(0).SubMatches(0)
            End If
        Next I
    End With
End Sub
Btw, It's merely a demonstration as to how we can, so ignore the pattern (the worst ones) that I've used.
 

shrivallabha

Excel Ninja
That caret ^ symbol would fail with the new sample. JSON chunk seems to be the block wrapped in braces ({}) and therefore the block pattern could be:
Code:
^{[\s\S]+?^}
and then you can safely execute rest of the code without worrying if other patterns exist or not.
 
Top