• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Extract and sort data from a webpage

NicolasDP

New Member
Hello everyone,

I'm looking to extract and sort a large amount of data from a webpage.
This is a kind of release calendar that I would like to have as clear as possible by putting all the data in the following columns:

ID - NAME - COLOR - DATE - LINK

The feed is located here: https://www.footlocker.be//INTERSHO...ndar/launchdata/launchcalendar_feed_all1.json

The syntax stays always the same for each release, for example ID can be found for each release after "id":", DATE can be found each time after "releaseDateTime":, etc..

Thanks in advance,
Nicolas.
 

Hi !

You forgot to post an expected result workbook !
But for json data, there are already ways on web, just do a research …

See also Split function in VBA inner help.​
 
Last edited:

Hey Nico ! Any news, where have you been ?

Without any sample, you can start yourself to download on local drive the
json text file, open it with Excel or with VBA (via Open For Input for example), then you can parse data via Split function as any text file,
beginner level …

Of course with more information, I could directly extract data from Web
with my old procedure according to your need …
 
Hello Marc!
Thanks for the reply, I made some research of how to convert a json to excel and found this amazing link: http://konklone.io/json/

So it just converts the raw data in columns, then I just download the csv and import it in my final excel file.

I need to copy/paste the raw date into my excel and then I used the excel function "convert" to make colums after each comma.

I have the final result that I wanted so it's ok, I would like to now make it easier by directly importing the data into the Excel so I just need to click refresh to actualize it.

Thanks for the help :)
 

Attachments

  • FL-HOH Calendar.xlsx
    517.2 KB · Views: 3

Do you really need each country link ? Surprised to see Link BE
'cause there is no BE link in text file …

In fact do you really need to extract all fields or some (so which ones) ?

Date + Time or just date ?

No sort as per thread title? Just a filter ?
 
I just need the BE link actually because I'm in Belgium, I made the BE link based on the UK one just with replacement of .co.uk into .be that's it.

For the rest it should just be like it's in the file now, time is not needed but I took it as it's in the raw data.

Finally I used filters to see the upcoming releases so I just filtered with the most recent dates in the top.
 
I have two ways :

first is a hard code way, in case of an update (new column or order change), you have to amend it …

Second is my old rough text extract procedure working with columns
headers (exact text file fields like name for Model in your sample) : in case
of an update, no need to amend extract procedure, maybe just main procedure after data import for a column format or a special need like for the BE link …

Which one ?
 
I would prefer the second option, like that I can adapt it myself if needed, don't feel comfortable with the hard code :)
 
Good choice !

So my jsonWebExtract function à la romaine
(means like in Roma but at the time of Caesar ! Guess in code …)
needs two arguments :

• a json text file URL;

• a two rows array for fields to extract.
First row is for fields names (must be same as in the json file),
a field, a column … Key field must stay in first (id here).
Second row purpose is to extract a specific index of a field
with multiple values like link for example.
UK link is index #6 from your file, so if links are displayed in column D,
array(1, 4) must be equal to "link" and array(2, 4) to 6
But second row could be left blank if in field name there is an index.
The name must be followed by a space and a # before the index number
like for example "link #6" and could be followed by any text.
For other items left blank, default index is set by code to 1.

It's this last way choosen in the following demonstration just by setting
headers within a fresh worksheet of a brand new workbook !​

Output1.jpg

If during execution error #9 occurs, there is a bad header
not respecting field names of source json text file !

The demonstration procedure calls the function returning result
in an array. Just has to prepare columns display and to amend data
for special need like link be from uk, and performs sort and filter …

Bonus : after data import, a double click on a header will sort the column.
Another one on same header will sort but in reverse !

Extra bonus : a double click on a link try to open it !
If it's a bad link, it becomes grey …

Code to paste only in the destination worksheet module :​
Code:
Function jsonWebExtract(URL$, HV)
                    Dim SPQ$()
With CreateObject("MSXML2.XMLHttp")
    .Open "GET", URL, False
    .SetRequestHeader "DNT", "1"
    On Error Resume Next
    .Send
    On Error GoTo 0
    If .Status <> 200 Then Beep: Exit Function
    SPQ = Split(Replace$(.responseText, "\""", ChrW$(&H201C)), "{""" & HV(1, 1) & """:""")
End With

ReDim VT(1 To UBound(SPQ), 1 To UBound(HV, 2))

For C& = 2 To UBound(HV, 2)
     V = Split(HV(1, C), " #"):  HV(1, C) = ",""" & V(0) & """:"""
               If UBound(V) Then HV(2, C) = Val(V(1))
            If HV(2, C) = 0 Then HV(2, C) = 1
Next

For R& = 1 To UBound(SPQ)
        VT(R, 1) = Split(SPQ(R), """")(0)
    For C = 2 To UBound(HV, 2)
        VT(R, C) = Split(Split(SPQ(R), HV(1, C))(HV(2, C)), """")(0)
    Next
Next
         jsonWebExtract = VT
End Function


Sub Demo()
Const URL = "https://www.footlocker.be//INTERSHOP/static/WFS/Footlocker-Site/-/Footlocker/en_US/Release-Calendar/launchdata/launchcalendar_feed_all1.json"
Application.ScreenUpdating = False
If Me.FilterMode Then Me.ShowAllData
Me.UsedRange.Offset(1).Clear
VA = jsonWebExtract(URL, Cells(1).CurrentRegion.Rows("1:2").Value)
         If Not IsArray(VA) Then Beep: Exit Sub
With [A2].Resize(UBound(VA), UBound(VA, 2)).Columns
         .Item(1).NumberFormat = "#"
         .Item(3).NumberFormat = "@"
    With .Item(5):  .HorizontalAlignment = xlRight:  .IndentLevel = 2:  .NumberFormat = "m/d/yyyy":  End With
         .Value = VA
         .Item(6).Replace ".co.uk", ".be", xlPart
End With
With Cells(1).CurrentRegion.Columns
    .Item(2).IndentLevel = 1
    .Sort .Cells(5), 2, Header:=xlYes
    .Item(5).AutoFilter 1, ">" & CLng(DateSerial(Year(Date), Month(Date), 1))
    .Item("A:E").AutoFit
End With
Application.ScreenUpdating = True
End Sub


Private Sub Worksheet_BeforeDoubleClick(ByVal Target As Range, Cancel As Boolean)
                                       Static COL&
If Target.Value Like "https://*" Then
    Cancel = True
    On Error Resume Next
    Application.DisplayAlerts = False
    ThisWorkbook.FollowHyperlink Target.Value
    Target.Font.ColorIndex = IIf(Err.Number, 48, 0)
    Application.DisplayAlerts = True
Else
    With Cells(1).CurrentRegion
        If Intersect(.Rows(1), Target) Is Nothing Then Exit Sub
        Cancel = True:     COL = -Target.Column * (Target.Column <> COL)
        .Sort Target, 2 + (COL > 0), Header:=xlYes
    End With
End If
End Sub
Do you like it ? So thanks to click on bottom right Like !

Split & Split : Chrüterchraft ‼​
 
Last edited:
Hey Nico !

This way à la romaine (even guess why in code ?) was just to show
there is always at least a solution to convert a text file
with some VBA text functions (InStr, Mid, Split, …)
as Excel can load any text file even via an URL !

But there is a modern way which can show json file structure
and helps to quickly code a file import or export,
even in a hard code way, wanna see ?
 
Hey Marc,

No I didn't understood why à la romaine, can't wait that you tell me lol!
Of course I'm willing to learn and see what's possible :)
Thanks again!
 
« Senatus PopulusQue Romanus » : SPQR !

JSON is based from a subset of the JavaScript coding language,
meaning JavaScript Object Notation

VBA has any function to work with JSON ? Nope …
But VBA can reach under Windows the JavaScript Engine !
So within this first part, let's start with a tool
to quick scan a json data structure !

Paste this code to a fresh new worksheet module :
Code:
Const FDL = " &", OBJ = "[[]object *", _
      HDR = "id,brand,name,colors,releaseDatetime,deepLinks &5 &link,hasStock,image,hasImage", _
      SRC = "https://www.footlocker.be//INTERSHOP/static/WFS/Footlocker-Site/-/Footlocker/en_US/Release-Calendar/launchdata/launchcalendar_feed_all1.json"
  Dim JSc As Object, JSL%, JSR&

Function TextRequest$(URL$)
With CreateObject("MSXML2.XMLHttp")
    .Open "GET", URL, False
    .setRequestHeader "DNT", "1"
    On Error Resume Next
    .send
    On Error GoTo 0
    If .Status = 200 Then TextRequest = .responsetext Else Beep
End With
End Function

Function jsonEval(jsonTXT$) As Object
If JSc Is Nothing Then
Set JSc = CreateObject("ScriptControl")
    JSc.Language = "JScript"
    JSc.AddCode "function getKeys(jsonObj) { var keys = []; for (var i in jsonObj) { keys.push(i); } return keys; }"
End If
              JSL = -1:            JSR = 1
     Set jsonEval = JSc.Eval("(" & jsonTXT & ")")
End Function

Sub jsonLightR(oRoot As Object, DATA, Optional KEY$, Optional L&)
           Dim oKeys As Object
           Set oKeys = JSc.Run("getKeys", oRoot)
If L > JSL Then
    If Split(oKeys, ",")(0) <> "0" Then Debug.Print vbLf & "Level #" & L & " fields : " & KEY & oKeys
    JSL = L
End If
For Each C In oKeys
       D$ = CallByName(oRoot, C, VbGet)
    If D Like OBJ Then
        If C <> "0" Or KEY > "" Then K$ = KEY & C & FDL
        jsonLightR CallByName(oRoot, C, VbGet), DATA, K, L + 1
        If K = "" Then Exit For
    Else
        DATA(0) = DATA(0) & KEY & C & vbTab
        DATA(1) = DATA(1) & D & vbTab
    End If
Next
      Set oKeys = Nothing
End Sub

Sub jsonLightScan()
    Dim VA$(1)
    T$ = TextRequest(SRC):  If T = "" Then Exit Sub
    jsonLightR jsonEval(T), VA
    Set JSc = Nothing
    If AutoFilterMode Then Cells(1).AutoFilter
    Cells(1).ClearOutline:  Me.UsedRange.Offset(1).Clear
    [A2].Value = VA(0):    [A3].Value = VA(1)
    [A2:A3].TextToColumns , xlDelimited, , , True
End Sub
You may Like it !

Stay in VBE, open the Immediate window (CTRL + G)
then execute jsonLightScan procedure and see the result :
Output1.jpg

JSON root structure is level #0. So here fields start at level #1,
root is for rows of this "columns" data file … The same for level #2,
there are some rows subset from level #1 as revealed in level #3 :
deepLinks is a level #1 field, &0 is first row of sub level (#2)
and contains two fields in level #3 : locale and link

But how to find in which row is the english link ?
Just see the result in worksheet ‼
Fields start in row #2 and first data a row under :
easy to locate the en in M3 cell, link next and field name above in N2 cell :
deepLinks &5 &link ! Meaning property link in 6th row
(first starting at index 0 …) of deepLinks field …
Easy now to paste necessary fields in row #1 for further part !
Output2.jpg

So this tiny process performs a scan of json structure and just
arrange first data row in columns as a quick snap

Could be a bit less useful with a "rows x columns" file like this one :
Output3.jpg

Two keys at root (ACM & DEF) with five subkeys each (X1 to X5)
and each couple of keys and subkeys with 14 columns (fields BU to UT) :
so this procedure uses 2 x 5 x 14 = 140 columns !
These data structure only needs 16 columns (2 keys + 14 fields) …

As you will see with the tool in next part !
 

Amend declaration of procedure jsonLightR :

Sub jsonLightR(oRoot As Object, DATA, Optional KEY$, Optional L%)
 
Back
Top