Extract PDF links from a link

YasserKhalil

Well-Known Member
Hello everyone

I have this link
LINK

There are three topics on this page, each shown as a collapsible section, and below each topic there are sub-topics (months in the Hijri calendar).
Below each sub-topic, there are PDF links.
How can I extract those links into organized columns: column A for topics, column B for sub-topics, and column C for the PDF links?

Thanks in advance for any help.
 
And here are snapshots of the steps described above: topics, sub-topics, and PDF links.
 

Attachments

  • 001.png (2.5 KB)
  • 002.png (14.7 KB)
  • 003.png (16.4 KB)

Hmm, this site's HTML is pretty badly structured.

And having just one sample isn't enough to decipher its structure.

Here's a sample to get you started.

Code:
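Sub Demo()
    ' Requires a reference to "Microsoft HTML Object Library" (Tools > References).
    Dim intFF As Integer: intFF = FreeFile()
    Dim iFile As String: iFile = "C:\Test\HTML.txt"   ' page source saved locally
    Dim strContent As String
    Dim html As HTMLDocument
    Dim x As Object
    Dim i As Long

    ' Read the whole saved page into a string.
    Open iFile For Input As #intFF
    strContent = Input(LOF(intFF), intFF)
    Close #intFF

    Set html = New HTMLDocument

    With html
        .body.innerHTML = strContent
        Set x = .getElementsByTagName("a")
        For i = 0 To x.Length - 1
            ' First top-level category.
            If x(i).getAttribute("data-parent") Like "*accordion1" Then
                Debug.Print x(i).innerText
            End If
            ' Sub-categories of the first category.
            If x(i).getAttribute("data-parent") Like "*accordion2" Then
                Debug.Print x(i).innerText
            End If
        Next i
    End With
End Sub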

This will get the first category and its sub-categories.

Unfortunately I can't test further, as the rest of the fields require a filter/query based on non-ASCII characters.

Alternatively, you can use the div's id, but that requires some hard coding and/or a conversion step, as the id depends on the number of the sub-category.
Ex: the links of the first category, first sub-category are contained in <div id="collapseThreeOne"...>, and the links of the 2nd category, first sub-category appear to be in <div id="collapseThreeOne100" ...>.

So you'll have to study the HTML DOM structure and traverse it.
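
As a rough, untested sketch of that traversal: the helper below looks up one div by its id and collects the href of every link inside it that ends in ".pdf". The name PdfLinksInDiv and the ".pdf" filter are assumptions for illustration, not something taken from the page. The id passed in would be one like "collapseThreeOne".

```vb
' Requires a reference to "Microsoft HTML Object Library".
' Hypothetical helper: returns the href of every <a> inside the <div>
' with the given id whose href ends in ".pdf" (assumed link format).
Function PdfLinksInDiv(html As HTMLDocument, divId As String) As Collection
    Dim result As New Collection
    Dim container As Object
    Dim anchors As Object
    Dim href As String
    Dim i As Long

    Set container = html.getElementById(divId)
    If Not container Is Nothing Then
        Set anchors = container.getElementsByTagName("a")
        For i = 0 To anchors.Length - 1
            ' getAttribute returns Null when href is missing; & "" turns that into "".
            href = anchors(i).getAttribute("href") & ""
            If LCase$(href) Like "*.pdf" Then result.Add href
        Next i
    End If

    ' An empty collection means the div was not found or held no PDF links.
    Set PdfLinksInDiv = result
End Function
```

If the hrefs on the page are relative, they would still need to be combined with the site's base URL before use.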
 
Not sure. It's probably easier to construct the string based on which levels are present in the page.

Top Category: 1st item - you'd use...
Code:
If x(i).getAttribute("data-parent") Like "*accordion1" Then

Top Category: 2nd item - you'd use...
Code:
If x(i).getAttribute("data-parent") Like "*accordion100" Then

For the 3rd item, use "*accordion1000".

The same pattern applies to sub-categories, using 2 instead of 1 in the string.
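
The strings above could be generated rather than typed by hand. This is a minimal sketch, assuming the rule really is "level digit, then one zero per item position from the 2nd item on". Only the values 1, 100 and 1000 were seen on the page, so both the rule and the name AccordionPattern are guesses.

```vb
' Hypothetical helper: builds the Like pattern for the n-th item at a level.
' level:     1 = top category, 2 = sub-category.
' itemIndex: 1-based position of the item.
' Assumes the observed suffixes (1, 100, 1000) continue with one more
' zero per item; anything past the 3rd item is unverified.
Function AccordionPattern(level As Long, itemIndex As Long) As String
    Dim suffix As String
    suffix = CStr(level)
    If itemIndex > 1 Then suffix = suffix & String$(itemIndex, "0")
    AccordionPattern = "*accordion" & suffix
End Function
```

For example, AccordionPattern(1, 2) gives "*accordion100", which can be used directly on the right-hand side of the Like test.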

Then, as I mentioned, you need to get the div element you want using an ID constructed via their logic.

Edit: So the best bet is to use a nested dictionary or some other container to hold the info.
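
One way the nested dictionary could look, as a sketch only: an outer Scripting.Dictionary keyed by topic, an inner one keyed by sub-topic, and a Collection of links under each. AddLink, WriteLinks, the demo values and the example.com URL are all made up for illustration. The output follows the column A / B / C layout asked for in the first post.

```vb
' Late-bound Scripting.Dictionary, so no extra reference is needed.
' Hypothetical helper: files one link under topic -> sub-topic.
Sub AddLink(topics As Object, topic As String, subTopic As String, link As String)
    If Not topics.Exists(topic) Then
        topics.Add topic, CreateObject("Scripting.Dictionary")
    End If
    If Not topics.Item(topic).Exists(subTopic) Then
        topics.Item(topic).Add subTopic, New Collection
    End If
    topics.Item(topic).Item(subTopic).Add link
End Sub

' Writes one row per link: column A = topic, B = sub-topic, C = link.
Sub WriteLinks(topics As Object, ws As Worksheet)
    Dim t As Variant, s As Variant, lnk As Variant
    Dim r As Long: r = 1
    For Each t In topics.Keys
        For Each s In topics.Item(t).Keys
            For Each lnk In topics.Item(t).Item(s)
                ws.Cells(r, 1).Value = t
                ws.Cells(r, 2).Value = s
                ws.Cells(r, 3).Value = lnk
                r = r + 1
            Next lnk
        Next s
    Next t
End Sub

Sub DemoNested()
    Dim topics As Object
    Set topics = CreateObject("Scripting.Dictionary")
    ' Placeholder values, not real data from the site.
    AddLink topics, "Topic 1", "Month 1", "http://example.com/a.pdf"
    AddLink topics, "Topic 1", "Month 2", "http://example.com/b.pdf"
    WriteLinks topics, ActiveSheet
End Sub
```

Filling the container first and writing to the sheet afterwards keeps the DOM traversal separate from the output step, so either part can be changed once the page structure is better understood.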
 