Creating absolute links from parsed ones

Btw, I used the term "filter" because if I can build such a scraper, I will sift through thousands of "hrefs" and keep only a few for further use.
 
Post #8:
If the pattern matches "^about:/", replace it with the appropriate absolute link for each site.

Post #12 :​
... like I wrote. Replace 'about:/' with root directory/path

FYI - instead of SPLIT function, use REPLACE function.

Post #19:
So when you read a link starting with "about:/", just replace it
with the correct domain or base URL that you already noted down
when you manually inspected the webpage.

Post #21 :​
Or just use Replace VBA function

Post #25:
Unclear … No need to filter, just replace "about:"!
Yet I can't see any R E P L A C E within your code ‼

Code:
Sub Demo4Noob()
    Dim V As Variant
    ' Swap the "about:" prefix for a real scheme; other links pass through unchanged
    For Each V In [{"http://siteA","about:siteB","http://siteC"}]
        Debug.Print Replace(V, "about:", "http://")
    Next
End Sub
 
You can't see any "replace" function in my code because I've already learnt from you how to do that. My main concern from the very beginning has been the first portion, as I said in my earlier post. For more clarity: if I run my scraper using this "https://chandoo.org/forum/threads/creating-absolute-links-from-parsed-ones.35873/page-2#post-218423" link, I expect it to keep only the portion that I can use further, "https://chandoo.org" in this case. Then I will think about concatenation. Thanks.
 
Simple string manipulation, isn't it?
Replace the first 3 occurrences of "/" with some character not used in the URL, then replace the first 2 of those back with "/", then split by that character (or alternatively, replace the first 2 with some character, split by "/", and then replace the character with "/").
Code:
Sub Demo()
    Dim storage As String, arr() As String, items As Variant, x As Variant

    storage = "https://www.yify-torrent.org/search/1080p/," & _
            "https://yts.ag/browse-movies,https://www.houzz.com/professionals," & _
            "https://www.wiseowl.co.uk/videos/"
 
    arr() = Split(storage, ",")

    For Each items In arr
        ' Mask the first 3 "/" as "@", restore the first 2, then split on the "@" that is left
        x = Split(Replace(Replace(items, "/", "@", 1, 3), "@", "/", 1, 2), "@")
        Debug.Print x(0)
    Next
End Sub
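The alternative mentioned in parentheses above (mask the first two slashes, split on "/", then restore the masked ones) might look something like the sketch below. BaseBySplit and DemoAlternate are made-up names for illustration, and the sketch assumes that "@" never appears in the URLs being processed.

```vba
' Hypothetical helper (name is illustrative): returns scheme + host of a URL.
' Assumes a non-empty URL that does not contain "@", so "@" is safe as a temporary mask.
Function BaseBySplit(ByVal url As String) As String
    Dim masked As String
    ' Hide the two slashes of "//" so the first remaining "/" marks the end of the host
    masked = Replace(url, "/", "@", 1, 2)
    ' Keep the part before that "/" and put the masked slashes back
    BaseBySplit = Replace(Split(masked, "/")(0), "@", "/")
End Function

Sub DemoAlternate()
    Debug.Print BaseBySplit("https://yts.ag/browse-movies")      ' https://yts.ag
    Debug.Print BaseBySplit("https://www.wiseowl.co.uk/videos/") ' https://www.wiseowl.co.uk
End Sub
```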
 
Or use the InStr function: since "https://" is 8 characters long, start the search at the 9th character to find the 3rd "/".
Code:
Sub Demo()
    Dim storage As String, arr() As String, items As Variant, x As String

    storage = "https://www.yify-torrent.org/search/1080p/," & _
            "https://yts.ag/browse-movies,https://www.houzz.com/professionals," & _
            "https://www.wiseowl.co.uk/videos/"
 
    arr() = Split(storage, ",")

    For Each items In arr
        ' Search from character 9 (just past "https://") for the "/" that ends the host
        x = Left(items, InStr(9, items, "/") - 1)
        Debug.Print x
    Next
End Sub
 
@sir chihiro, where have you been, sir? It just solves the main part. I think I should go for the InStr function (your second solution) because it is the cleaner approach. Isn't it possible to create some If statement so that it covers both "http" and "https"? I'm trying this part myself. One more thing, sir: could you provide me with a link to a regex tutorial? I've already started learning it, but a few things are very unclear, and as I don't want to bother you, I'm hoping for a link. Thanks a lot, sir, for your effective solution.
 
It doesn't matter if it's http or https. No URL will have its 3rd "/" at the 9th position or before...
Ex: "http://a.ca/". I doubt anything shorter exists...
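To illustrate the point, here is a small sketch using a hypothetical GetBaseUrl helper (the name and the guard are not from the thread). The same start position is used for both schemes, and the guard covers a URL with no path at all, where InStr returns 0 and Left would otherwise fail.

```vba
' Hypothetical helper: returns everything before the first "/" found from character 9 onward.
Function GetBaseUrl(ByVal url As String) As String
    Dim pos As Long
    ' Character 9 lies inside or just past the host for both "http://" and "https://"
    pos = InStr(9, url, "/")
    If pos > 0 Then
        GetBaseUrl = Left$(url, pos - 1)
    Else
        ' No path part at all (e.g. "https://yts.ag"): the URL is already the base
        GetBaseUrl = url
    End If
End Function

Sub DemoBaseUrl()
    Debug.Print GetBaseUrl("http://a.ca/")                 ' http://a.ca
    Debug.Print GetBaseUrl("https://chandoo.org/forum/")   ' https://chandoo.org
    Debug.Print GetBaseUrl("https://yts.ag")               ' https://yts.ag
End Sub
```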

Also, the first code is more flexible than the InStr version.

could you provide me with any link that can lead me to a regex tutorial?
Hmm? I gave you one already in post #14
 
Sir, please take a look at the code below. I think there are a few unnecessary lines in it, but it serves the purpose. I randomly checked the newly produced links and found them valid. I really appreciate your help. Btw, I was so worried about this piece of code that I did not notice you had already provided me with the desired link. I'll surely post an update here on why I did all this.
Code:
Sub Creating_absolute_links()
    ' Requires references: Microsoft XML, v6.0 and Microsoft HTML Object Library
    Dim http As New XMLHTTP60, html As New HTMLDocument
    Dim post As Object, vault As Variant, link As Variant
    Dim x As String, r As Long

    vault = Array( _
        "https://yts.ag/browse-movies/", _
        "https://chandoo.org/wp/vba-classes/", _
        "https://www.wiseowl.co.uk/videos/")

    For Each link In vault
        With http
            .Open "GET", link, False
            .send
            html.body.innerHTML = .responseText
        End With

        ' Base URL of the current site: everything before the first "/" after the scheme
        If InStr(link, "http:") > 0 Then
            x = Left(link, InStr(8, link, "/") - 1)
        Else
            x = Left(link, InStr(9, link, "/") - 1)
        End If

        For Each post In html.getElementsByTagName("a")
            ' Root-relative hrefs come back as "about:/..."; prepend the base URL
            If InStr(post.href, "about:/") > 0 Then
                r = r + 1
                Cells(r, 1) = x & Split(post.href, "about:")(1)
            End If
        Next post
    Next link
End Sub
 
Having applied what I've been instructed so far, the script now looks like this:
Code:
Sub Link_parser()
    Dim http As New XMLHTTP60, html As New HTMLDocument
    Dim post As Object, link_var As Variant, link As Variant

    link_var = Array( _
        "http://spltech.in/", _
        "http://www.unifrostindia.com/", _
        "http://advanta.in/", _
        "http://www.superrefrigerations.com/", _
        "http://www.greenplanet.in/")

    For Each link In link_var
        With http
            .Open "GET", link, False
            .send
            html.body.innerHTML = .responseText
        End With

        For Each post In html.getElementsByTagName("a")
            If InStr(link, "http:") > 0 Then x = Left(link, InStr(8, link, "/") - 1)
            If InStr(1, post.innerText, "contact", 1) > 0 Then refined_links = post.href: Exit For
'            If InStr(1, refined_links, "about:", 1) > 0 Then
'                R = R + 1: Cells(R, 1) = x & Split(refined_links, "about:")(1)
'            Else:
'                R = R + 1: Cells(R, 1) = refined_links
'            End If
        Next post
        Debug.Print refined_links
    Next link
End Sub

The links I'm getting:
Code:
about:contactus.aspx
http://www.unifrostindia.com/contactus
about:contact.html
about:contactus.htm
http://www.greenplanet.in/contact-us.htm

The scraper only extracts links containing "contact". However, the problem I'm facing at the moment is this: in the results above, three of the five links start with "about". I tried but failed to write an If statement that checks for "about" in the link: if it is there, the link goes through the concatenation part (which is already defined but commented out); otherwise the link is printed as it is. Thanks
 
Thanks for your comment, Marc L. Now I know how to concatenate links the right way. However, what I meant is that I would like to make an "If statement" with two conditions.

Condition 1: Check for links containing "about:". If it is there, then "refined_links" will be processed through concatenation. (I'll take care of this concatenation.)

Condition 2: If there is no "about:" in the link, then "refined_links" will be printed as it is through the Else branch.

Once again: what I can't do is create an If statement with these two conditions.
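For reference, the two conditions described above could be sketched roughly as follows. ToAbsoluteLink is a hypothetical helper, not the code from post #28, and it assumes that every "about:" link should be resolved against the site root held in x (the base URL computed earlier in the script).

```vba
' Hypothetical helper: turns an "about:" href into an absolute link, leaves others untouched.
' Assumption: relative links are resolved against the site root passed in baseUrl.
Function ToAbsoluteLink(ByVal href As String, ByVal baseUrl As String) As String
    If Left$(href, 7) = "about:/" Then
        ' "about:/contact" -> base & "/contact" (the slash is already there)
        ToAbsoluteLink = baseUrl & Mid$(href, 7)
    ElseIf Left$(href, 6) = "about:" Then
        ' "about:contact.html" -> base & "/" & "contact.html"
        ToAbsoluteLink = baseUrl & "/" & Mid$(href, 7)
    Else
        ' Already absolute: return it as it is
        ToAbsoluteLink = href
    End If
End Function

Sub DemoToAbsolute()
    Debug.Print ToAbsoluteLink("about:contactus.aspx", "http://spltech.in")
    Debug.Print ToAbsoluteLink("http://www.unifrostindia.com/contactus", "http://www.unifrostindia.com")
End Sub
```

Inside the existing loop this would replace the commented-out block with something like R = R + 1: Cells(R, 1) = ToAbsoluteLink(refined_links, x).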
 
Thanks, Marc L, for your suggestion and demo. And yes, following post #28 I got my desired results. However, I misunderstood you in the first place. You could have clarified it a little bit more, given that you were dealing with a novice like me. Thanks a lotttttttttttttt once again.
 
Gonna create a new thread to publish the scraper I was after; getting there is why I prolonged this thread so far. Thanks to everyone for lending a helping hand at every step.
 
It's just about R E A D I N G the same advice given several times since post #8!

Post #28 is the clarifying summary
of Chihiro's posts and mine!
At a very beginner level, it can't be clearer than « try R E P L A C E » ‼
Just read and observe the result of the #28 code demonstration …

If you do not want to try, no matter, I won't give any other way
and I'll move on to the next thread …

How do you show something to someone who doesn't want to open their eyes?‼

Try first and then, if really needed, ask for a clarification …
 

No offense, it was just my point of view after your
« You could have clarified it a little bit more » …​
 