• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

String manipulation: Expecting a better solution

shahin

Active Member
Hi there! I've made a parser which is pulling e-mail but it is crude. I've refined it already but the code looks messy. Is it the right way I'm doing or any better solution? Thanks in advance.

Crude One: mailto:ben%40holysmokesbbq.com.au?subject=Enquiry%2C%20sent%20from%20yellowpages.com.au&body=%0A%0A%0A%0A%0A------------------------------------------%0AEnquiry%20via%20yellowpages.com.au%0Ahttps%3A%2F%2Fwww.yellowpages.com.au%2Fvic%2Fballarat-central%2Fholy-smokes-barbecue-1000001786433-listing.html%3Fcontext%3DbusinessTypeSearch%26isTopOfList%3Dtrue%26premiumProductId%3D400005055323

Refined One: ben@holysmokesbbq.com.au

Here is the code:

Code:
Set topics = html.getElementsByClassName("contact contact-main contact-email")

For Each topic In topics
    On Error Resume Next
    v = Split(topic.href, "?")
    Cells(x, 1) = Replace(Replace(v(LBound(v)), "mailto:", ""), "%40", "@")
    x = x + 1
Next topic
 
Personally, I'd use UrlUnescape function from Shell Lightweight Utility functions to convert escape sequences back into regular characters.
https://msdn.microsoft.com/en-us/library/windows/desktop/bb773791(v=vs.85).aspx

You can see how it's used in post#6 in link below.
http://www.excelforum.com/excel-for...017416-is-there-a-formula-to-decode-urls.html

You'd then use it like below...
Code:
For Each topic In topics
    On Error Resume Next
    v = Split(UnEscapeURL(topic.href), "?")
    v = Split(v(0), "mailto:")
    Cells(x, 1) = v(1)
    x = x + 1
Next topic

v(0) is used instead of LBound(v). Since lower bound of base 0 array "v" is always going to be 0.
v(1) since you'd always want the 2nd array element.
 
Thanks a lot both of you for your invaluable suggestions. It's always a great pleasure to have instructions from vba giants like you two in a single thread.
 
Dear Marc L, I have made it with "split and double replace function" but can't perform double split. Could you please show me how if it doesn't waste your time. Thanks in advance.

Code:
Set topics = html.getElementsByClassName("listing listing-search listing-data")

For i = 0 To topics.Length - 1
Set topic = topics(i)
    On Error Resume Next
    v = Split(Replace(Replace(topic.getElementsByClassName("contact contact-main contact-email")(0).href, "mailto:", ""), "%40", "@"), "?")
    Cells(x, 1) = v(LBound(v))
    x = x + 1
Next i
 
But both my code, the first one and the last one are faulty. I noticed just now that it also fills in the value irrespective of it is empty. I meant, if A,B,C have values but C,E,F have not and Again G,H,I have values then the function i used always fills in the value of an empty cell with its previous value.
 
That's because of "On Error Resume Next"

When error occurs, "x = x + 1" isn't performed and does not skip over to next cell.
 
can't perform double split
Split of Split is Chrüterchraft !​
Code:
Sub Demo1()
       Dim oJS As Object, S$
       Set oJS = CreateObject("ScriptControl")
           oJS.Language = "JScript"
           oJS.AddCode "function decode(txt) {return decodeURIComponent(txt);}"
             S = "mailto:ben%40holysmokesbbq.com.au?subject=Enquiry"
    MsgBox oJS.Run("decode", Split(Split(S, "mailto:")(1), "?")(0))
       Set oJS = Nothing
End Sub
 
@Marc L
OMG!!!! What a demo!!!!!!!!!! This (0),(1) portion used in the end of a split function confused me so many days. It is clear to me now. Thanks a lot.
 
In case if "mailto:" can be missing you must reverse
Split order like Chihiro's way and test if index #1 exists :​
Code:
Sub Demo2()
    Dim oJS As Object, V
    Set oJS = CreateObject("ScriptControl")
        oJS.Language = "JScript"
        oJS.AddCode "function decode(txt) {return decodeURIComponent(txt);}"
              V = "mailto:ben%40holysmokesbbq.com.au?subject=Enquiry"
              V = Split(Split(V, "?")(0), "mailto:")
    If UBound(V) > 0 Then MsgBox oJS.Run("decode", V(1))
    Set oJS = Nothing
End Sub
As you can see On Error Resume Next is useless …
 
Finally, I got a workaround inspired by both of you. It is working fine now. No duplicates at all.

Code:
Sub ausdata()
Const url = "https://www.yellowpages.com.au/search/listings?clue=coffee+shop&locationClue=all+states&lat=&lon=&selectedViewMode=list"
Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
Dim topics As Object, topic As HTMLHtmlElement
Dim x As Long, i As Long, v As Variant

x = 2

http.Open "GET", url, False
http.send
html.body.innerHTML = http.responseText
Set topics = html.getElementsByClassName("contact contact-main contact-email")
For Each topic In topics
    Cells(x, 1) = Replace(Split(Split(topic.href, "mailto:")(1), "?")(0), "%40", "@")
    x = x + 1
Next topic
End Sub
 
It also does the same:
Code:
For Each topic In topics
    Cells(x, 1) = Replace(Replace(Split(topic.href, "?")(0), "mailto:", ""), "%40", "@")
    x = x + 1
Next topic
 
Back
Top