• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Removing URL Strings

I've been using these wildcard searches to replace with nothing -blank-.


www*com

-or-

http*com


However, these strings I need to get rid of are much more complex (for instance sometimes .net or sometimes .org or sometimes in brackets or sometimes not starting out with www. or http at all).


I almost have the perfect solution.


There are 252 url domain extentions (or at least that I have a list of that I care to search for anyway). I've listed them all in a macro that runs through every possible combination of strings and removes them (without jeopardizing the integrity of text I wanna keep) -and also without jeopardizing the text in brackets I wanna keep as well.


However, my main problem are the strings that don't start with "http" or "www". Because of this, when I run my search and replace macro on URL strings with random start markers they result in:


before:

(urlname@such.com)


after:

(urlname@such


This is a problem of course because the only thing I could successfully remove was the ".com)"


Note: I could use (*), but I have text in other brackets that I would like to keep, and some of those brackets could potentially be sharing the same cell with the bad bracket sets. Also, they are not all at the end of the cell or the beginning (mixed locale).


If I could find a way to remove the brackets (*) that only contain the word ".com" or ".net" etc.., then


I will be able to knock the majority of them out (without removing the data in the brackets I wanna keep) because almost all of these URLs are in brackets.


So far my macro is a little over a thousand lines, because I have to be careful not delete a single letter of text I don't intend to. The search and replace has to be laser surgical sharp.


However, if anyone has a more quicker brilliant idea on how to remove them all the same way I am definitely taking notes.
 
Change the search to include the brackets maybe, like:

(*.com)

replaced with nothing


or perhaps even:

(*.???)


Which would replace anything found in brackets which ends with 3 characters preceded by a period (.net, .org., .com, etc)
 
@ Luke


If I use (*.com) or (*.???)


I will remove text I wanna keep so this actually won't work.

For example:


Before:

Here is (an example) line of text I wanna keep; however, (www.this_I_don't.com)


After:

Here is


By using the (*.???) I killed almost the entire line. It's a simple task, but then again it's kinda tricky (not really sure it can be done). Any other ideas?
 
Indi

You need to clearly define the rules that you want to use to enable selection/rejection of data


Can you post them?
 
If it Starts With:

http

www


And Ends With (view link):

http://www.iandmyself.me/all.possible.web.domain.extentions.txt


Then Delete


If it starts with http or www, and ends with any of those extentions, then delete.


But the problem is some links don't start with http or www, so I'm not sure how to kill the whole link if I can't find a starting point of the link name.
 
So you haven't fully defined the problem

Can you post some data and highlight what you want deleted and what you don't
 
Well It's a bunch of messy text. I'm not sure you would want me to paste it here. But as far as defining the problem.


If they were all isolated links I could delete them all. But they are so messy and inconsistent the only definitive pattern I have is the one above.
 
Indi...


Is all your data in an excel sheet or text based or database table


You can use Arrays for super-fast execution, at the same point of time having extensibility in maintaining the Prefix and Suffix Extension list to compare.


~Vijay
 
indi,

Could we state a rule of "all bracketed info that contains no spaces should be deleted"?

Or do you have some non-delete info that is a single word?
 
If previous is true, here's my attempt at a macro that currently loops through Range A2:A100 and removes all brackets info that does not contain spaces. Would take more lines, but you probably could adapt it to check all web extensions.


Code:
Sub KillLinks()

Dim StartWord As Boolean

Dim xDelete As Boolean

Dim NewString As String

Dim MyString As String

Dim BeginLetter As Long

Dim V(1 To 1000) As Variant


StartWord = False

xDelete = False


For Each c In Range("A1:A100") ' Define as needed

NewString = ""

MyString = c.Value

For i = 1 To Len(MyString)

V(i) = Mid(MyString, i, 1)


If V(i) = "(" Then 'Begin Check

StartWord = True

xDelete = True

BeginLetter = i

End If


'Has a space, don't delete

If V(i) = " " And StartWord = True Then xDelete = False


If V(i) = ")" Then

StartWord = False

If xDelete Then

For x = BeginLetter To i

V(x) = "" 'Remove string

Next x

End If

End If


Next i


For i = 1 To Len(MyString)

NewString = NewString & V(i)

Next i


c.Value = NewString

Next c

End Sub
 
Here is another example:

http://www.iandmyself.me/example.of.my.workbook.txt


@ Hui, Vjay, and Luke.. I appreciate you guys for working with me you all are great. I'm just not exactly sure how to ask what I'm needing so I posted the link above to give you an idea of what I see.


I know there is a pattern some way some how, I'm just not seeing, perhaps more than one step might be necessary.
 
@ Luke


How do I define the range to be searched to select A2 and down?

This is what I've tried beneath to no avail.


Sub KillLinks()

Dim StartWord As Boolean

Dim xDelete As Boolean

Dim NewString As String

Dim MyString As String

Dim BeginLetter As Long

Dim V As Variant

V = Range("a2", Range("a" & Rows.Count).End(xlDown))


StartWord = False

xDelete = False


For Each c In Range("a2", Range("a" & Rows.Count).End(xlDown))

NewString = ""

MyString = c.Value

For i = 1 To Len(MyString)

V(i) = Mid(MyString, i, 1)


If V(i) = "(" Then 'Begin Check

StartWord = True

xDelete = True

BeginLetter = i

End If


'Has a space, don't delete

If V(i) = " " And StartWord = True Then xDelete = False


If V(i) = ")" Then

StartWord = False

If xDelete Then

For x = BeginLetter To i

V(x) = "" 'Remove string

Next x

End If

End If


Next i


For i = 1 To Len(MyString)

NewString = NewString & V(i)

Next i


c.Value = NewString

Next c

End Sub
 
indi,

You'd want to use xlUp, not xlDown. Need to define the V variant properly as well...I'll throw in a function I found at http://www.rondebruin.nl/tips.htm that quickly lets us find the last row of any column we choose:


Code:
Function LastRow(x As Integer) As Integer

'Find the last used row in a Column: column A in this example

Application.Volatile

With ActiveSheet

LastRow = .Cells(.Rows.Count, x).End(xlUp).Row

End With

End Function


Sub KillLinks()

Dim StartWord As Boolean

Dim xDelete As Boolean

Dim NewString As String

Dim MyString As String

Dim BeginLetter As Long

Dim xRows As Long


xRow = LastRow(1) 'Find last row in column A (column 1)

Dim V(1 To xRow) As Variant


For Each c In Range("a2:a" & xRow)

NewString = ""

StartWord = False

xDelete = False

MyString = c.Value

For i = 1 To Len(MyString)

V(i) = Mid(MyString, i, 1)


If V(i) = "(" Then 'Begin Check

StartWord = True

xDelete = True

BeginLetter = i

End If


'Has a space, don't delete

If V(i) = " " And StartWord = True Then xDelete = False


If V(i) = ")" Then

StartWord = False

If xDelete Then

For x = BeginLetter To i

V(x) = "" 'Remove string

Next x

End If

End If


Next i


For i = 1 To Len(MyString)

NewString = NewString & V(i)

Next i


c.Value = NewString

Next c

End Sub
 
I think this macro (and function) is the money shot!


However, I'm getting a compile error...


Dim V(1 To xRow) As Variant ''Compile Error Here: Constant Expression Required
 
I've tried to fix the Dim V(1 To xRow) part a million ways, MS Excel Help is telling me that I'm trying to initialize a constant with a variable?


Stuck on this one. But as soon as someone helps me nail this one I can mark this one as solved.
 
Indi,


Try the below code...


xRow = LastRow(1) 'Find last row in column A (column 1)

Dim V() As Variant


ReDim Preserve V(1 To xRow) As Variant


HTH

~Vijay
 
@ vjay (and or) Luke,


One last thing guy :)


When searching for www or com.


How do I modify the Len (portion of the code) to search for begin check as 3 characters?


It appears I can only search for one character?
 
@indi

Correct, the current macro is only setup to look at one character at a time. The Len method is just used to determine how many characters are in the cell.


I suppose you could do add something like this:


Code:
TestWord = UCase(V(i) & V(i + 1) & V(i + 2))

If TestWord = "WWW" Or TestWord = "COM" Then

xDelete = True

End If

This would check every set of 3 characters for www or com. Note the UCase method is used since VB is case sensitive.
 
I read over this entire post and I noticed in my last post I said "one last thing guy".


I meant to say "one last thing guys" :) (plural)


I only thought I'd bring it up because "one last thing guy" makes me kinda sound like a jerk. lol


My apologies to Luke and Vjay ...not my intention at all (typo).


Anyway, back to the macro...


Luke I tried to plug in the test word portion you provided. Is there anyway you could paste it in this macro so I can see exactly how this is supposed to fit in? Because without being able to search for more than one character I'm dead in the water and stuck again.


I changed the UCase to LCase, but got stuck pretty much after that.


Things I've added to the macro:

'Has a space, don't delete

If V(i) = " " And StartWord = True Then xDelete = False

'Has a dot, then delete

If V(i) = "." And StartWord = True Then xDelete = True


...and now all I need to do is figure out how to:


'contains extention, then delete

If V(i) = "http" And StartWord = True Then xDelete = True

If V(i) = "www." And StartWord = True Then xDelete = True

If V(i) = ".com" And StartWord = True Then xDelete = True

If V(i) = ".net" And StartWord = True Then xDelete = True ...ETC (with all extentions)


Because without a little more specificity I'll keep wiping out accidental text.
 
indi,


Thanks for clearing up the typo, I had wondered about that. Here's the full macro with new code added in:


Function LastRow(x As Integer) As Integer

'Find the last used row in a Column: column A in this example

Application.Volatile

With ActiveSheet

LastRow = .Cells(.Rows.Count, x).End(xlUp).Row

End With

End Function


Sub KillLinks()

Dim StartWord As Boolean

Dim xDelete As Boolean

Dim NewString As String

Dim MyString As String

Dim BeginLetter As Long

Dim xRows As Long

Dim TestWord As String


xRow = LastRow(1) 'Find last row in column A (column 1)

Dim V(1 To xRow) As Variant


For Each c In Range("a2:a" & xRow)

NewString = ""

StartWord = False

xDelete = False

MyString = c.Value

For i = 1 To Len(MyString)

V(i) = Mid(MyString, i, 1)


If V(i) = "(" Then 'Begin Check

StartWord = True

xDelete = True

BeginLetter = i

End If


'Has a space, don't delete, unless it contains a period

If V(i) = " " And StartWord = True Then xDelete = False

If V(i) = "." And StartWord = True Then xDelete = True


'New section of macro

If i + 3 <= Len(MyString) Then

TestWord = UCase("*" & V(i) & V(i + 1) & V(i + 2) & V(i + 3) & "*")

'Keep adding on to first string as needed

If "HTTPWWW..COM.NET" Like TestWord And StartWord = True Then

xDelete = True

End If

End If


If V(i) = ")" Then

StartWord = False

If xDelete Then

For x = BeginLetter To i

V(x) = "" 'Remove string

Next x

End If

End If


Next i


For i = 1 To Len(MyString)

NewString = NewString & V(i)

Next i


c.Value = NewString

Next c

End Sub


Note that "Like" operator compares 1 string with the former. FOr efficiency, we can simply string all the words we want to look for (looking for strings of 4 characters long) and have it be compared to the test word. If there's a match, we delete.
 
@ Luke

No problem, hey guy and hey guys makes a world of difference. : )


This is beautiful thank you, but I have another question.

'Keep adding on to first string as needed

If "HTTPWWW..COM.NET" Like TestWord And StartWord = True Then

xDelete = True


Do I define the above code as?:...


'Keep adding on to first string as needed

If "http" Like TestWord And StartWord = True Then

xDelete = True


If "www." Like TestWord And StartWord = True Then

xDelete = True


If ".com" Like TestWord And StartWord = True Then

xDelete = True


If ".net" Like TestWord And StartWord = True Then

xDelete = True
 
No need to break it all out like that. Simply concatenate all the prefixes/suffixes that you are looking for into 1 giant word. Then you only need to do 1 "Like" comparison. Functionally, the section you wrote is that same. There's just no reason to repeat it all out like that.


Technically, if you want to be really secure, you could put all the words together with big gaps in between, like this:


If "HTTP______WWW.______.COM______.NET____all the other words" Like TestWord And StartWord = True Then

etc.
 
!!! SOLVED !!!


@ Luke, VJay, and Hui (Respectively ; )


I appreciate everyone's help with all of this, and I actually took bits and pieces from everyone's input. What I appreciate the most, is that I actually used everyone's advice to "lego" together a solution. Hui you woke me up to definitions, VJay you provided ReDim Preserve code which I desperately needed, and Luke you knocked it out and wrapped it up by expanding on the length search! :)


I've never written an Excel Grammy speech before how funny : )
 
Back
Top