• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Extract words with capital letters using regex

YasserKhalil

Well-Known Member
Hello everyone
I have some strings that I need to manipulate .. I need to extract words that starts with capital letters only and the length of the word is greater than 1
Example :
2507 N 45th StSeattle WA 98103


This string would be "StSeattle WA" .. Numbers are excluded and the capital letter N is excluded too as it is of one letter in length
Thank you very much for help
 
Thanks a lot .. But I need regex as I think it will be faster in manipulating the data .. the data is too large
 
I have searched and found this pattern
Code:
([A-Z][\w-]*(\s+[A-Z][\w-]*)+)
But it doesn't work with all the instances ..
if the word is adjacent to number like that : 1542Hello >> I need to extract "Hello" too
and as for the result to be separated by space

Example:
H 1253Hello thanks for Help Shrivallabha

The desired output would be "Hello Help Shrivallabha"
 
@YasserKhalil

I am not an expert in Regex. In fact, I just learned the fundamentals today. But below pattern can match what you want. Of course, the output won't be separated by space or anything.

\d*([A-Z][A-Za-z]+)\b
give it a go and let us know if this works.
 
Yasser your last response has confused me. How does 1253Hello qualify as it begins with a numeric?

Please test below function:
Code:
Public Function FindCapWords(rng As Range) As String
Dim rgEx As RegExp
Dim matches, matitem
Set rgEx = CreateObject("VBScript.RegExp")
With rgEx
    .Pattern = "\s+[A-Z](\w)+"
    .Global = True '\\ Need to set this to true or else it will return one match
End With
Set matches = rgEx.Execute(" " & rng.Value)
If matches.Count <> 0 Then
    For Each matitem In matches
        FindCapWords = Trim(FindCapWords & " " & matitem)
    Next
End If
Set rgEx = Nothing
End Function
Usage would be:
=FindCapWords(A1)
Assuming test string in A1
 
For you post #4, see if below function helps:
Code:
Public Function FindCapWords(rng As Range) As String
Dim rgEx As RegExp
Dim matches, matitem
Set rgEx = CreateObject("VBScript.RegExp")
With rgEx
    .Pattern = "[A-Z](\w)+"
    .Global = True '\\ Need to set this to true or else it will return one match
End With
Set matches = rgEx.Execute(rng.Value)
If matches.Count <> 0 Then
    For Each matitem In matches
        FindCapWords = Trim(FindCapWords & " " & matitem)
    Next
End If
Set rgEx = Nothing
End Function
 
Thank you very much for great help
Just one instance remained .. look at this string "Hello1235" .. it is supposed to remove any numeric numbers either before or after the string

Another point if possible
Is it possible to change the pattern to remove duplicates ..
Example : please help Hello World123 Hello World5421" would be "Hello World"
with no numeric neither remove duplicate entries ..
 
Maybe like below:
Code:
Public Function FindCapWords(rng As Range) As String
Dim rgEx As Object 'RegExp
Dim oDic As Object
Dim matches, matitem
Set rgEx = CreateObject("VBScript.RegExp")
With rgEx
    .Pattern = "[A-Z][A-Za-z]+"
    .Global = True '\\ Need to set this to true or else it will return one match
End With
Set matches = rgEx.Execute(rng.Value)
If matches.Count <> 0 Then
    Set oDic = CreateObject("Scripting.Dictionary")
    For Each matitem In matches
        If Not oDic.Exists(CStr(matitem)) Then
            FindCapWords = Trim(FindCapWords & " " & matitem)
            oDic.Add CStr(matitem), CStr(matitem)
        End If
    Next
    Set oDic = Nothing
End If
Set rgEx = Nothing
End Function
 
Back
Top