• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Need to Extract String from <XXXXXXXX> angled bracket.

i need to extract the string available in <> these two symbol.

Sample Text look like below. i need a way to extract each string in coloum A then B then C.

Like for third row it should be

A1= <MeterSerial>CWE90778</MeterSerial>
B1 = MeterSerail
C1 = CWE90778
D1 = /MeterSerial
E1= blank as there is no further string

<AbsoluteReadings>
<functionNotSupported>
<MeterSerial>CWE90778</MeterSerial>
</AbsoluteReadingNode>
<AbsoluteReadingNode>
<MeterSerial>CWE91097</MeterSerial>
<ReadingTime>2015-11-12T00:00:00.000+01:00</ReadingTime>
<ReadingValue>7006.0</ReadingValue>
</AbsoluteReadingNode>
<AbsoluteReadingNode>
<MeterSerial>CWE91095</MeterSerial>
<ReadingTime>2015-11-12T00:00:00.000+01:00</ReadingTime>
<ReadingValue>9690.0</ReadingValue>
</AbsoluteReadingNode>


Regards,Kuldeep
 
Hi,


Try the below..


In B1 = TRIM(MID(SUBSTITUTE(SUBSTITUTE($A1," ",""),"/",""),2,SEARCH(">",SUBSTITUTE(SUBSTITUTE($A1," ",""),"/",""))-2))

In C1 =TRIM(MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE($A1," ",""),"/",""),"<",REPT(" ",LEN($A1)),2),SEARCH(">",SUBSTITUTE(SUBSTITUTE(SUBSTITUTE($A1," ",""),"/",""),"<",REPT(" ",LEN($A1)),2))+1,LEN($A1)))

In D1 = IF(LEN($C1)>0,TRIM(SUBSTITUTE(RIGHT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE($A1," ",""),"/",""),"<",REPT(" ",LEN($A1)),2),LEN($A1)),">","")),"")
 
Last edited:
Thanks a Lot for this Technique, I knew it but i never dared to use it.
"Formula Auditing"
764f112d-690d-4e8b-8276-02631c26a11c.gif


Still I will wait for the Logic, I know RegEx a bit and I am using that to Extract the texts.

Regards
Pratik Parmar
 
Hi Kuldeep, (and Asheesh :awesome: solutions dude)

Here is my try, based on the posted samples:

B1
=SUBSTITUTE(MID(A1,2,FIND(">",A1)-2),"/","")

C1
=IFERROR(LEFT(MID(A1,FIND(">",A1)+1,LEN(A1)),FIND("<",MID(A1,FIND(">",A1)+1,LEN(A1)))-1),"")

D1
=IF(LEN(C1)>0,IFERROR(SUBSTITUTE(SUBSTITUTE(MID(A1,FIND("</",A1)+1,LEN(A1)),"/",""),">",""),""),"")

Regards,
 
@Khalid NGO - thanks a lot

Just for the sake of Fun..

TRIM(MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE($A2,"<","!"),">","!"),"/",""),"!",REPT(" ",LEN($A2))),COLUMN(A$1)*LEN($A2),LEN($A2)))

Single formula

Put the formula in B1, Copy across and down..

Non-Array construction


Assumption – Your data starts from A1
 
Thank you @Asheesh The Formulas works perfectly ! I believe it's good for searching the XML Files.

Currently I use RegEx to extract the Alphabets from the AlphaNumeric Strings using the Microsoft Excel's In Built RegEx Library v5.5 along with a Customised Regex UDF ( =regex() Pls See it's Module's Code Below.

Example Images Names
DE5690_IMG_0815 (3)
DE5698-IMG_9599
DE5702-IMG_1782
DE5729_IMG_0615
DE5730_IMG_0096
DE5770_IMG_8189 (17)
DE5784-IMG_1556
DE5829-IMG_9637
DE5829-IMG_9766
DE5872-IMG_9059
DE5873-IMG_0989
DE5873-IMG_0990

Regular Expressions Macro for Microsoft Excel
(uses inbuilt library)
RegEx : [A-Za-z]{,2}
I Use http://www.regex101.com to test the RegEx Expressions

Regex.jpg


Code:
Function regex(strInput As String, matchPattern As String, Optional ByVal outputPattern As String = "$0") As Variant
    Dim inputRegexObj As New VBScript_RegExp_55.RegExp, outputRegexObj As New VBScript_RegExp_55.RegExp, outReplaceRegexObj As New VBScript_RegExp_55.RegExp
    Dim inputMatches As Object, replaceMatches As Object, replaceMatch As Object
    Dim replaceNumber As Integer

    With inputRegexObj
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = matchPattern
    End With
    With outputRegexObj
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = "\$(\d+)"
    End With
    With outReplaceRegexObj
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
    End With

    Set inputMatches = inputRegexObj.Execute(strInput)
    If inputMatches.Count = 0 Then
        regex = False
    Else
        Set replaceMatches = outputRegexObj.Execute(outputPattern)
        For Each replaceMatch In replaceMatches
            replaceNumber = replaceMatch.SubMatches(0)
            outReplaceRegexObj.Pattern = "\$" & replaceNumber

            If replaceNumber = 0 Then
                outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).Value)
            Else
                If replaceNumber > inputMatches(0).SubMatches.Count Then
                    'regex = "A to high $ tag found. Largest allowed is $" & inputMatches(0).SubMatches.Count & "."
                    regex = CVErr(xlErrValue)
                    Exit Function
                Else
                    outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).SubMatches(replaceNumber - 1))
                End If
            End If
        Next
        regex = outputPattern
    End If
End Function

I am Interested in making a Google Images Search type Interface to Search and View Images in the Microsoft Excel
 
Hi,

As requested – see here the explanation.

Note: I choose to explain the single formula solution instead of the formulae in comment no 2.

The example explained is <MeterSerial>CWE90778</MeterSerial> assume that the aforesaid string is in Cell A1.

Our entire construction is

Code:
B1= TRIM(MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE($A1,"<","!"),">","!"),"/",""),"!",REPT(" ",LEN($A1))),COLUMN(A$1)*LEN($A1),LEN($A1)))

Lets take the TRIM function to the shelf as of now since, we are more concerned to know how does the MID function extract the data from the above string.

MID function returns the characters from the text string basis the starting position and length of characters.

Let’s look at its arguments:

MID(text,start_num,num_chars)

So there are three arguments

  1. text: What is my input string(reference or hardcoded) in this case, it is

    Code:
    SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE($A1,"<","!"),">","!"),"/",""),"!",REPT(" ",LEN($A1)))

  2. start_num: position in the string, i shall begin extracting from i.e. starting position. In this case

    Code:
    COLUMN(A$1)*LEN($A1)

  3. num_chars: number of characters i wish retrieve from the string, in this case

    Code:
    LEN($A1)
We shall now go step by step and understand each argument for the example in question.

text argument:

In this argument, we replace all the unnecessary special characters first then insert a special character of my choice where ever required.

Confused – let us decipher it to understand what do I mean

Code:
SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE($A1,"<","!"),">","!"),"/",""),"!",REPT(" ",LEN($A1)))


Once evaluated becomes


Code:
SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE("<MeterSerial>CWE90778</MeterSerial>","<","!"),">","!"),"/",""),"!",REPT(" ",LEN($A1)))

So, we notice that it starts with the inner most SUBSTITUTE function and instructs it to replace less than character ”<” with an exclamation mark “!”. Once evaluated further the construction translates to

Code:
SUBSTITUTE(SUBSTITUTE(SUBSTITUTE("!MeterSerial>CWE90778!/MeterSerial>",">","!"),"/",""),"!",REPT(" ",LEN($A1)))

Now, the construction is instructed as such that it passes the return of the inner most SUBSTITUTE function to second SUBSTITUTE function, where it substitutes the greater than character “>” to exclamation mark “!” Upon evaluating, it becomes

Code:
SUBSTITUTE(SUBSTITUTE("!MeterSerial!CWE90778!/MeterSerial!","/",""),"!",REPT(" ",LEN($A1)))

By Now, we have removed all the greater than and less than signs with exclamation mark. Furthermore, I pass on the return to the next SUBSTITUTE function to replace the forward slash”/” with nothing and making it look like

Code:
SUBSTITUTE("!MeterSerial!CWE90778!MeterSerial!","!",REPT(" ",LEN($A1)))

Assuming that, you know how REPT & LEN functions work

I further use LEN function to return the total length of the original text i.e. in this case 35

So,
Code:
REPT(“ “, LEN($A1))= REPT(“ “,35)

Now, we use the combination of SUBSTITUTE and REPT function and instruct the construction to replace each exclamation mark with 35 spaces( 35 is specific to this case, will vary from case to case) thus

Code:
SUBSTITUTE("!MeterSerial!CWE90778!MeterSerial!","!",REPT(" ",35))
translates to

Code:
"  MeterSerial  CWE90778  MeterSerial  "

But, before I wind up the text argument, what do we notice here that every exclamation mark is replaced by 35 spaces.(remember)

So, this becomes our final string for the 1st argument of MID Function i.e. for text.

Now, is the second argument i.e. start_num

In this case, we definitely want to leave the first 35 blank spaces and then pick up text from the 36th character. Let us see our construction for start_num argument i.e.

Code:
COLUMN(A$1)*LEN($A1)

We know LEN($A1)= 35

COLUMN(A$1) returns 1

Similarly, once COLUMN(A$1) is dragged one column towards the right it becomes COLUMN(B$1) and is equal to 2..

So and so forth

Again, back to

Code:
COLUMN(A$1)*LEN($A1)= 1*35 = 35

Phew!!

Let us talk about the 3rd argument i.e. num_chars

i.e. LEN($A1) = 35

If combined all the above

Code:
MID(" MeterSerial  CWE90778  MeterSerial  ",{35},35)
and once evaluated returns

{" MeterSerial "}

So we finally, wrap it into the TRIM Function to remove all the extra spaces..

Code:
TRIM({" MeterSerial  "}) = “MeterSerial”

Similarly, when I drag it to the next column, in second argument of the MID function i.e COLUMN(B$1)*LEN($A1) becomes 2*35 = 70

So on and so forth..

Note: There is a huge scope to improve this construction, however, I did not bother much as it worked fine for all the examples mentioned in the initial post.

Hope this helps..!!
 
Last edited:
Hi all,

This one too, somewhat like @Asheesh formula

=TRIM(MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE((IF(MID($A1,2,1)="/","<>"&$A1,IF(ISERROR(FIND("/",$A1)),$A1&"</>",$A1))),">","<"),"/",""),"<",REPT(" ",99)),1+99*(COLUMN(B2)-1),99))

Put in B1, drag to left and downwards.

NB" @Asheesh's formula is way shorter!! :thumbsup
 
Back
Top