• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Extract consective numbers from String

amit dani

New Member
Hi,

I have strings of data which contains both text and numbers. I want to extract 12 digit number. This 12 digit number can be anywhere in the string and there are other numeric digits also in the string. For example i have below string:-

xxx: 1200 Ref 01234567891234 yyy: 012345678912
xxx: 012345678912 Ref 01234567891234 yy -

I want to extract 012345678912 from above.

Any help will be highly appreciated..

Thanks in advance.
 
Hi Amit,
If your string always contains "Ref" followed by required number, you can use the following:
=MID(A2,FIND("Ref",A2,1)+4,12)

Regards,
 
Do you know the length of the number you want to extract? For your case it varies between 11 to 13 so any number greater then 11 and 13 be extracted?
 
Amit,
Can you post your question in excel file with more examples and expected results?
This will help users to help you.

Regards,
 
Hi.

Perhaps:

=MID(A2,MATCH(14,MMULT(ABS(ISNUMBER(0+MID(MID(A2,ROW(INDIRECT("1:"&LEN(A2)-13)),14),{1,2,3,4,5,6,7,8,9,10,11,12,13,14},1))-{1,0,0,0,0,0,0,0,0,0,0,0,0,1}),{1;1;1;1;1;1;1;1;1;1;1;1;1;1}),0)+1,12)

Regards
Outstanding...works like a charm..
Thanks
 
@XOR LX

Love the formula you gave, specially this part in it {1,0,0,0,0,0,0,0,0,0,0,0,0,1}

Very clever use to distinguish the number by giving it a boundary, and awesome use of MMULT.

Regards,
 
@amit dani You're welcome!
@Somendra Misra Thanks! Glad you like it!

@Asheesh
I'll try!

Let's take an abbreviated example to make the explanation easier to follow, i.e. the following string in A2:

Payment : 111531002246 Ref 00724101400019

This part:

MID(A2,ROW(INDIRECT("1:"&LEN(A2)-13)),14)

as you probably know, creates an array of all substrings of length 14 (I'll explain why 14, not 12, further down) from the string in A2, beginning with the first, second, third, etc. characters in that string, i.e.:

{"Payment : 1115";"ayment : 11153";"yment : 111531";"ment : 1115310";"ent : 11153100";"nt : 111531002";"t : 1115310022";" : 11153100224";": 111531002246";" 111531002246 ";"111531002246 R";"11531002246 Re";"1531002246 Ref";"531002246 Ref ";"31002246 Ref 0";"1002246 Ref 00";"002246 Ref 007";"02246 Ref 0072";"2246 Ref 00724";"246 Ref 007241";"46 Ref 0072410";"6 Ref 00724101";" Ref 007241014";"Ref 0072410140";"ef 00724101400";"f 007241014000";" 0072410140001";"00724101400019"}

The second MID then extracts, for each of the strings in the above array, each of the 14 individual characters within that string, so that:

MID(MID(A2,ROW(INDIRECT("1:"&LEN(A2)-13)),14),{1,2,3,4,5,6,7,8,9,10,11,12,13,14},1)

gives:

{"P","a","y","m","e","n","t"," ",":"," ","1","1","1","5";"a","y","m","e","n","t"," ",":"," ","1","1","1","5","3";"y","m","e","n","t"," ",":"," ","1","1","1","5","3","1";"m","e","n","t"," ",":"," ","1","1","1","5","3","1","0";"e","n","t"," ",":"," ","1","1","1","5","3","1","0","0";"n","t"," ",":"," ","1","1","1","5","3","1","0","0","2";"t"," ",":"," ","1","1","1","5","3","1","0","0","2","2";" ",":"," ","1","1","1","5","3","1","0","0","2","2","4";":"," ","1","1","1","5","3","1","0","0","2","2","4","6";" ","1","1","1","5","3","1","0","0","2","2","4","6"," ";"1","1","1","5","3","1","0","0","2","2","4","6"," ","R";"1","1","5","3","1","0","0","2","2","4","6"," ","R","e";"1","5","3","1","0","0","2","2","4","6"," ","R","e","f";"5","3","1","0","0","2","2","4","6"," ","R","e","f"," ";"3","1","0","0","2","2","4","6"," ","R","e","f"," ","0";"1","0","0","2","2","4","6"," ","R","e","f"," ","0","0";"0","0","2","2","4","6"," ","R","e","f"," ","0","0","7";"0","2","2","4","6"," ","R","e","f"," ","0","0","7","2";"2","2","4","6"," ","R","e","f"," ","0","0","7","2","4";"2","4","6"," ","R","e","f"," ","0","0","7","2","4","1";"4","6"," ","R","e","f"," ","0","0","7","2","4","1","0";"6"," ","R","e","f"," ","0","0","7","2","4","1","0","1";" ","R","e","f"," ","0","0","7","2","4","1","0","1","4";"R","e","f"," ","0","0","7","2","4","1","0","1","4","0";"e","f"," ","0","0","7","2","4","1","0","1","4","0","0";"f"," ","0","0","7","2","4","1","0","1","4","0","0","0";" ","0","0","7","2","4","1","0","1","4","0","0","0","1";"0","0","7","2","4","1","0","1","4","0","0","0","1","9"}

etc.

We then add zero to each of these, which will convert to numericals where appropriate, and then pass to ISNUMBER, giving:

{FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE,TRUE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE;FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE;FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE;FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE;FALSE,FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE;FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,FALSE;TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE;TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE;TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE;TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE;TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE;TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE;TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE;TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE;TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE,TRUE;TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE;TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE;TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE;FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE;FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE;FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE;FALSE,FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE;FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE;TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE}

Since we are interested in a string of 12 numbers, we need to ensure that we actually consider strings of 14 characters in length, in which the first and last characters are not numericals, and the rest are. Were we to only consider strings of length 12, we could not guarantee that any resulting strings of numericals did not in fact form part of a larger string of numericals.

One way to do this is to subtract the array constant:

{1,0,0,0,0,0,0,0,0,0,0,0,0,1}

from the above array, which, taking advantage of the fact that:

FALSE - 1 = -1

and

TRUE - 0 = 1

will ensure that, after taking the absolute of these resulting values, only a string which consists of non-numericals in the first and last position and numericals everywhere else will result in an array of 14 1s, i.e.:

{1,1,1,1,1,1,1,1,1,1,1,1,1,1}

After these operations, i.e. performing:

ABS(ISNUMBER(0+MID(MID(A2,ROW(INDIRECT("1:"&LEN(A2)-13)),14),{1,2,3,4,5,6,7,8,9,10,11,12,13,14},1))-{1,0,0,0,0,0,0,0,0,0,0,0,0,1})

we have:

{1,0,0,0,0,0,0,0,0,0,1,1,1,0;1,0,0,0,0,0,0,0,0,1,1,1,1,0;1,0,0,0,0,0,0,0,1,1,1,1,1,0;1,0,0,0,0,0,0,1,1,1,1,1,1,0;1,0,0,0,0,0,1,1,1,1,1,1,1,0;1,0,0,0,0,1,1,1,1,1,1,1,1,0;1,0,0,0,1,1,1,1,1,1,1,1,1,0;1,0,0,1,1,1,1,1,1,1,1,1,1,0;1,0,1,1,1,1,1,1,1,1,1,1,1,0;1,1,1,1,1,1,1,1,1,1,1,1,1,1;0,1,1,1,1,1,1,1,1,1,1,1,0,1;0,1,1,1,1,1,1,1,1,1,1,0,0,1;0,1,1,1,1,1,1,1,1,1,0,0,0,1;0,1,1,1,1,1,1,1,1,0,0,0,0,1;0,1,1,1,1,1,1,1,0,0,0,0,0,0;0,1,1,1,1,1,1,0,0,0,0,0,1,0;0,1,1,1,1,1,0,0,0,0,0,1,1,0;0,1,1,1,1,0,0,0,0,0,1,1,1,0;0,1,1,1,0,0,0,0,0,1,1,1,1,0;0,1,1,0,0,0,0,0,1,1,1,1,1,0;0,1,0,0,0,0,0,1,1,1,1,1,1,0;0,0,0,0,0,0,1,1,1,1,1,1,1,0;1,0,0,0,0,1,1,1,1,1,1,1,1,0;1,0,0,0,1,1,1,1,1,1,1,1,1,0;1,0,0,1,1,1,1,1,1,1,1,1,1,0;1,0,1,1,1,1,1,1,1,1,1,1,1,0;1,1,1,1,1,1,1,1,1,1,1,1,1,0;0,1,1,1,1,1,1,1,1,1,1,1,1,0}

In order to locate the position of the 14 1s (which I've highlighted) we can use MMULT, so that:

MMULT(ABS(ISNUMBER(0+MID(MID(A2,ROW(INDIRECT("1:"&LEN(A2)-13)),14),{1,2,3,4,5,6,7,8,9,10,11,12,13,14},1))-{1,0,0,0,0,0,0,0,0,0,0,0,0,1}),{1;1;1;1;1;1;1;1;1;1;1;1;1;1})

gives:

{4;5;6;7;8;9;10;11;12;14;12;11;10;9;7;7;7;7;7;7;7;7;9;10;11;12;13;12}

and so then we can simply locate the 14 in this array.

I haven't got time or space here to go into a technical explanation as to how matrix multiplication functions (which is precisely what MMULT does), though I've been meaning to write an article on this function over at my website for some time, so I'll try to do that over the next few weeks.

Hope that at least helps a bit!

Regards
 
Great work @XOR XL

I've noticed OP has 2 constant numbers:
1115
&
113

so i used the following simple formula:)
=IFERROR(MID(A2,FIND(1115,A2),12),MID(A2,FIND(113,A2),12))

Regards,
 
@Khalid NGO

Indeed, if your hypothesis is correct then yours is by far the better solution!

We are often faced with the dilemma of not having sufficient information as to the particulars of the data in question, and so providing a generalized solution, and forming our own hypotheses as to that data, and so providing a (generally simpler) tailored solution.

If your assumptions are correct, and so hold for all strings, then your "gamble" paid off. If not, then I was correct in "playing it safe" by offering a more complex, though generalized solution.

Cheers!
 
Hi,

@Khalid NGO Most of the numbers are have these pattern. But some numbers are still different.
@XOR LX Thanks for the explanation...I just noticed that the given formula is throwing an error if the required number is at end of the string.. For ex. in below case:

Payment : BEING AMOUNT GIVEN BACK TO PIN NO 111546928251
 
Hi,

@Khalid NGO Most of the numbers are have these pattern. But some numbers are still different.
@XOR LX Thanks for the explanation...I just noticed that the given formula is throwing an error if the required number is at end of the string.. For ex. in below case:

Payment : BEING AMOUNT GIVEN BACK TO PIN NO 111546928251

Ohh got it..since there is no space in the end it is showing error..
 
Hi @amit dani

It took some while to come up with something, but can you check this one in your original file, seems to work on all data provided till now.

=LOOKUP(2,1/(LEN(TRIM(MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"-"," "),"#"," "),":"," ")," ",REPT(" ",LEN($A2))),LEN($A2)*(ROW(INDIRECT("1:"&LEN($A2)))),LEN($A2))))=12),TRIM(MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"-"," "),"#"," "),":"," ")," ",REPT(" ",LEN($A2))),LEN($A2)*(ROW(INDIRECT("1:"&LEN($A2)))),LEN($A2))))

Unable to make shorter than @XOR LX :)

Regards,
 
@Somendra Misra

Are you checking for numericalness also here? Have you accounted for the fact that there could equally be non-numerical substrings of length 12 within the string? For example, if the string were:

Payment : Being Txn Cancelled Ref Num 01202111400116 Token No 1146 Pin Number 113537164181 Amount Refunded Accumulation

Regards
 
@XOR LX

Ha ha Nice Catch. Here is modified formula but more longer :(

=LOOKUP(2,1/((LEN(TRIM(MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"-"," "),"#"," "),":"," ")," ",REPT(" ",LEN($A2))),LEN($A2)*(ROW(INDIRECT("1:"&LEN($A2)))),LEN($A2))))=12)*(ISNUMBER(--TRIM(MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"-"," "),"#"," "),":"," ")," ",REPT(" ",LEN($A2))),LEN($A2)*(ROW(INDIRECT("1:"&LEN($A2)))),LEN($A2)))))),TRIM(MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"-"," "),"#"," "),":"," ")," ",REPT(" ",LEN($A2))),LEN($A2)*(ROW(INDIRECT("1:"&LEN($A2)))),LEN($A2))))

Regards,
 
@Somendra Misra

Perhaps, but now you face another issue, albeit one which is extremely unlikely to occur (though we should always strive to be rigorous, no?), and that is as a result of testing strings for numericalness by testing the entire string, rather than each individual character within that string.

For example, given the appropriate language settings, e.g. English, for the following string:

Payment : Being Txn Cancelled Ref Num 01202111400116 Token No 1146 Pin Number 113537164181 1January2012

your formula will return:

1January2012

since "1January2012" satisfies both the condition of being of length 12 characters and also of being considered, when coerced, a numerical by default.

Regards
 
Hi ,

Even leaving aside the technical correctness of otherwise of the formula suggested by Misra , my opinion is that as those who are striving for mastery of a language ( which Excel is after all ) , we should eschew verbosity for elegance.

Who would say the following ?

Even after the doomsday of nuclear warfare, when all the known signs of life will be swept off by electromagnetic radiations, a tiny familiar animal with twin antennal- flagella and jonted legs, though with faded exoskeleton, will come out from a weary hole on the face of a red-burnt cliff to be the new master of this once homo-sapiens dominated piece of the universe.

All that the above means is :

A cockroach can survive through a nuclear war.

Narayan
 
@Somendra Misra

Sure! But I'm sure that you yourself can construct some 12-character string in your language version which also happens to be e.g. a valid date string?

Regards
 
Back
Top