• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Formula Challenge 001 - Return everything in string after first block of numbers

jeffreyweir

Active Member
Challenge name:

Return everything in string after first block of numbers


Challenge Description:

  • * This is a formula challenge, designed to get you thinking creatively about how to solve a tricky problem using formulas only.

    * Split the strings in Column A with a formula that only returns the substring that you see in column B. So for instance, if the string is Monaco7190Australia1484 then we want everything to the right of that first block of numbers i.e. Australia1484.

    * Your formula should handle any length of string, and any length of numbers - not just the lengths shown below

    * You should not use any helper cells, intermediate formulas, named ranges, or VBA

    * The shortest formula I've found to do this is 139 characters, including the = sign.

    * To find the length of your formula, put an apostrophe in front of the = sign, and then point a LEN function at it.(Note that the LEN function doesn't count the apostrophe)
Challenge Data


https://www.dropbox.com/s/kvl48vto4hzurm3/Return%20everything%20in%20string%20after%20first%20block%20of%20numbers%2020130625.xlsb


Space-delimited data below:

A: B:
Monaco7190Australia1484 Australia1484
Liechtenstein4352Austria8590 Austria8590
Malta1306Belarus7578 Belarus7578
Iceland8130Belgium244 Belgium244
Luxembourg2796Bulgaria2691 Bulgaria2691
Slovenia8287Canada2019 Canada2019
Latvia1303Croatia5528 Croatia5528
Croatia9766CzechRepublic210 CzechRepublic2108
Estonia19Denmark6285 Denmark6285
Lithuania7737Estonia6840 Estonia6840
Norway734EuropeanUnion3206 EuropeanUnion3206
Switzerland7183Finland6004 Finland6004
Ireland5103France8873 France8873
NewZealand3014Germany9846 Germany9846
Portugal1621Greece2441 Greece2441
Denmark4206Hungary5624 Hungary5624
Finland1541Iceland8954 Iceland8954
Slovakia8715Ireland7149 Ireland7149
Sweden9235Italy2395 Italy2395
Austria1067Japan1407 Japan1407
Greece7601Latvia4011 Latvia4011
Hungary6089Liechtenstein4034 Liechtenstein4034
Bulgaria2367Lithuania7857 Lithuania7857
Belarus6113Luxembourg1444 Luxembourg1444
Belgium4754Malta2496 Malta2496
Turkey3846Monaco7385 Monaco7385
CzechRepublic4916Netherlands9261 Netherlands9261
Netherlands2684NewZealand6907 NewZealand6907
Spain5464Norway2554 Norway2554
Romania3475Poland8874 Poland8874
Australia2471Portugal661 Portugal661
Italy5510Romania7422 Romania7422
France399RussianFederation5895 RussianFederation5895
Poland5176Slovakia1841 Slovakia1841
Canada5428Slovenia9651 Slovenia9651
UnitedKingdomofGreatBritainandNorthernIreland3415Spain3559 Spain3559
Ukraine4083Sweden9443 Sweden9443
Germany1287Switzerland8602 Switzerland8602
Japan9813Turkey5622 Turkey5622
RussianFederation7115Ukraine9180 Ukraine9180
EuropeanUnion6864UnitedKingdomofGreatBritainandNorthernIreland3518 UnitedKingdomofGreatBritainandNorthernIreland3518
 

Hui

Excel Ninja
Staff member
Jeff


Place a single ` before and after a block of text will maintain Tabs etc

the character next to the 1 top row of a keyboard, not next to the enter key


Refer: http://chandoo.org/wp/2011/11/04/fancy-posts-using-html-display-codes/
 

NARAYANK991

Excel Ninja
Hi Jeff ,


This is a technique I read about on MrExcel or somewhere in the Google Groups discussion ; it is certainly nowhere near your 139 char formula , but I think it does the job.


=RIGHT(A16,LEN(A16)-MATCH(TRUE,(ISNUMBER(VALUE(MID(A16&" ",ROW(INDIRECT("1:"&LEN(A16))),1)))*NOT(ISNUMBER(VALUE(MID(A16&" ",ROW(INDIRECT("2:"&LEN(A16)+1)),1))))*ROW(INDIRECT("1:"&LEN(A16))))>0,0))


entered as an array formula , using CTRL SHIFT ENTER.


The only point in its favour is that it is easy to explain.


It does not take care of certain situations , such as when there is no digit in the input string , or if the numbers are decimal numbers.


Narayan
 

jeffreyweir

Active Member
Thanks Hui. Nice one Narayank991. Originally I had something similar inspired by http://office.microsoft.com/en-us/excel-help/extracting-numbers-from-alphanumeric-strings-HA001154901.aspx


At the above link I learnt that you can do away with the VALUE bit of your function simply by multiplying by 1, which coerces numbers stored as text into numbers. i.e.


=RIGHT(A16,LEN(A16)-MATCH(TRUE,(ISNUMBER(1*MID(A16&" ",ROW(INDIRECT("1:"&LEN(A16))),1))*NOT(ISNUMBER(1*MID(A16&" ",ROW(INDIRECT("2:"&LEN(A16)+1)),1)))*ROW(INDIRECT("1:"&LEN(A16))))>0,0))


...which is a neat trick.


Not that my revised formula uses it.
 

NARAYANK991

Excel Ninja
Hi Jeff ,


Thanks for the correction ; I just saw that the last part *ROW(INDIRECT("1:"&LEN(A16))) is also not necessary , since we will anyway get either TRUE or FALSE from the other two parts.


With your correction , and this exclusion , the length is now 157 chars.


Narayan
 

Sajan

Excel Ninja
Hi Jeff,

How about the following formula, with a length of 134 characters:

=MID(A1,MODE(MMULT(((N(ISNUMBER(MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1)+0))={1,0})*(ROW(INDIRECT("1:"&LEN(A1)))-{1,2}+1)),{1;1}))+1,255)


Cheers,

Sajan.
 

jeffreyweir

Active Member
Well played, sir. You beat me by one character. As with all your crazy approaches, I'm going to have to preemptively take some aspirin, then try to get my head around this puppy. Mine uses a very different approach (i think). Will share a bit later...don't want to preempt what creative approaches others come up with.


Thanks Sajan.
 

jeffreyweir

Active Member
Sajan: worth noting that these all do the same:

=ROW(INDIRECT("1:"&LEN(A17)))

=ROW(OFFSET(A1,,,LEN(A17)))

=ROW(A1:INDEX(A:A,LEN(A17)))


That 2nd one Hui just showed me. That 3rd one is the only non-volatile option.
 

Colin Legg

Active Member
Trimmed Sajan's formula down to 129 characters by removing the +1, changing the +0 to -, removing a pair of unnecessary brackets.


Code:
=MID(A1,MODE(MMULT((N(ISNUMBER(-MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1)))={1,0})*(ROW(INDIRECT("1:"&LEN(A1)))-{0,1}),{1;1}))+1,255)

Of course, shorter doesn't necessarily mean better... :)

That 3rd one is the only non-volatile option.
The 3rd one is what Charles Williams calls 'semi-volatile'.
 

jeffreyweir

Active Member
Good skills, Colin. Although that 255 at the end needs to be replaced with LEN(A1) in the spirit of this being able to handle any string length.


Does semi-volatile means recalc only on open?
 

jeffreyweir

Active Member
yeah, the 255 was in Sajan's genes ;-)

I can live with a recalc on open...especially in a very large model.


Insightful as always to step through Sajan's formula:

These bold parts:

=MID(A16,MODE(MMULT((N(ISNUMBER(MID(A16,ROW(OFFSET(A$1,,,LEN(A16))) ,,1)+0))={1,0})*(ROW(OFFSET(A$1,,,LEN(A16))) -{1,2}+1),{1;1}))+1,LEN(A17))

…evaluate to this:

=MID(A16,MODE(MMULT((N(ISNUMBER(MID(A16,

{1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23}

,1)+0))={1,0})*({

1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23}

-{1,2}+1),{1;1}))+1,LEN(A17))


And then this bold part:

=MID(A16,MODE(MMULT((N(ISNUMBER(

MID(A16,{1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23},1)

+0))={1,0})*({1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23}-{1,2}+1),{1;1}))+1,LEN(A17))

…evaluates to this:

=MID(A16,MODE(MMULT((N(ISNUMBER(

{"M";"o";"n";"a";"c";"o";"7";"1";"9";"0";"A";"u";"s";"t";"r";"a";"l";"i";"a";"1";"4";"8";"4"}

+0))={1,0})*({1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23}-{1,2}+1),{1;1}))+1,LEN(A17))


And then this +0 bit on the end of this bold bit:

=MID(A16,MODE(MMULT((N(ISNUMBER(

{"M";"o";"n";"a";"c";"o";"7";"1";"9";"0";"A";"u";"s";"t";"r";"a";"l";"i";"a";"1";"4";"8";"4"}

+0
))={1,0})*({1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23}-{1,2}+1),{1;1}))+1,LEN(A17))

…coerces any numbers stored as text into numbers, while causing the text bits to throw errors:

=MID(A16,MODE(MMULT((N(ISNUMBER({#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;7;1;9;0;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;1;4;8;4}
))={1,0})*({1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23}-{1,2}+1),{1;1}))+1,LEN(A17))


And then this bold bit:

=MID(A16,MODE(MMULT((N(ISNUMBER({#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;7;1;9;0;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;1;4;8;4})
)={1,0})*({1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23}-{1,2}+1),{1;1}))+1,LEN(A17))

…evaluates to this:

=MID(A16,MODE(MMULT((N({FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE}
)={1,0})*({1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23}-{1,2}+1),{1;1}))+1,LEN(A17))

And then this bold bit:

=MID(A16,MODE(MMULT((N({FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE})
={1,0})*({1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23}-{1,2}+1),{1;1}))+1,LEN(A17))

… gets teased into numbers by the N:

=MID(A16,MODE(MMULT( ({0;0;0;0;0;0;1;1;1;1;0;0;0;0;0;0;0;0;0;1;1;1;1}
={1,0})*({1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23}-{1,2}+1),{1;1}))+1,LEN(A17))

And then this bold bit:

=MID(A16,MODE(MMULT(({0;0;0;0;0;0;1;1;1;1;0;0;0;0;0;0;0;0;0;1;1;1;1}={1,0}
)*({1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23}-{1,2}+1),{1;1}))+1,LEN(A17))

…creates a 2D array – one the inverse of the other – that converts True and False to 1 and zero (or the inverse of that if it’s the other array)

=MID(A16,MODE(MMULT(({FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;TRUE,FALSE;TRUE,FALSE;TRUE,FALSE;TRUE,FALSE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;TRUE,FALSE;TRUE,FALSE;TRUE,FALSE;TRUE,FALSE})*({1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23}-{1,2}+1),{1;1}))+1,LEN(A17))

And then this bold bit:

=MID(A16,MODE(MMULT( ({FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;TRUE,FALSE;TRUE,FALSE;TRUE,FALSE;TRUE,FALSE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;TRUE,FALSE;TRUE,FALSE;TRUE,FALSE;TRUE,FALSE})*({1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23}-{1,2}+1),{1;1}))+1,LEN(A17))

…evaluates to this:

=MID(A16,MODE(MMULT({0,0;0,1;0,2;0,3;0,4;0,5;7,0;8,0;9,0;10,0;0,10;0,11;0,12;0,13;0,14;0,15;0,16;0,17;0,18;20,0;21,0;22,0;23,0}
,{1;1}))+1,LEN(A17))

…which that MMULT turns into this:

=MID(A16,MODE({0;1;2;3;4;5;7;8;9;10;10;11;12;13;14;15;16;17;18;20;21;22;23})+1,LEN(A17))

…which leaves us with this:

=MID(A16,10+1,LEN(A17))
 

jeffreyweir

Active Member
Oh dear, I seem to be repeating myself bigtime. That's what happens when you play with formulas after dark. WAY after dark...past midnight here and time for bed.
 

Sajan

Excel Ninja
Thanks Jeff for the inaugral formula challenge and the write up!


Thanks Colin for optimizing the formula. I have to remember the usage of "-" instead of "+0"... great tip!


I hope someone else suggests yet another approach for extracting the substring. To quote bobhc "there is more than one way to skin a cat"!!


Eagerly awaiting the next challenge! :)


-Sajan.


P.S. Hi Jeff, Could you include a prefix such as "Formula Challenge" in the subject line? I nearly overlooked this one since we see questions about extracting substrings all the time.


I will also post a challenge later this evening.
 

jeffreyweir

Active Member
'Formula Challenge' handle a good idea. Although you're like a bloodhound when it comes to sniffing out a challenge ;-)


Note that Hui did add a 'Formula Challenge' category, so that's where you want to post it.
 

Haseeb A

Active Member
Hello Sajan, Good one...!


Hello Jeff,


If first character, after first block of numbers is an Alphabet (A-Z) & If you are on XL 2007 or later here is one with 108 character :)


=MID(A1,MIN(IFERROR(SEARCH(CHAR(ROW($65:$90)),A1,MIN(IFERROR(FIND({1,2,3,4,5,6,7,8,9,0},A1),""))+1),"")),250)


Edit:


If you are on Excel 2010 or later you could use it WITHOUT CSE with AGGREGATE


=MID(A1,AGGREGATE(15,6,SEARCH(CHAR(ROW($65:$90)),A1,AGGREGATE(15,6,FIND({1,2,3,4,5,6,7,8,9,0},A1),1)),1),999)
 

jeffreyweir

Active Member
Haseeb's formula is really clever because it finds the position that the first number occupies - meaning it ignores the first block of text - and then it starts searching for the text from the position that the first number occupies. And because it's looking for text, it ignores numbers. Ingenious.


Stepping through Haseeb's formula - after replacing the 250 bit at the end with LEN(A1) so that it handles any length string (as per formula challenge notes), we get this:

=MID(A1,MIN(IFERROR(SEARCH(CHAR(ROW($65:$90)),A1,MIN(IFERROR(FIND({1,2,3,4,5,6,7,8,9,0},A1),""))+1),"")),LEN(A1))


This bit in bold:

=MID(A1,MIN(IFERROR(SEARCH(CHAR(ROW($65:$90)),A1,MIN(IFERROR(FIND({1,2,3,4,5,6,7,8,9,0},A1) ,""))+1),"")),LEN(A1))

…evaluates to this:

=MID(A1,MIN(IFERROR(SEARCH(CHAR(ROW($65:$90)),A1,MIN(IFERROR({8,#VALUE!,#VALUE!,21,#VALUE!,#VALUE!,7,22,9,10},""))+1),"")),LEN(A1))


This bit in bold:

=MID(A1,MIN(IFERROR(SEARCH(CHAR(ROW($65:$90)),A1,MIN(IFERROR({8,#VALUE!,#VALUE!,21,#VALUE!,#VALUE!,7,22,9,10},""))+1),"")),LEN(A1))

…evaluates to this:

=MID(A1,MIN(IFERROR(SEARCH(CHAR(ROW($65:$90)),A1,MIN({8,"","",21,"","",7,22,9,10})+1),"")),LEN(A1))


This bit in bold:

=MID(A1,MIN(IFERROR(SEARCH(CHAR(ROW($65:$90)),A1, MIN({8,"","",21,"","",7,22,9,10})+1),"")),LEN(A1))

…evaluates to this:

=MID(A1,MIN(IFERROR(SEARCH(CHAR(ROW($65:$90)),A1, 8),"")),LEN(A1))


This bit in bold:

=MID(A1,MIN(IFERROR(SEARCH(CHAR(ROW($65:$90) ),A1,8),"")),LEN(A1))

…evaluates to this:

=MID(A1,MIN(IFERROR(SEARCH(CHAR({65;66;67;68;69;70;71;72;73;74;75;76;77;78;79;80;81;82;83;84;85;86;87;88;89;90}),A1,8),"")),LEN(A1))


This bit in bold:

=MID(A1,MIN(IFERROR(SEARCH(CHAR({65;66;67;68;69;70;71;72;73;74;75;76;77;78;79;80;81;82;83;84;85;86;87;88;89;90}),A1,8),"")),LEN(A1))

…evaluates to this:

=MID(A1,MIN(IFERROR(SEARCH({"A";"B";"C";"D";"E";"F";"G";"H";"I";"J";"K";"L";"M";"N";"O";"P";"Q";"R";"S";"T";"U";"V";"W";"X";"Y";"Z"}
,A1,8),"")),LEN(A1))


This bit in bold:

=MID(A1,MIN(IFERROR(SEARCH({"A";"B";"C";"D";"E";"F";"G";"H";"I";"J";"K";"L";"M";"N";"O";"P";"Q";"R";"S";"T";"U";"V";"W";"X";"Y";"Z"}
,A1,8),"")),LEN(A1))

…evaluates to this:

=MID(A1,MIN(IFERROR({11;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;18;#VALUE!;#VALUE!;17;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;15;13;14;12;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!}
,"")),LEN(A1))


This bit in bold:

=MID(A1,MIN(
IFERROR({11;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;18;#VALUE!;#VALUE!;17;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;15;13;14;12;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!},"")
),LEN(A1))

…evaluates to this:

=MID(A1,MIN({11;"";"";"";"";"";"";"";18;"";"";17;"";"";"";"";"";15;13;14;12;"";"";"";"";""}
),LEN(A1))


This bit in bold:

=MID(A1, MIN({11;"";"";"";"";"";"";"";18;"";"";17;"";"";"";"";"";15;13;14;12;"";"";"";"";""})
,LEN(A1))

…evaluates to this:


=MID(A1, 11
,LEN(A1))
 

Shaun

Member
Hi All


I was working on a method to find the position of the first capital letter starting from the right in the string rather than the end of the first set of numbers. This is as far as I got:


Code:
=MID(A35,MIN(FIND(CHAR(ROW(INDIRECT("65:90"))),A35&"ABCDEFGHIJKLMNOPQRSTUVWXYZ")),1)

At this stage all it does is find the first capital letter from the left. I must confess that I cannot take any credit for any part of this formula...Mr Google helped there. Actually I was just working out how it worked before I tried to manipulate it or possibly refine it.


Cheers
 

jeffreyweir

Active Member
I was working on a method to find the position of the first capital letter starting from the right ...that's a good idea for an approach.
 

Shaun

Member
Hi Jeff


Thanks, I'll keep plugging away at it. I enjoy these sorts of challenges, I get a lot out of it.


I have just glanced through your explanation above and that explains, for the most part, the formula I am using. Thanks


Cheers
 

jeffreyweir

Active Member
Not redundant at all, Shaun. I had a crack just now using your approach of finding the first position of the first capital letter starting from the right.


Here's how I did it (note that these are all array formulas)


This bit reverses the string in A16:

=MID(A16,LEN(A16)+1-ROW(A$1:INDEX(A:A,LEN(A16))),1)


And we can wrap that in a CODE funciton to return the character number:

=CODE(MID(A16,LEN(A16)+1-ROW(A$1:INDEX(A:A,LEN(A16))),1))


And then we can check whether those numbers fall within the range 65 to 90 by using this:

=CODE(MID(A16,LEN(A16)+1-ROW(A$1:INDEX(A:A,LEN(A16))),1))=COLUMN($BN:$CM)

...although the problem is that this creates a 2d array.


But if we multilpy that 2d array by integers from 1 to the length of the original string, and then find the smallest number (excluding zeros), we know where abouts the first capital falls within our reversed string:

=SMALL((CODE(MID(A16,LEN(A16)+1-ROW(A$1:INDEX(A:A,LEN(A16))),1))=COLUMN($BM:$CL))*(ROW(A$1:INDEX(A:A,LEN(A16)))),LEN(A16)*26-1)


And we can then use that info to split our original string up properly with this:

=MID(A16,1+LEN(A16)-SMALL((CODE(MID(A16,LEN(A16)+1-ROW(A$1:INDEX(A:A,LEN(A16))),1))=COLUMN($BM:$CL))*(ROW(A$1:INDEX(A:A,LEN(A16)))),LEN(A16)*26-1),LEN(A16))


But that is very long. Mabye Sajan can find another way with MMULT to handle the 2d array or avoid it altogether.
 

jeffreyweir

Active Member
And here was my original approach:

=MID(A16,1+MATCH(1,(CODE(MID(A16,ROW(A$1:INDEX(A:A,LEN(A16))),1))<58)*(CODE(MID(A16&"a",ROW(A$2:INDEX(A:A,LEN(A16)+1)),1))>57),0),LEN(A16))
 

Shaun

Member
Oh Wow, so it was possible, just not <139 characters


I don't think I would have ever arrived at that. Especially the idea of reversing the string, let alone working out how to do it.


Cheers
 

jeffreyweir

Active Member
That's why these formula challenges are so good...because there's sooo many possible ways to do things in Excel, everybody learns something from everybody.


Now my trick is to incorporate some of the crazy stuff that Sajan uses into my approach if I can, and see if I can shave off even more characters.
 
Top