Formula Challenge 001 – Return everything from a string after the first block of numbers (Part 1.)
The Formula Challenge Series: Where good formulas go GREAT:
Formula Challenge 001 – Return everything from a string after first block of numbers
Introduction to the Formula Challenge series:
Formulas are like the DNA of Excel…you can solve some very complicated evolutionary challenges by stringing together a few simple basepairs in the right order.
Over at http://chandoo.org/forums/forum/excelchallenges you’ll find a whole bunch of tricky formula challenges posted by a group of excel nerds (me included) trying to outnerd each other with formula nerdiness.
Consequently, these formula challenges are designed to get participants thinking creatively about how to solve a tricky problem using formulas only.
(Oh, and they might be designed to get a whole bunch of nerds to do my tricky work assignments for me too.)
[Psst….so as to not hurt any feelings, I’m obliged to point out under the UN’s declaration of SuperHuman Rights that one person who posted a response at this challenge is not an Excel Nerd, thus allowing everybody who posted to assume it was them.]
[Psst (again)…of course, in actual fact that lone nonnerd was in fact moi – but don’t tell them that…it will spoil their delusions of normality ]
Putting together a formula to tackle these challenges shows you have a great mastery of excel. Putting together a formula that tackles the challenge and is also shorter than anyone else’s is something more akin to standing on the gold podium at the Olympics and shouting “In yer face, LOSERS” down to the silver and bronze teams while you strut your best Jagger strut to the Windows 95 launch song. http://www.youtube.com/watch?v=5VPFKnBYOSI
(Yes, it’s actually that satisfying to a true Excel nerd. Or so I’m told.)
Anyway….I’ve learned a lot from looking at how different people tackle these, and my formula toolkit is much bigger as a result.
To help you increase your toolkit, today we’re going to take a look at the first challenge in this series:
Formula Challenge 001 – Return everything from a string after the first block of numbers (Part 1).
Without further ado, here’s the details of the first challenge:
 Split the strings in Column A with a formula that only returns the substring that you see in column B. So for instance, if the string is Monaco7190Australia1484 then we want everything to the right of that first block of numbers i.e. Australia1484.
 Your formula should handle any length of string, and any length of numbers – not just the lengths shown below
 You should not use any helper cells, intermediate formulas or named ranges, or VBA
Those strings look something like this:
String:  Required Substring: 
Monaco7190Australia1484  Australia1484 
Liechtenstein42Austria128590  Austria128590 
Malta6Belarus757812  Belarus757812 
You can download the challenge and full dataset here: Formula Challenge 1.1
The bit that’s required to be returned is the Blue Bold bit i.e. the block of text and numbers after the first block of text and numbers, which in the case of Monaco7190Australia1484 is Australia1484.
What’s tricky about this challenge is that we have a block of text, followed by a block of numbers, followed by the block of text/number that we actually want to extract. So it’s hard to pinpoint the location of that 2nd block of text/numbers given there’s a preceding block of text/numbers.
Despite having a head start of several weeks on this challenge (I first conceived of the approach when answering a post a weeks before posting the problem as a challenge) and despite (or perhaps because of) taking copies amounts of performanceenhancing drugs, I didn’t even make it onto that Olympic podium. Instead, I arrived huffing and puffing at the finish line with this unwieldy beast:
=MID(A1,1+MATCH(1,(CODE(MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),1))<58)*(CODE(MID(A1&”a”,ROW(A$2:INDEX(A:A,LEN(A1)+1)),1))>57),0),LEN(A1))
The approach I took was to try to find the point at which we have a number immediately followed by a letter.
For instance, the zero and letter A from Monaco7190Australia1484. To do this:
1. I dynamically break the string apart into onedigit bites with this:
= (MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),1)
Excel returns: ={“M”;”o”;”n”;”a”;”c”;”o”;”7″;”1″;”9″;”0″;”A”;”u”;”s”;”t”;”r”;”a”;”l”;”i”;”a”;”1″;”4″;”8″;”4″}
2. Wrap a code function around that which converts these to the specific number that Excel uses to represent each number/letter:
=CODE({“M”;”o”;”n”;”a”;”c”;”o”;”7″;”1″;”9″;”0″;”A”;”u”;”s”;”t”;”r”;”a”;”l”;”i”;”a”;”1″;”4″;”8″;”4″})
Excel returns: ={77;111;110;97;99;111;55;49;57;48;65;117;115;116;114;97;108;105;97;49;52;56;52}
3. Work out which of those codes represents numbers.
ie: All numbers fall before character code 58
={77;111;110;97;99;111;55;49;57;48;65;117;115;116;114;97;108;105;97;49;52;56;52}<58
Excel returns: ={FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE}
4. Repeat steps 1 to 3 but this time I check for the occurrence of letters.
Furthermore I make a change so that the resulting array is offset by one position, by changing the ROW(A$1… bit in 1 to ROW(A$2… :
=CODE(MID(A1&”a”,ROW(A$2:INDEX(A:A,LEN(A1)+1)),1))>57
Excel returns: ={TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE}
5. Multiply these two arrays together
This converts these TRUE and FALSE numbers to Ones and Zeros.
Furthermore, because that 2nd array is offset by one, this gives me the position where a letter and a number coincide. (Which – given that 2nd array has been offset – is equivalent to identifying the occurrence of where a number is immediately followed by a letter:
={FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE} * {TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE}
Excel returns: ={0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;0;0;0;0;1}
6. All I need to do then is to identify where that first 1 falls in this array:
=MATCH(1,{0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;0;0;0;0;1},0)
Excel returns: =10
7. …and then return all of the original string to the right of that position:
=MID(A1,1+10,LEN(A1))
Excel returns: =Australia1484
Summary
Neat, eh?
Well, I thought so.
In fact, I thought that would be a surefire winner of the challenge.
But that didn’t even get me the Bronze. Rather, that honour belongs to Sajan, who scooped up the consolation prize with this little beauty:
=MID(A1,MODE(MMULT((N(ISNUMBER(MID(A1,ROW(INDIRECT(“1:”&LEN(A1))),1)))={1,0})*(ROW(INDIRECT(“1:”&LEN(A1))){0,1}),{1;1}))+1,LEN(A1))
…which requires you to preemptively take a whole bottle aspirin AND get a good nights’ sleep before trying to wrap your head around it.
So go take that aspirin, take your pens out of your pocket, and hit the pillow, and I’ll see you back here this time tomorrow for some Einsteinium formula relativity as we tackle this gem of an approach in Part 2.
About the Author.
Jeff Weir – a local of Galactic North up there in Windy Wellington, New Zealand – is more volatile than INDIRECT and more random than RAND. In fact, his state of mind can be pretty much summed up by this:
=NOT(EVEN(PROPER(OR(RIGHT(TODAY())))))
That’s right, pure #VALUE!
Find out more at http://www.heavydutydecisions.co.nz
Comments from Hui:
Jeff has been a valuable contributor at Excel Hero Academy during the past two years and recently started posting at the Chandoo.org Forums
This is Jeff’s First post at Chandoo.org and I am sure you will welcome his verbose narrative to these interesting Formula Challenges
Please leave comments and feedback and alternative solutions in the comments below.
 
 

Leave a Reply
Never use simple numbers in your dashboards (bonus tip: how to fix default conditional formatting)  Formula Challenge 001 – Return everything from a string after the first block of numbers (Part 2.) 
40 Responses to “Formula Challenge 001 – Return everything from a string after the first block of numbers (Part 1.)”
Thanks Hui. This was actually my 2nd post. First is at http://chandoo.org/wp/2009/07/24/medicarechartcritique/
Looking forward to doing more.
Wow that was the day before my first visit and the day the forums started.
I hope the third doesn’t take as long as the second did.
Hi Jeff,
Nice post – I looked at the problem first and then looked through your solution, and I’m afraid to say mine was almost the same as yours, apart from using ISNUMBER instead of CODE as per formula below:
=MID(A1,MATCH(1,ISNUMBER(1*MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),1))=TRUE)*ISNUMBER(1*MID(A1,ROW(A$1:INDEX(A:A,LEN(A1)))+1,1))=FALSE),0)+1,LEN(A1))
which, whilst a little longer in length than the code version, can be read more easily than the ‘Code’ option.
Look forward to more posts. And thanks for highlighting the Challenges forum too – whilst being a longtime reader of Chandoo’s blog, I seem to have somehow overlooked this – but know where I’m going to be spending a lot of time now!!
Good skills, Chris. I’m the same as you…long time reader of this blog but somehow have not managed to spend much time on the Forum. But I tell ya what…I’ve learned as much from the Forum in the last month as I’ve learned in the last year on my own.
There’s three more posts to come on this challenge alone, so stay tuned.
Regards
Jeff
Excellent write up Jeff…. Welcome back to the guest author spot. I am eager to read remaining parts of this and other challenges. I found the questions in forum really interesting and quite challenging.
Thanks Chandoo. It’s a bit overdue…I promised you a few guest posts back when your twins were born!
Hai Chandoo,I have an error in my formula during the sixth steps doing..
I have a formula
=CODE(MID(A1,ROW(INDIRECT(“1:”&LEN(A1))),1))>57
*CODE(MID(A1,ROW(INDIRECT(“1:”&LEN(A1))),1))<58
but i got All are True ..
One more option.
=MID(A4,MATCH(1,(CODE(MID(A4,ROW($1:$255),1))>57)*(ROW($1:$255)>MIN(FIND(ROW($1:$10)1,A4&57321^2))),0),255)
Regards
Hi Elias. Unfortuntanely it fails on the Your formula should handle any length of string, and any length of numbers – not just the lengths shown above/below count.
Hi Jeff, the formula can handles any length of numbers and 255 characters string, but if we change the 255 for 32,767 it will handle the max number of characters that a cell can contain.
=MID(A4,MATCH(1,(CODE(MID(A4,ROW($1:$32767),1))>57)*(ROW($1:$32767)>MIN(FIND(ROW($1:$10)1,A4&57321^2))),0),32767)
Regards
Good stuff, Elias. Your formula can be made even shorter:
=MID(A1,MATCH(1,(CODE(MID(A1,ROW(A:A),1))>57)*(ROW(A:A)>MIN(FIND(ROW($1:$10)1,A1&57321^2))),0),32767)
…although those wholecolumn references make it really really resource intensive!
And shorter again with this:
=MID(A1,MATCH(1,(CODE(MID(A1,ROW(A:A),1))>57)*(ROW(A:A)>MIN(FIND(ROW(A:A)1,A1&57321^2))),0),8^5)
…and could go shorter still if you can find a shorter way to write a similar digit to 57321^2 that includes all digits from 0 to 9.
That is a masterstroke, by the way…what a great way to ensure you will always get a match for each number. How did you come up with that concept? It’s not one I’ve seen before.
Ahh, I see they are called pandigital numbers. Freakin’ awesome.
Here’s a shorter way to write a pandigital number than 57321^2
7^18
…which means we can tighten that puppy up even more:
=MID(A1,MATCH(1,(CODE(MID(A1,ROW(A:A),1))>57)*(ROW(A:A)>MIN(FIND(ROW($1:$10)1,A1&7^18))),0),9^9)
Hi Jeff, I’m glad you liked the formula. I used to use A1&1234567890 to avoid the error when finding/searching for numbers from 0 to 9, but I learned the pandigital way from Roberto Mensa in the ExcelHero LinkedIn group.
Regards
Realized both blocks of numbers are fourdigit. Hence came up with this:
=MID(A6,MATCH(TRUE,ISNUMBER(–MID(A6,ROW(INDIRECT(“1:”&LEN(A6))),1)),0)+4,LEN(A6)MATCH(TRUE,ISNUMBER(–MID(A6,ROW(INDIRECT(“1:”&LEN(A6))),1)),0)+3)
@Sammy
Both blocks don’t have to be 4 characters long
It’s one of the criteria
[…] you’ll keenly recall from yesterday’s gripping episode of “When good formulas go GREAT”, Sajan claimed Bronze in our inaugural formula challenge at Formula Challenge 001 – Return […]
Thanks for the interesting challenge, Jeff – coffee was a little more fun this morning!
Well, I came up with a solution very similar to Chris’s. In a sense all solutions will be directionally similar, I believe, in that we must test for position n where n is the first ocurrence of *n is digit* and *n+1 is text* – this (I think inevitably) requires multiplying two boolean arrays
=MID(A1;MATCH(1;ISNUMBER(MID(A1;ROW(A1:INDEX(A:A;LEN(A1)));1)+0)*NOT(ISNUMBER(MID(A1;ROW(A1:INDEX(A:A;LEN(A1)))+1;1)+0));0)+1;LEN(A1)) (arrayentered)
– Juanito
For this sort of task, I run my data through Perl so I can use its powerful oneliners. Sometimes as I am slicing and dicing my data, I’ll make multiple round trips between Excel and Perl.
Hi All,
This is my attempt……..It needs entering via ctrl + shift + enter and change A1 for whatever cell the data is in
=RIGHT(A1,LEN(A1)MIN(IFERROR(SEARCH({“A”,”B”,”C”,”D”,”E”,”F”,”G”,”H”,”I”,”J”,”K”,”L”,”M”,”N”,”O”,”P”,”Q”,”R”,”S”,”T”,”U”,”V”,”W”,”X”,”Y”,”Z”},A1,MIN(IFERROR((SEARCH({1,2,3,4,5,6,7,8,9},A1,1)),””)))1,””)))
This is my try. It assumes the second text block will always start with a capital letter:
{=RIGHT(A1,1+LEN(A1)SMALL(FIND(CHAR(ROW(INDIRECT(“65:90″))),A1&”ABCDEFGHIJKLMNOPQRSTUVWXYZ”,2),1))}
Sartori – the problem with this is that it fails when the first country has two capital letters e.g. NewZealand.
To be perfectly honest I am way out of my depth here. But I follow the blog and learn. I read the forum responses (even helping if I can) and learn. I ask questions and get great responses and learn. I am reading this article and although like I said I am out of my depth I still continue to learn.
pandigital numbers……….. cmon guys give me a chance here that’s something new to start reading about. My head hurts but looking forward to your next episode Jeff
Given the complexity of explaining some of these approaches, my next episode is likely to be a psychotic one
Jeff,
Can you let me know if mine worked ok or if I missed a trick? I was unable to test it on the actual file so had to simulate date, do was just curious.
Thanks
Chris
Works fine, Chris. Good skills. In fact, that’s pretty much the approach that I cover in the next post at http://chandoo.org/wp/2013/07/19/formulachallenge001returneverythingfromastringafterthefirstblockofnumberspart3/ albeit with a shorter version of the formula.
Cheers Jeff, thanks for letting me know.
I look forward to reading the post.
[…] Jeff: Formula Challenge 001 – Part 1 […]
I already posted this but look like it was unseen. This one can handle any character after the numbers and not only letters AZ.
=MID(A4,MATCH(TRUE,(MID(A4,ROW(A:A),1)*ISERR(MID(A4,ROW(A:A)+1,1)))>0,)+1,9^9)
Regards
A5 contents: Liechtenstein4352Austria8590
Formula:
=SUBSTITUTE(TRIM(CLEAN(MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A5,1,8),2,8),3,8),4,8),5,8),6,8),7,8),8,8),9,8),0,8),FIND(8888,SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A5,1,8),2,8),3,8),4,8),5,8),6,8),7,8),8,8),9,8),0,8),1)+4,88))),8888,””)&TRIM(CLEAN(RIGHT(A5,4)))
Formula Output: Austria8590
I can reduce it to 76 characters length. It also does not get mess if users insert/delete a row/column.
=MID(A4,MATCH(TRUE,MID(A4,ROW(A:A),1)*ISERR(MID(A4,ROW(A:A)+1,1))>0,)+1,9^9)
Regards
Hi All,
I’ve developed my own functions to do this by using regular expression. My formula looks as:
=RegExp_Extract(A1,”([09]+)([AZ].*$)”,”$2″)
It would lead much to far to explain the use of RE’s (regular expressions) but is common use.
Just as an example: =RegExp_Extract(A1;”([09]+)([AZ]+)([09]+$)”;”$2″)
extracts “Australia”. The functions have even the option to use case sensitive comparisons.
Wkr,
JP Ronse
Nice post Jeff! Looking forward to reading more from you
[…] Jeff: Formula Challenge 001 – Part 1 […]
Hello everyone. I am new to the forum and wanted to see if I could at least think my way through the challenge to a solution I would hope could work for me. I would like some feedback on my solution. Where should I post my solution to the substring challenge?
[…] Return everything from a string after the first block of numbers […]
Dear Chandoo, I am new to this post and I really like the responses every gave. I know I’m not that smart about excel but I think I can get around this. Please forgive me for rambling along. I’m retired and disabled and I thought I knew something about Excel but I don’t. I look froward to your help in the future too.
[…] http://chandoo.org/wp/2013/07/16/formulachallenge0011/ […]
{=MID(A4,MAX(IFERROR(FIND(ROW(A$1:A$10)1&(CODE(COL($A1:$Z1)+96)),A4),),1)+1,LEN(A4))}
Since the string is always broken with a number followed by a letter, that’s what I seached for.
The ROW(A$1:A$10)1&(CODE(COL($A1:$Z1)+96)) part will create a matrix with all (260!) combinations from “0a” to “9z”, then you just have to look for it in the initial string and return the second part.