Formula Challenge 001 – Return everything from a string after the first block of numbers (Part 1.)

Posted on July 16th, 2013 in Formula Challenges , Posts by Jeff - 40 comments

The Formula Challenge Series: Where good formulas go GREAT:

Formula Challenge 001 – Return everything from a string after first block of numbers

Introduction to the Formula Challenge series:

Formulas are like the DNA of Excel…you can solve some very complicated evolutionary challenges by stringing together a few simple base-pairs in the right order.

Over at http://chandoo.org/forums/forum/excel-challenges you’ll find a whole bunch of tricky formula challenges posted by a group of excel nerds (me included) trying to out-nerd each other with formula nerdiness.

Consequently, these formula challenges are designed to get participants thinking creatively about how to solve a tricky problem using formulas only.

(Oh, and they might be designed to get a whole bunch of nerds to do my tricky work assignments for me too.)

[Psst….so as to not hurt any feelings, I’m obliged to point out under the UN’s declaration of Super-Human Rights that one person who posted a response at this challenge is not an Excel Nerd, thus allowing everybody who posted to assume it was them.]

[Psst (again)…of course, in actual fact that lone non-nerd was in fact moi - but don’t tell them that…it will spoil their delusions of normality ;-) ]

Putting together a formula to tackle these challenges shows you have a great mastery of excel. Putting together a formula that tackles the challenge and is also shorter than anyone else’s is something more akin to standing on the gold podium at the Olympics and shouting “In yer face, LOSERS” down to the silver and bronze teams while you strut your best Jagger strut to the Windows 95 launch song. http://www.youtube.com/watch?v=5VPFKnBYOSI

(Yes, it’s actually that satisfying to a true Excel nerd. Or so I’m told.)

Anyway….I’ve learned a lot from looking at how different people tackle these, and my formula toolkit is much bigger as a result.

To help you increase your toolkit, today we’re going to take a look at the first challenge in this series:

 

Formula Challenge 001 – Return everything from a string after the first block of numbers (Part 1).

Without further ado, here’s the details of the first challenge:

  • Split the strings in Column A with a formula that only returns the substring that you see in column B. So for instance, if the string is Monaco7190Australia1484 then we want everything to the right of that first block of numbers i.e. Australia1484.
  • Your formula should handle any length of string, and any length of numbers – not just the lengths shown below
  • You should not use any helper cells, intermediate formulas or named ranges, or VBA

Those strings look something like this:

String: Required Substring:
Monaco7190Australia1484 Australia1484
Liechtenstein42Austria128590 Austria128590
Malta6Belarus757812 Belarus757812

 

You can download the challenge and full dataset here: Formula Challenge 1.1

The bit that’s required to be returned is the Blue Bold bit i.e. the block of text and numbers after the first block of text and numbers, which in the case of Monaco7190Australia1484 is Australia1484.

What’s tricky about this challenge is that we have a block of text, followed by a block of numbers, followed by the block of text/number that we actually want to extract. So it’s hard to pinpoint the location of that 2nd block of text/numbers given there’s a preceding block of text/numbers.

Despite having a head start of several weeks on this challenge (I first conceived of the approach when answering a post a weeks before posting the problem as a challenge) and despite (or perhaps because of) taking copies amounts of performance-enhancing drugs, I didn’t even make it onto that Olympic podium. Instead, I arrived huffing and puffing at the finish line with this unwieldy beast:

=MID(A1,1+MATCH(1,(CODE(MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),1))<58)*(CODE(MID(A1&”a”,ROW(A$2:INDEX(A:A,LEN(A1)+1)),1))>57),0),LEN(A1))

The approach I took was to try to find the point at which we have a number immediately followed by a letter.

For instance, the zero and letter A from Monaco7190Australia1484. To do this:

1. I dynamically break the string apart into one-digit bites with this:

= (MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),1)

Excel returns: ={“M”;”o”;”n”;”a”;”c”;”o”;”7″;”1″;”9″;”0″;”A”;”u”;”s”;”t”;”r”;”a”;”l”;”i”;”a”;”1″;”4″;”8″;”4″}

 

2. Wrap a code function around that which converts these to the specific number that Excel uses to represent each number/letter:

=CODE({“M”;”o”;”n”;”a”;”c”;”o”;”7″;”1″;”9″;”0″;”A”;”u”;”s”;”t”;”r”;”a”;”l”;”i”;”a”;”1″;”4″;”8″;”4″})

Excel returns: ={77;111;110;97;99;111;55;49;57;48;65;117;115;116;114;97;108;105;97;49;52;56;52}

 

3. Work out which of those codes represents numbers.

ie: All numbers fall before character code 58

={77;111;110;97;99;111;55;49;57;48;65;117;115;116;114;97;108;105;97;49;52;56;52}<58

Excel returns: ={FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE}

 

4. Repeat steps 1 to 3 but this time I check for the occurrence of letters.

Furthermore I make a change so that the resulting array is offset by one position, by changing the ROW(A$1… bit in 1 to ROW(A$2… :

=CODE(MID(A1&”a”,ROW(A$2:INDEX(A:A,LEN(A1)+1)),1))>57

Excel returns: ={TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE}

 

5. Multiply these two arrays together

This converts these TRUE and FALSE numbers to Ones and Zeros.

Furthermore, because that 2nd array is offset by one, this gives me the position where a letter and a number coincide. (Which – given that 2nd array has been offset – is equivalent to identifying the occurrence of where a number is immediately followed by a letter:

={FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE} * {TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE}

Excel returns: ={0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;0;0;0;0;1}

 

6. All I need to do then is to identify where that first 1 falls in this array:

=MATCH(1,{0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;0;0;0;0;1},0)

Excel returns: =10

 

7. …and then return all of the original string to the right of that position:

=MID(A1,1+10,LEN(A1))

Excel returns: =Australia1484

Summary

Neat, eh?

Well, I thought so.

In fact, I thought that would be a sure-fire winner of the challenge.

But that didn’t even get me the Bronze. Rather, that honour belongs to Sajan, who scooped up the consolation prize with this little beauty:

=MID(A1,MODE(MMULT((N(ISNUMBER(-MID(A1,ROW(INDIRECT(“1:”&LEN(A1))),1)))={1,0})*(ROW(INDIRECT(“1:”&LEN(A1)))-{0,1}),{1;1}))+1,LEN(A1))

…which requires you to pre-emptively take a whole bottle aspirin AND get a good nights’ sleep before trying to wrap your head around it.

So go take that aspirin, take your pens out of your pocket, and hit the pillow, and I’ll see you back here this time tomorrow for some Einsteinium formula relativity as we tackle this gem of an approach in Part 2.

 

About the Author.

Jeff Weir – a local of Galactic North up there in Windy Wellington, New Zealand – is more volatile than INDIRECT and more random than RAND. In fact, his state of mind can be pretty much summed up by this:

=NOT(EVEN(PROPER(OR(RIGHT(TODAY())))))

That’s right, pure #VALUE!

Find out more at http://www.heavydutydecisions.co.nz

 

Comments from Hui:

Jeff has been a valuable contributor at Excel Hero Academy during the past two years and recently started posting at the Chandoo.org Forums

This is Jeff’s First post at Chandoo.org and I am sure you will welcome his verbose narrative to these interesting Formula Challenges

Please leave comments and feedback and alternative solutions in the comments below.

Your email address is safe with us. Our policies

Written by Hui...
Tags: , , , ,
Home: Chandoo.org Main Page
? Doubt: Ask an Excel Question

40 Responses to “Formula Challenge 001 – Return everything from a string after the first block of numbers (Part 1.)”

  1. jeffreyweir says:

    Thanks Hui. This was actually my 2nd post. First is at http://chandoo.org/wp/2009/07/24/medicare-chart-critique/

    Looking forward to doing more.

  2. Chris Byham says:

    Hi Jeff,

    Nice post – I looked at the problem first and then looked through your solution, and I’m afraid to say mine was almost the same as yours, apart from using ISNUMBER instead of CODE as per formula below:

    =MID(A1,MATCH(1,ISNUMBER(1*MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),1))=TRUE)*ISNUMBER(1*MID(A1,ROW(A$1:INDEX(A:A,LEN(A1)))+1,1))=FALSE),0)+1,LEN(A1))

    which, whilst a little longer in length than the code version, can be read more easily than the ‘Code’ option.

    Look forward to more posts. And thanks for highlighting the Challenges forum too – whilst being a long-time reader of Chandoo’s blog, I seem to have somehow overlooked this – but know where I’m going to be spending a lot of time now!!

  3. jeffreyweir says:

    Good skills, Chris. I’m the same as you…long time reader of this blog but somehow have not managed to spend much time on the Forum. But I tell ya what…I’ve learned as much from the Forum in the last month as I’ve learned in the last year on my own.

    There’s three more posts to come on this challenge alone, so stay tuned.

    Regards

    Jeff

  4. Chandoo says:

    Excellent write up Jeff…. Welcome back to the guest author spot. I am eager to read remaining parts of this and other challenges. I found the questions in forum really interesting and quite challenging.

    • jeffreyweir says:

      Thanks Chandoo. It’s a bit overdue…I promised you a few guest posts back when your twins were born!

    • Kutty says:

      Hai Chandoo,I have an error in my formula during the sixth steps doing..
      I have a formula
      =CODE(MID(A1,ROW(INDIRECT(“1:”&LEN(A1))),1))>57
      *CODE(MID(A1,ROW(INDIRECT(“1:”&LEN(A1))),1))<58

      but i got All are True ..

  5. Elias says:

    One more option.

    =MID(A4,MATCH(1,(CODE(MID(A4,ROW($1:$255),1))>57)*(ROW($1:$255)>MIN(FIND(ROW($1:$10)-1,A4&57321^2))),0),255)

    Regards

    • jeffreyweir says:

      Hi Elias. Unfortuntanely it fails on the Your formula should handle any length of string, and any length of numbers – not just the lengths shown above/below count.

      • Elias says:

        Hi Jeff, the formula can handles any length of numbers and 255 characters string, but if we change the 255 for 32,767 it will handle the max number of characters that a cell can contain.

        =MID(A4,MATCH(1,(CODE(MID(A4,ROW($1:$32767),1))>57)*(ROW($1:$32767)>MIN(FIND(ROW($1:$10)-1,A4&57321^2))),0),32767)

        Regards

        • jeffreyweir says:

          Good stuff, Elias. Your formula can be made even shorter:
          =MID(A1,MATCH(1,(CODE(MID(A1,ROW(A:A),1))>57)*(ROW(A:A)>MIN(FIND(ROW($1:$10)-1,A1&57321^2))),0),32767)

          …although those whole-column references make it really really resource intensive!

          • jeffreyweir says:

            And shorter again with this:
            =MID(A1,MATCH(1,(CODE(MID(A1,ROW(A:A),1))>57)*(ROW(A:A)>MIN(FIND(ROW(A:A)-1,A1&57321^2))),0),8^5)

            …and could go shorter still if you can find a shorter way to write a similar digit to 57321^2 that includes all digits from 0 to 9.

            That is a masterstroke, by the way…what a great way to ensure you will always get a match for each number. How did you come up with that concept? It’s not one I’ve seen before.

          • jeffreyweir says:

            Ahh, I see they are called pandigital numbers. Freakin’ awesome.

          • jeffreyweir says:

            Here’s a shorter way to write a pandigital number than 57321^2
            7^18

            …which means we can tighten that puppy up even more:
            =MID(A1,MATCH(1,(CODE(MID(A1,ROW(A:A),1))>57)*(ROW(A:A)>MIN(FIND(ROW($1:$10)-1,A1&7^18))),0),9^9)

          • Elias says:

            Hi Jeff, I’m glad you liked the formula. I used to use A1&1234567890 to avoid the error when finding/searching for numbers from 0 to 9, but I learned the pandigital way from Roberto Mensa in the ExcelHero LinkedIn group.

            Regards

  6. Sammy.Adade says:

    Realized both blocks of numbers are four-digit. Hence came up with this:
    =MID(A6,MATCH(TRUE,ISNUMBER(–MID(A6,ROW(INDIRECT(“1:”&LEN(A6))),1)),0)+4,LEN(A6)-MATCH(TRUE,ISNUMBER(–MID(A6,ROW(INDIRECT(“1:”&LEN(A6))),1)),0)+3)

  7. [...] you’ll keenly recall from yesterday’s gripping episode of “When good formulas go GREAT”, Sajan claimed Bronze in our inaugural formula challenge at Formula Challenge 001 – Return [...]

  8. juanito says:

    Thanks for the interesting challenge, Jeff – coffee was a little more fun this morning!
    Well, I came up with a solution very similar to Chris’s. In a sense all solutions will be directionally similar, I believe, in that we must test for position n where n is the first ocurrence of *n is digit* and *n+1 is text* – this (I think inevitably) requires multiplying two boolean arrays

    =MID(A1;MATCH(1;ISNUMBER(MID(A1;ROW(A1:INDEX(A:A;LEN(A1)));1)+0)*NOT(ISNUMBER(MID(A1;ROW(A1:INDEX(A:A;LEN(A1)))+1;1)+0));0)+1;LEN(A1)) (array-entered)

    - Juanito

  9. Matt Healy says:

    For this sort of task, I run my data through Perl so I can use its powerful one-liners. Sometimes as I am slicing and dicing my data, I’ll make multiple round trips between Excel and Perl.

  10. Chris Rowley says:

    Hi All,

    This is my attempt……..It needs entering via ctrl + shift + enter and change A1 for whatever cell the data is in

    =RIGHT(A1,LEN(A1)-MIN(IFERROR(SEARCH({“A”,”B”,”C”,”D”,”E”,”F”,”G”,”H”,”I”,”J”,”K”,”L”,”M”,”N”,”O”,”P”,”Q”,”R”,”S”,”T”,”U”,”V”,”W”,”X”,”Y”,”Z”},A1,MIN(IFERROR((SEARCH({1,2,3,4,5,6,7,8,9},A1,1)),””)))-1,””)))

  11. Sartori says:

    This is my try. It assumes the second text block will always start with a capital letter:

    {=RIGHT(A1,1+LEN(A1)-SMALL(FIND(CHAR(ROW(INDIRECT(“65:90″))),A1&”ABCDEFGHIJKLMNOPQRSTUVWXYZ”,2),1))}

  12. ianamck says:

    To be perfectly honest I am way out of my depth here. But I follow the blog and learn. I read the forum responses (even helping if I can) and learn. I ask questions and get great responses and learn. I am reading this article and although like I said I am out of my depth I still continue to learn.

    pandigital numbers……….. cmon guys give me a chance here that’s something new to start reading about. My head hurts but looking forward to your next episode Jeff

  13. Chris R says:

    Jeff,

    Can you let me know if mine worked ok or if I missed a trick? I was unable to test it on the actual file so had to simulate date, do was just curious.

    Thanks
    Chris

  14. Elias says:

    I already posted this but look like it was unseen. This one can handle any character after the numbers and not only letters A-Z.

    =MID(A4,MATCH(TRUE,(MID(A4,ROW(A:A),1)*ISERR(-MID(A4,ROW(A:A)+1,1)))>0,)+1,9^9)

    Regards

  15. WWCII says:

    A5 contents: Liechtenstein4352Austria8590

    Formula:
    =SUBSTITUTE(TRIM(CLEAN(MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A5,1,8),2,8),3,8),4,8),5,8),6,8),7,8),8,8),9,8),0,8),FIND(8888,SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A5,1,8),2,8),3,8),4,8),5,8),6,8),7,8),8,8),9,8),0,8),1)+4,88))),8888,””)&TRIM(CLEAN(RIGHT(A5,4)))

    Formula Output: Austria8590

  16. Elias says:

    I can reduce it to 76 characters length. It also does not get mess if users insert/delete a row/column.

    =MID(A4,MATCH(TRUE,MID(A4,ROW(A:A),1)*ISERR(-MID(A4,ROW(A:A)+1,1))>0,)+1,9^9)

    Regards

  17. JP Ronse says:

    Hi All,

    I’ve developed my own functions to do this by using regular expression. My formula looks as:

    =RegExp_Extract(A1,”([0-9]+)([A-Z].*$)”,”$2″)

    It would lead much to far to explain the use of RE’s (regular expressions) but is common use.

    Just as an example: =RegExp_Extract(A1;”([0-9]+)([A-Z]+)([0-9]+$)”;”$2″)

    extracts “Australia”. The functions have even the option to use case sensitive comparisons.

    Wkr,

    JP Ronse

  18. GJ says:

    Nice post Jeff! Looking forward to reading more from you :)

  19. Hello everyone. I am new to the forum and wanted to see if I could at least think my way through the challenge to a solution I would hope could work for me. I would like some feedback on my solution. Where should I post my solution to the substring challenge?

  20. Carl says:

    Dear Chandoo, I am new to this post and I really like the responses every gave. I know I’m not that smart about excel but I think I can get around this. Please forgive me for rambling along. I’m retired and disabled and I thought I knew something about Excel but I don’t. I look froward to your help in the future too.

  21. Haz says:

    {=MID(A4,MAX(IFERROR(FIND(ROW(A$1:A$10)-1&(CODE(COL($A1:$Z1)+96)),A4),),1)+1,LEN(A4))}

    Since the string is always broken with a number followed by a letter, that’s what I seached for.
    The ROW(A$1:A$10)-1&(CODE(COL($A1:$Z1)+96)) part will create a matrix with all (260!) combinations from “0a” to “9z”, then you just have to look for it in the initial string and return the second part.

Leave a Reply