Formula Challenge 001 – Return everything from a string after the first block of numbers (Part 2.)

Posted on July 17th, 2013 in Formula Challenges , Posts by Jeff - 44 comments

Welcome back. Get a good night’s sleep? Great.

Hopefully you had freshly-picked brain-function-enhancing blueberries for breakfast, and a red bull or five, because you’re gonna need it. Why? Because today, my friends, we peek inside the multi-dimensional mind of Sajan the Excel Magician.

As you’ll keenly recall from yesterday’s gripping episode of “When good formulas go GREAT”, Sajan claimed Bronze in our inaugural formula challenge at Formula Challenge 001 – Return everything in string after first block of numbers

By way of a quick refresher, this formula challenge calls for a formula to return a substring from a longer string. Those strings look something like this:

String: Required Substring:
Monaco7190Australia1484 Australia1484
Liechtenstein42Austria128590 Austria128590
Malta130612Belarus8 Belarus8

You can download the challenge and full dataset here: Formula Challenge 1.2 (Excel 2007+ only, because the formulas have too many nested formulas for earlier Excel versions to handle)

Specifically, we need a formula to return only the second country and associated number …in the case of Monaco7190Australia1484 it’s that bit in bold i.e. Australia1484

Sajan the Magician split those strings with this award-winning beauty:

=MID(A1,MODE(MMULT((N(ISNUMBER(-MID(A1,ROW(INDIRECT(“1:”&LEN(A1))),1)))={1,0})*(ROW(INDIRECT(“1:”&LEN(A1)))-{0,1}),{1;1}))+1,LEN(A1))

How does this wizardly wonder work? Let’s find out…

1. Sajan splits apart the source string into it’s characters

This is as I did in the last post, albeit using a slightly different formula than I did:

=MID(A1,ROW(INDIRECT(“1:”&LEN(A1))),1)

=MID(A1,{1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23},1)

={“M”;”o”;”n”;”a”;”c”;”o”;”7″;”1″;”9″;”0″;”A”;”u”;”s”;”t”;”r”;”a”;”l”;”i”;”a”;”1″;”4″;”8″;”4″}

2. Then he adds zero to the array

This coerces any numbers stored as text into numbers, while causing the text bits to throw errors

={“M”;”o”;”n”;”a”;”c”;”o”;”7″;”1″;”9″;”0″;”A”;”u”;”s”;”t”;”r”;”a”;”l”;”i”;”a”;”1″;”4″;”8″;”4″} +0

={#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;7;1;9;0; #VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;1;4;8;4}

3. Wrap an ISNUMBER around this unsightly spawn of Satan,

This turns numbers to True, and turns those hideously unnatural errors of nature to better-behaved booleans: =ISNUMBER({#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;7;1;9;0;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;1;4;8;4})

={FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE}

4. Turn this array into Ones and Zeros, and checks if it equals either One or Zero.

I know…me too. But let’s give him the benefit of the doubt for now, and see what rabbit hole he’s leading us down…we might yet pop out of a magicians’ hat:

=N({FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE})={1,0}

={0;0;0;0;0;0;1;1;1;1;0;0;0;0;0;0;0;0;0;1;1;1;1}={1,0}

={FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;TRUE,FALSE;TRUE,FALSE;TRUE,FALSE;TRUE,FALSE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;TRUE,FALSE;TRUE,FALSE;TRUE,FALSE;TRUE,FALSE}

What this has done in effect is create a 2D array – with one column of the array being the inverse of the other. So one column records whether something is a number, and the other records whether something isn’t a number. I know…me too. But maybe it would help if we saw what this would look like if entered over an Excel range, with our original string down the side so we can try and work out what spell this Wizard is whipping together:

FC021

 5. He then creates a second 2d array with this bit:

=(ROW(INDIRECT(“1:”&LEN(A16)))-{0,1})

={1,0;2,1;3,2;4,3;5,4;6,5;7,6;8,7;9,8;10,9;11,10;12,11;13,12;14,13;15,14;16,15;17,16;18,17;19,18;20,19;21,20;22,21;23,22}

…which if we were to enter over an Excel range would look like this:

FC022

6. He then throws this array into his wizard’s cauldron together with the array in 5 – along with generous portions of Eye of newt, toe of frog, wool of bat and tongue of dog. (This simply means Multiply the two previous Arrays)

This gives us the following:

={FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;TRUE,FALSE;TRUE,FALSE;TRUE,FALSE;TRUE,FALSE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;TRUE,FALSE;TRUE,FALSE;TRUE,FALSE;TRUE,FALSE}*{1,0;2,1;3,2;4,3;5,4;6,5;7,6;8,7;9,8;10,9;11,10;12,11;13,12;14,13;15,14;16,15;17,16;18,17;19,18;20,19;21,20;22,21;23,22}

={0,0;0,1;0,2;0,3;0,4;0,5;7,0;8,0;9,0;10,0;0,10;0,11;0,12;0,13;0,14;0,15;0,16;0,17;0,18;20,0;21,0;22,0;23,0}

…which again would look like this if entered over an Excel range (with our original string down the side by way of reference):

FC023

Well mess up my hair, and call me Einstein. Because now I see what all that was about. The only time that the same number in that first array also appears in that second array is that magical place where the first block of numbers ends, and the second block of letters begins. That is some serious sorcery!

7. Multiply the arrays

Next he conjures this 2D array into 1D, with the help of MMULT.

Microsoft tells us that MMULT “…returns the matrix product of two arrays. The result is an array with the same number of rows as array1 and the same number of columns as array2.”

If that’s got you scratching your head, then try this alternate explanation: The M in MMULT stands for Magic;-)

The way Sajan is using MMULT is to basically add across each row to find out what the total is for each row. Or another way of thinking about it is that because there’s ether a zero or a number in each row, he’s zipping up these two lists into one to just get the numbers and to ignore the zeros:
=MMULT({0,0;0,1;0,2;0,3;0,4;0,5;7,0;8,0;9,0;10,0;0,10;0,11;0,12;0,13;0,14;0,15;0,16;0,17;0,18;20,0;21,0;22,0;23,0},{1;1})

= {0;1;2;3;4;5;7;8;9;10;10;11;12;13;14;15;16;17;18;20;21;22;23}

8. Work out the most common number in that array and add one to it

Now he’s got the start position of our desired substring.

=MODE({0;1;2;3;4;5;7;8;9;10;10;11;12;13;14;15;16;17;18;20;21;22;23})+1

=11

9. Lastly, he uses this trickery to tease out our answer string

=MID(A16,MODE({0;1;2;3;4;5;7;8;9;10;10;11;12;13;14;15;16;17;18;20;21;22;23})+1,LEN(A16))

=MID(A16,11,LEN(A16))

=Australia1484

Austrailia1484. Crikey dick. Strewth, mate!

That bonza beaut has more bounce than a Boomer! Makes me feel a couple of tinnies short of a slab.

And that’s the formula that got the lowly Bronze! Tune in tomorrow, and we’ll see what the Silver-medal winner, Haseeb, has to offer. If you dare…..

 

About the Author.

Jeff Weir – a local of Galactic North up there in Windy Wellington, New Zealand – is more volatile than INDIRECT and more random than RAND. In fact, his state of mind can be pretty much summed up by this:

=NOT(EVEN(PROPER(OR(RIGHT(TODAY())))))

That’s right, pure #VALUE!

Find out more at http:www.heavydutydecisions.co.nz

Your email address is safe with us. Our policies

Written by Hui...
Tags: , , , , , , ,
Home: Chandoo.org Main Page
? Doubt: Ask an Excel Question

44 Responses to “Formula Challenge 001 – Return everything from a string after the first block of numbers (Part 2.)”

  1. David Hager says:

    How about? Am I too late to win :(

    =MID(A1,MAX(IFERROR(FIND((ROW(1:10)-1)&CHAR(COLUMN(BM:CL)),UPPER(A1)),0))+1,255)

    • jeffreyweir says:

      Also note that you don’t need the UPPER, although you do need to change ROW(1:10) to absolute references in order to copy the formula down. And you could replace that hard-coded 255 with say 9^9 to handle any length of string (although we only need a number of no more than 32767 on the end, so 9^9 is slightly overkill, albeit with no ill effect).

      By my account that makes it 76 characters. Well done.

    • Elias says:

      What if the next character after the first numbers is not between A-Z? Like Monaco7190@Australia1484

      Regards

      • David Hager says:

        In retrospect, the formula can be further simplified by making the CHAR array more comprehensive (i.e.
        CHAR(COLUMN(BF:DS))). Then, there would be no reason to convert A1 to UPPER(A1).

      • jeffreyweir says:

        Elias – that’s another challenge entirely. In this one – based on some question I answered in a forum somewhere recently – the sample data always has the letter A-Z after a number.

        • Elias says:

          I think this one handles the scenario I’m talking about, and still short.

          =MID(A4,MATCH(TRUE,(MID(A4,ROW(A:A),1)*ISERR(-MID(A4,ROW(A:A)+1,1)))>0,)+1,9^9)

          Regards

      • David hager says:

        =MID(A1,MAX(IFERROR(FIND(CHAR(COLUMN(BF:DR))&(ROW(1:10)-1),A1),0))+1,255)

        • jeffreyweir says:

          David…you’ve got to have the numbers first. Plus I changed 255 for 9^9 to handle strings longer than 255 characters:
          =MID(A1,MAX(IFERROR(FIND((ROW(1:10)-1)&CHAR(COLUMN(BF:DR)),A1),0))+1,9^9)

      • David Hager says:

        =MID(A1,MAX(IFERROR(FIND((ROW(1:10)-1)&CHAR(COLUMN(BF:DR)),A1),0))+1,255)

  2. jeffreyweir says:

    Ha…you spoiled my punch line, David. At the end of this series (a couple of posts of which are still to come) I was going to challenge readers to find a shorter solution than the respondents had so far posted, with a slightly longer version of yours in mind. Specifically =MID(A1,MIN(IFERROR(FIND(CHAR(COLUMN($AW:$BF))&CHAR(ROW($65:$90)),A1),””))+1,LEN(A1))

    Great skills. Be sure to check out the other challenges, and still check in to see the other innovative solutions that I’ll be covering re this challenge.

  3. Man your formula is insane ! All those Matrix operations !!

    Here my very simple solution

    ={MID(B12;MATCH(TRUE;ISNUMBER(–MID(B12;ROW(INDIRECT(“1:”&LEN(B12)));1));0)+4;100)}

    Regards.

    • jeffreyweir says:

      Hi ExcelStrategy. The formula challenge sets out that your formula should handle any length of string, and any length of numbers – not just the lengths shown above/below.

      Your formula is hard-coded to 4 digits, but as per the challenge we can’t rely on the numbers always being four digits.

      • Now I understand why you have isolated the only sequence of FALSE;TRUE (0;1) in the initial array! it signals the change between the numbers and text in the two blocks of strings !

        I have based my formula on the data wich was inside the example workbook that has a pattern of 4 numbers every time :)

    • Kevin says:

      this does not work for me.Can you please advise?

  4. simon says:

    Can’t you use the CODE function to get ASCII number greater than 64 related to second sequential number greater than 64 via vlookup or nested if or search or find functions?

    • jeffreyweir says:

      Simon: here’s what CODE returns on the whole string:
      ={77;111;110;97;99;111;55;49;57;48;65;117;115;116;114;97;108;105;97;49;52;56;52}
      …and here’s the ones that are greater than 64:
      ={TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE}
      How does that help?

      • jeffreyweir says:

        Sorry, I should elaborate further. You can use this:
        =CODE(MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),1))>64
        …to return this:
        ={TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE}

        And then we can search that array for the first FALSE with this:
        =MATCH(FALSE,CODE(MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),1))>64,0)
        …but then we’d need to find where the next TRUE falls relative to the length of the original string, so that we could split that original string. And that requires quite a bit more manipulation, which really blows out the length of our formula.

        The next post discusses a formula that does something similar to your suggested approach. But there’s not a MATCH or VLOOKUP in sight. Stay tuned ?

  5. David Hager says:

    Pair of parentheses not needed, final formula.

    =MID(A1,MAX(IFERROR(FIND(ROW(1:10)-1&CHAR(COLUMN(BF:DR)),A1),0))+1,9^9)

  6. Nikki says:

    Way too complicated for this little brain but love how your mind works, Chandoo – the wit!! :)

  7. sam says:

    @David – Brilliant…Sheer Genius….
    @Jeff Reminds me of the gem to search a exact word in a string from the academy from you and Craig.

  8. jeffreyweir says:

    @Sam…yeah, I remember you were standing and clapping A phase that I’ve now trademarked, so use it and I’ll sue your ass ;-)

  9. Valli! says:

    Am new to the Site. Spending hours together to understand the work – breaking my head.

    Could somebody help me to understand why we use “-” in formula (before 2nd MID formula and “-{0,1}) and what {1;1} does.

    MID(A6,MODE(MMULT((N(ISNUMBER(-MID(A6,ROW(INDIRECT(“1:”&LEN(A6))),1)))={1,0})*(ROW(INDIRECT(“1:”&LEN(A6)))-{0,1}),{1;1}))+1,LEN(A6))

    Thanks!
    Valli

  10. Chris Rowley says:

    Hi All,

    Just came across this………Thought I’d have a crack at it before reading anyone else’s responses, just to see what method my brain would come up with!

    This entered as an array formula (Ctrl + Shift +Enter) would do the trick, changing A1 for whatever cell needed

    =RIGHT(A1,LEN(A1)-MIN(IFERROR(SEARCH({“A”,”B”,”C”,”D”,”E”,”F”,”G”,”H”,”I”,”J”,”K”,”L”,”M”,”N”,”O”,”P”,”Q”,”R”,”S”,”T”,”U”,”V”,”W”,”X”,”Y”,”Z”},A1,MIN(IFERROR((SEARCH({1,2,3,4,5,6,7,8,9},A1,1)),””)))-1,””)))

  11. ianamck says:

    @Jeff

    Woah psychotic is most probably an understatement. Let me say this english is my first language and I have read and understood your brilliant explanation. I have even taken the time to test bits of your explanation of see what some of the functions are doing.

    What still befuddles me is how a number of you even start to approach this type of problem right from the beginning. Which functions to use and in what order to get to the solutions you are all showing.

    I used to just write very simple formulas that did one or two things and with this site I have come on leaps and bounds by studying the answers and explanations from lots of contributors. A few names always rise to the top (that includes your forum posts Jeff and those of the Forum Ninjas) I just want to get to the point where I can begin to approach a problem like this by understanding which of those functions I should use and how to nest them. Rather than just copying an answer and moving on. I know that will take time, practice and lots more learning and I am sure it will slowly come. Keep up the great explanations and remember us less gifted lovers of excel.

    Bring on episode three Jeff………………….. I can take it ;-)

    • Jeff Weir says:

      Thanks for your kind words. You raise a good point…how to even start formulating approaches such as this. I’ll probably write something up in due course along those lines. I can certainly say that there’s no way I’d ever have put together the approaches that some of the other respondents put together. That’s the beauty of a challenge like this…you get to see many expert cat-skinners in action.

      • ianamck says:

        Jeff a great idea

        “…..you raise a good point…how to even start formulating approaches such as this. I’ll probably write something up in due course along those lines………”

        Looking forward to that. Book me a front row seat.

        Maybe a series of posts from your good self and some of the Ninjas to start with????????????????

  12. Elias says:

    I already posted this but look like it was unseen. This one can handle any character after the numbers and not only letters A-Z.

    =MID(A4,MATCH(TRUE,(MID(A4,ROW(A:A),1)*ISERR(-MID(A4,ROW(A:A)+1,1)))>0,)+1,9^9)

    Regards

  13. Elias says:

    I can reduce it to 76 characters length. It also does not get mess if users insert/delete a row/column.

    =MID(A4,MATCH(TRUE,MID(A4,ROW(A:A),1)*ISERR(-MID(A4,ROW(A:A)+1,1))>0,)+1,9^9)

    Regards

  14. Jeanbar says:

    Hi,

    My 2 cents (76 chars), Excel 2003:
    =MID(A2;MATCH(2;MMULT(N(ISERR(-MID(A2;ROW($1:$99)+{0\1};1)));{1;2});0)+1;99)

    (French conventions:
    formula separator “;”
    {0\1}=column vector 0,1
    {1;2}=row vector 1,2)

  15. Jeanbar says:

    It is strange to see how complex data manipulation is with Excel when it is so simple with Word:

    1°) Copy the herebelow lines in Word:

    Monaco7190Australia1484
    Denver11222Chicago4257
    Beijin42Sofia666877
    London11998Paris75006

    2°) FIND & REPLACE
    Find what : (*[0-9]{1;10})(*)(^13)
    Options : Use wildcards
    Replace with : \2\3
    (REPLACE ALL)

    3°) That’s it!

    I wish Excel have these wildcard capabilities!

  16. Sam says:

    =MID(A1,MODE(MMULT((N(ISNUMBER(-MID(A1,ROW(INDIRECT(“1:”&LEN(A1))),1)))={1,0})*(ROW(INDIRECT(“1:”&LEN(A1)))-{0,1}),{1;1}))+1,LEN(A1))

    I can’t get this to work on A1= Monaco7190Australia1484

Leave a Reply