Formula Challenge 001 – Return everything from a string after the first block of numbers (Part 2.)
Welcome back. Get a good night’s sleep? Great.
Hopefully you had freshly-picked brain-function-enhancing blueberries for breakfast, and a red bull or five, because you’re gonna need it. Why? Because today, my friends, we peek inside the multi-dimensional mind of Sajan the Excel Magician.
As you’ll keenly recall from yesterday’s gripping episode of “When good formulas go GREAT”, Sajan claimed Bronze in our inaugural formula challenge at Formula Challenge 001 – Return everything in string after first block of numbers
By way of a quick refresher, this formula challenge calls for a formula to return a substring from a longer string. Those strings look something like this:
String: | Required Substring: |
Monaco7190Australia1484 | Australia1484 |
Liechtenstein42Austria128590 | Austria128590 |
Malta130612Belarus8 | Belarus8 |
You can download the challenge and full dataset here: Formula Challenge 1.2 (Excel 2007+ only, because the formulas have too many nested formulas for earlier Excel versions to handle)
Specifically, we need a formula to return only the second country and associated number …in the case of Monaco7190Australia1484 it’s that bit in bold i.e. Australia1484
Sajan the Magician split those strings with this award-winning beauty:
=MID(A1,MODE(MMULT((N(ISNUMBER(-MID(A1,ROW(INDIRECT(“1:”&LEN(A1))),1)))={1,0})*(ROW(INDIRECT(“1:”&LEN(A1)))-{0,1}),{1;1}))+1,LEN(A1))
How does this wizardly wonder work? Let’s find out…
1. Sajan splits apart the source string into it’s characters
This is as I did in the last post, albeit using a slightly different formula than I did:
=MID(A1,ROW(INDIRECT(“1:”&LEN(A1))),1)
=MID(A1,{1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23},1)
={“M”;”o”;”n”;”a”;”c”;”o”;”7″;”1″;”9″;”0″;”A”;”u”;”s”;”t”;”r”;”a”;”l”;”i”;”a”;”1″;”4″;”8″;”4″}
2. Then he adds zero to the array
This coerces any numbers stored as text into numbers, while causing the text bits to throw errors
={“M”;”o”;”n”;”a”;”c”;”o”;”7″;”1″;”9″;”0″;”A”;”u”;”s”;”t”;”r”;”a”;”l”;”i”;”a”;”1″;”4″;”8″;”4″} +0
={#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;7;1;9;0; #VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;1;4;8;4}
3. Wrap an ISNUMBER around this unsightly spawn of Satan,
This turns numbers to True, and turns those hideously unnatural errors of nature to better-behaved booleans: =ISNUMBER({#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;7;1;9;0;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;1;4;8;4})
={FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE}
4. Turn this array into Ones and Zeros, and checks if it equals either One or Zero.
I know…me too. But let’s give him the benefit of the doubt for now, and see what rabbit hole he’s leading us down…we might yet pop out of a magicians’ hat:
=N({FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE})={1,0}
={0;0;0;0;0;0;1;1;1;1;0;0;0;0;0;0;0;0;0;1;1;1;1}={1,0}
={FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;TRUE,FALSE;TRUE,FALSE;TRUE,FALSE;TRUE,FALSE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;TRUE,FALSE;TRUE,FALSE;TRUE,FALSE;TRUE,FALSE}
What this has done in effect is create a 2D array – with one column of the array being the inverse of the other. So one column records whether something is a number, and the other records whether something isn’t a number. I know…me too. But maybe it would help if we saw what this would look like if entered over an Excel range, with our original string down the side so we can try and work out what spell this Wizard is whipping together:
5. He then creates a second 2d array with this bit:
=(ROW(INDIRECT(“1:”&LEN(A16)))-{0,1})
={1,0;2,1;3,2;4,3;5,4;6,5;7,6;8,7;9,8;10,9;11,10;12,11;13,12;14,13;15,14;16,15;17,16;18,17;19,18;20,19;21,20;22,21;23,22}
…which if we were to enter over an Excel range would look like this:
6. He then throws this array into his wizard’s cauldron together with the array in 5 – along with generous portions of Eye of newt, toe of frog, wool of bat and tongue of dog. (This simply means Multiply the two previous Arrays)
This gives us the following:
={FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;TRUE,FALSE;TRUE,FALSE;TRUE,FALSE;TRUE,FALSE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;FALSE,TRUE;TRUE,FALSE;TRUE,FALSE;TRUE,FALSE;TRUE,FALSE}*{1,0;2,1;3,2;4,3;5,4;6,5;7,6;8,7;9,8;10,9;11,10;12,11;13,12;14,13;15,14;16,15;17,16;18,17;19,18;20,19;21,20;22,21;23,22}
={0,0;0,1;0,2;0,3;0,4;0,5;7,0;8,0;9,0;10,0;0,10;0,11;0,12;0,13;0,14;0,15;0,16;0,17;0,18;20,0;21,0;22,0;23,0}
…which again would look like this if entered over an Excel range (with our original string down the side by way of reference):
Well mess up my hair, and call me Einstein. Because now I see what all that was about. The only time that the same number in that first array also appears in that second array is that magical place where the first block of numbers ends, and the second block of letters begins. That is some serious sorcery!
7. Multiply the arrays
Next he conjures this 2D array into 1D, with the help of MMULT.
Microsoft tells us that MMULT “…returns the matrix product of two arrays. The result is an array with the same number of rows as array1 and the same number of columns as array2.”
If that’s got you scratching your head, then try this alternate explanation: The M in MMULT stands for Magic. 😉
The way Sajan is using MMULT is to basically add across each row to find out what the total is for each row. Or another way of thinking about it is that because there’s ether a zero or a number in each row, he’s zipping up these two lists into one to just get the numbers and to ignore the zeros:
=MMULT({0,0;0,1;0,2;0,3;0,4;0,
= {0;1;2;3;4;5;7;8;9;10;10;11;
8. Work out the most common number in that array and add one to it
Now he’s got the start position of our desired substring.
=MODE({0;1;2;3;4;5;7;8;9;10;10;11;12;13;14;15;16;17;18;20;21;22;23})+1
=11
9. Lastly, he uses this trickery to tease out our answer string
=MID(A16,MODE({0;1;2;3;4;5;7;8;9;10;10;11;12;13;14;15;16;17;18;20;21;22;23})+1,LEN(A16))
=MID(A16,11,LEN(A16))
=Australia1484
Austrailia1484. Crikey dick. Strewth, mate!
That bonza beaut has more bounce than a Boomer! Makes me feel a couple of tinnies short of a slab.
And that’s the formula that got the lowly Bronze! Tune in tomorrow, and we’ll see what the Silver-medal winner, Haseeb, has to offer. If you dare…..
About the Author.
Jeff Weir – a local of Galactic North up there in Windy Wellington, New Zealand – is more volatile than INDIRECT and more random than RAND. In fact, his state of mind can be pretty much summed up by this:
=NOT(EVEN(PROPER(OR(RIGHT(TODAY())))))
That’s right, pure #VALUE!
Find out more at http:www.heavydutydecisions.co.nz
Hello Awesome...
My name is Chandoo. Thanks for dropping by. My mission is to make you awesome in Excel & your work. I live in Wellington, New Zealand. When I am not F9ing my formulas, I cycle, cook or play lego with my kids. Know more about me.
I hope you enjoyed this article. Visit Excel for Beginner or Advanced Excel pages to learn more or join my online video class to master Excel.
Thank you and see you around.
Related articles:
|
Leave a Reply
« Formula Challenge 001 – Return everything from a string after the first block of numbers (Part 1.) | Are you interested in learning Power Pivot? » |
46 Responses to “Formula Challenge 001 – Return everything from a string after the first block of numbers (Part 2.)”
How about? Am I too late to win 🙁
=MID(A1,MAX(IFERROR(FIND((ROW(1:10)-1)&CHAR(COLUMN(BM:CL)),UPPER(A1)),0))+1,255)
Also note that you don't need the UPPER, although you do need to change ROW(1:10) to absolute references in order to copy the formula down. And you could replace that hard-coded 255 with say 9^9 to handle any length of string (although we only need a number of no more than 32767 on the end, so 9^9 is slightly overkill, albeit with no ill effect).
By my account that makes it 76 characters. Well done.
What if the next character after the first numbers is not between A-Z? Like Monaco7190@Australia1484
Regards
In retrospect, the formula can be further simplified by making the CHAR array more comprehensive (i.e.
CHAR(COLUMN(BF:DS))). Then, there would be no reason to convert A1 to UPPER(A1).
Elias - that's another challenge entirely. In this one - based on some question I answered in a forum somewhere recently - the sample data always has the letter A-Z after a number.
I think this one handles the scenario I’m talking about, and still short.
=MID(A4,MATCH(TRUE,(MID(A4,ROW(A:A),1)*ISERR(-MID(A4,ROW(A:A)+1,1)))>0,)+1,9^9)
Regards
=MID(A1,MAX(IFERROR(FIND(CHAR(COLUMN(BF:DR))&(ROW(1:10)-1),A1),0))+1,255)
David...you've got to have the numbers first. Plus I changed 255 for 9^9 to handle strings longer than 255 characters:
=MID(A1,MAX(IFERROR(FIND((ROW(1:10)-1)&CHAR(COLUMN(BF:DR)),A1),0))+1,9^9)
Hii @David,, I'm unable to understand,,
Column(BF:DR),,
Plz reply to me also @ sinha14raj@gmail.com ?
@Rajesh
CHAR(COLUMN(BF:DR))
is the same as =CHAR(COLUMN(58:122))
So it is just extracting an array of the Characters between Char 58 and 122
ie: between characters : and z
=MID(A1,MAX(IFERROR(FIND((ROW(1:10)-1)&CHAR(COLUMN(BF:DR)),A1),0))+1,255)
Ha...you spoiled my punch line, David. At the end of this series (a couple of posts of which are still to come) I was going to challenge readers to find a shorter solution than the respondents had so far posted, with a slightly longer version of yours in mind. Specifically =MID(A1,MIN(IFERROR(FIND(CHAR(COLUMN($AW:$BF))&CHAR(ROW($65:$90)),A1),""))+1,LEN(A1))
Great skills. Be sure to check out the other challenges, and still check in to see the other innovative solutions that I'll be covering re this challenge.
Whoops, the above comments are out of order. Also, with my slight revision it runs at 79 characters, not 76 as I said below. But that's still a winner!
Man your formula is insane ! All those Matrix operations !!
Here my very simple solution
={MID(B12;MATCH(TRUE;ISNUMBER(--MID(B12;ROW(INDIRECT("1:"&LEN(B12)));1));0)+4;100)}
Regards.
Hi ExcelStrategy. The formula challenge sets out that your formula should handle any length of string, and any length of numbers – not just the lengths shown above/below.
Your formula is hard-coded to 4 digits, but as per the challenge we can't rely on the numbers always being four digits.
Now I understand why you have isolated the only sequence of FALSE;TRUE (0;1) in the initial array! it signals the change between the numbers and text in the two blocks of strings !
I have based my formula on the data wich was inside the example workbook that has a pattern of 4 numbers every time 🙂
Yeah, i was a little lazy when whipping up the sample data. Cheers, ExcelStrategy.
this does not work for me.Can you please advise?
Kevin - you've got to array enter that formula. So instead of just pushing ENTER you've got to push CTRL + SHIFT + ENTER.
Can't you use the CODE function to get ASCII number greater than 64 related to second sequential number greater than 64 via vlookup or nested if or search or find functions?
Simon: here's what CODE returns on the whole string:
={77;111;110;97;99;111;55;49;57;48;65;117;115;116;114;97;108;105;97;49;52;56;52}
...and here's the ones that are greater than 64:
={TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE}
How does that help?
Sorry, I should elaborate further. You can use this:
=CODE(MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),1))>64
…to return this:
={TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE}
And then we can search that array for the first FALSE with this:
=MATCH(FALSE,CODE(MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),1))>64,0)
…but then we’d need to find where the next TRUE falls relative to the length of the original string, so that we could split that original string. And that requires quite a bit more manipulation, which really blows out the length of our formula.
The next post discusses a formula that does something similar to your suggested approach. But there’s not a MATCH or VLOOKUP in sight. Stay tuned ?
Weird...I posted a smiley face at the end of Stay Tuned, but WordPress turned it into a question mark.
Pair of parentheses not needed, final formula.
=MID(A1,MAX(IFERROR(FIND(ROW(1:10)-1&CHAR(COLUMN(BF:DR)),A1),0))+1,9^9)
David,
Very clever technique! I love its simplicity! Sheer genius!
By the way, I recently discovered your old EEE posts. Amazing stuff! I hope you resurrect those posts since there are very few posts out there of such quality!
Regards,
Sajan.
Sajan...don't tease me...what's the link.
Hi Jeff,
You can find the old EEE posts at:
http://www.j-walk.com/ss/excel/eee/
Regards,
Sajan.
Still need those $ in the Row, though.
=MID(A1,MAX(IFERROR(FIND(ROW($1:$10)-1&CHAR(COLUMN(BF:DR)),A1),0))+1,9^9)
Otherwise you have to array-enter your formula individually for each item in our data list.
David, you can shave another character off like this:
=MID(A1,MAX(IFERROR(FIND(COLUMN(A:J)-1&CHAR(ROW($65:$90)),A1),0))+1,9^9)
THis works, because (A:J) is shorter than (1:10)
Way too complicated for this little brain but love how your mind works, Chandoo - the wit!! 🙂
@David - Brilliant...Sheer Genius....
@Jeff Reminds me of the gem to search a exact word in a string from the academy from you and Craig.
@Sam...yeah, I remember you were standing and clapping A phase that I've now trademarked, so use it and I'll sue your ass 😉
Am new to the Site. Spending hours together to understand the work - breaking my head.
Could somebody help me to understand why we use "-" in formula (before 2nd MID formula and "-{0,1}) and what {1;1} does.
MID(A6,MODE(MMULT((N(ISNUMBER(-MID(A6,ROW(INDIRECT("1:"&LEN(A6))),1)))={1,0})*(ROW(INDIRECT("1:"&LEN(A6)))-{0,1}),{1;1}))+1,LEN(A6))
Thanks!
Valli
Hi All,
Just came across this.........Thought I'd have a crack at it before reading anyone else's responses, just to see what method my brain would come up with!
This entered as an array formula (Ctrl + Shift +Enter) would do the trick, changing A1 for whatever cell needed
=RIGHT(A1,LEN(A1)-MIN(IFERROR(SEARCH({"A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"},A1,MIN(IFERROR((SEARCH({1,2,3,4,5,6,7,8,9},A1,1)),"")))-1,"")))
@Jeff
Woah psychotic is most probably an understatement. Let me say this english is my first language and I have read and understood your brilliant explanation. I have even taken the time to test bits of your explanation of see what some of the functions are doing.
What still befuddles me is how a number of you even start to approach this type of problem right from the beginning. Which functions to use and in what order to get to the solutions you are all showing.
I used to just write very simple formulas that did one or two things and with this site I have come on leaps and bounds by studying the answers and explanations from lots of contributors. A few names always rise to the top (that includes your forum posts Jeff and those of the Forum Ninjas) I just want to get to the point where I can begin to approach a problem like this by understanding which of those functions I should use and how to nest them. Rather than just copying an answer and moving on. I know that will take time, practice and lots more learning and I am sure it will slowly come. Keep up the great explanations and remember us less gifted lovers of excel.
Bring on episode three Jeff....................... I can take it 😉
Thanks for your kind words. You raise a good point...how to even start formulating approaches such as this. I'll probably write something up in due course along those lines. I can certainly say that there's no way I'd ever have put together the approaches that some of the other respondents put together. That's the beauty of a challenge like this...you get to see many expert cat-skinners in action.
Jeff a great idea
".....you raise a good point…how to even start formulating approaches such as this. I’ll probably write something up in due course along those lines........."
Looking forward to that. Book me a front row seat.
Maybe a series of posts from your good self and some of the Ninjas to start with????????????????
[...] Sajan: Formula Challenge 001 – Part 2 [...]
I already posted this but look like it was unseen. This one can handle any character after the numbers and not only letters A-Z.
=MID(A4,MATCH(TRUE,(MID(A4,ROW(A:A),1)*ISERR(-MID(A4,ROW(A:A)+1,1)))>0,)+1,9^9)
Regards
I can reduce it to 76 characters length. It also does not get mess if users insert/delete a row/column.
=MID(A4,MATCH(TRUE,MID(A4,ROW(A:A),1)*ISERR(-MID(A4,ROW(A:A)+1,1))>0,)+1,9^9)
Regards
Hi,
My 2 cents (76 chars), Excel 2003:
=MID(A2;MATCH(2;MMULT(N(ISERR(-MID(A2;ROW($1:$99)+{0\1};1)));{1;2});0)+1;99)
(French conventions:
formula separator ";"
{0\1}=column vector 0,1
{1;2}=row vector 1,2)
Great stuff. Sajan also came up with this one, which I've written up at http://chandoo.org/wp/2013/07/22/formula-challenge-001-return-everything-from-a-string-after-the-first-block-of-numbers-part-4/
His version has some added stuff to ensure any length of string is covered.
=MID(A1,MATCH(1,MMULT(-ISERR(-MID(A1,ROW(OFFSET(A$1,,,LEN(A1))),{1,2})),{1;-1}),)+1,6^6)
It is strange to see how complex data manipulation is with Excel when it is so simple with Word:
1°) Copy the herebelow lines in Word:
Monaco7190Australia1484
Denver11222Chicago4257
Beijin42Sofia666877
London11998Paris75006
2°) FIND & REPLACE
Find what : (*[0-9]{1;10})(*)(^13)
Options : Use wildcards
Replace with : \2\3
(REPLACE ALL)
3°) That's it!
I wish Excel have these wildcard capabilities!
I agree. What you can do with Word's Find and Replace is very cool.
=MID(A1,MODE(MMULT((N(ISNUMBER(-MID(A1,ROW(INDIRECT(“1:”&LEN(A1))),1)))={1,0})*(ROW(INDIRECT(“1:”&LEN(A1)))-{0,1}),{1;1}))+1,LEN(A1))
I can't get this to work on A1= Monaco7190Australia1484
@Sam
It works fine for me?
You have to retype the " characters, as the characters above look like " but aren't
It doesn't need Array Entering, just a simple Enter will suffice
Have a look at my file here: https://www.dropbox.com/s/rtpbpvlyvynvlwl/For%20Sam.xlsx
If that doesn't help what version of excel are you using ?