Formula Challenge 001 – Return everything from a string after the first block of numbers (Part 4.)
Welcome back to another gripping episode of “When good formulas go GREAT”. Just like the immortal combatants in the classic 1986 movie Highlander, over the last three posts in this series our Excel nerds have been locked in an ages-old battle to decapitate. A text-string, that is. Not each other.
So far we have seen some formidable formulas from these fearsome foes:
Jeff: Formula Challenge 001 – Part 1
Sajan: Formula Challenge 001 – Part 2
Haseeb: Formula Challenge 001 – Part 3
But while we’ve witnessed a few intriguing battles, none of our defeated warriors ultimately had what it took to win the prize. So steel your nerves, grab a seat near the edge of the arena, and find out what brute force another contender can bring to bear on the problem in…
…Part 4: The Sorcerer Samurai
By way of a quick refresher, our formula challenge calls for a mighty Excel hero to slay a fearsome dragon – err, text string – and return triumphantly back to Court with its tail – err, substring – in time for the cook (that would be me) to serve it up to the King (that would be Chandoo) at the Feast of St. Christopher.
Those dragons and associated tails look a little something like this:
Dragon: | Tail: |
NewZealand99Australia41 | Australia41 |
France12NewZealand41 | NewZealand41 |
Australia23France17 | France17 |
England53France37 | France37 |
England7NewZealand27 | NewZealand27 |
Australia16England24 | England24 |
SouthAfrica21France11 | France11 |
SouthAfrica42Australia33 | Australia33 |
NewZealand48SouthAfrica34 | SouthAfrica34 |
England12SouthAfrica22 | SouthAfrica22 |
(Bonus Question: Who can tell me what these numbers represent?)
You can download the challenge and full dataset here: Formula Challenge 1.4
So our brave Knights must wield their mighty sword – Excelibur – and lop off the bit in bold at the end. The winner of the challenge is the combatant who can do that slicing with as short a sword – err, formula – as possible.
That winner receives his or her weight in gold. Fool’s gold. And as a wise Highlander one said about the winner of battles involving mass decapitations: “There can be only one”.
But this will be harder than it looks. Those numbers are of variable length (and getting longer each year), and the fact that we are searching for a block of text and numbers that occurs after another block of text and numbers makes it very tricky indeed. Mwah ha ha.
Well, the next contender for the prize is….SAJAN! Again!!
You might remember Sajan the Magician’s vaguely coherent incantation of a formula from a few posts ago:
=MID(A1,MODE(MMULT((N(ISNUMBER(-MID(A1,ROW(INDIRECT(“1:”&LEN(A1))),1)))={1,0})*(ROW(INDIRECT(“1:”&LEN(A1)))-{0,1}),{1;1}))+1,LEN(A1))
Discontent with the mere Bronze that his wizardry brought him, Sajan’s kicked over his magic cauldron, drawn his trusty pen-knife of a formula from its dusty scabbard, and stabbed our string right through the heart. Dead. With this:
=MID(A1,MATCH(1,MMULT(-ISERR(-MID(A1,ROW(OFFSET(A$1,,,LEN(A1))),{1,2})),{1;-1}),)+1,6^6)
Wow! From 133 characters down to 88! I can still hear the chants of “Long Live Sajan the Samurai” ringing in my ears from when he dragged that severed substring into the Chandoo.org forum and dumped it victoriously at the formula challenge thread’s feet . Err, footer.
But you weren’t there to see it, were you? So let’s have our Court Jester (me) don his silly face, and re-enact the fight blow by blow, so to speak. En Garde!
1. Samurai Sajan sneaks up on the string and slices it into slender slivers:
He carves the string into one-character off-cuts, and casts them into a couple of columns with his crescent-shaped cutlass:
=MID(A1,ROW(OFFSET(A$1,,,LEN(A1))),{1,2})
{ “N”, “Ne”; “e”, “ew”; “w”, “wZ”; “Z”, “Ze”; “e”, “ea”; “a”, “al”; “l”, “la”; “a”, “an”; “n”, “nd”; “d”, “d9”; “9”, “99”; “9”, “9A”; “A”, “Au”; “u”, “us”; “s”, “st”; “t”, “tr”; “r”, “ra”; “a”, “al”; “l”, “li”; “i”, “ia”; “a”, “a4”; “4”, “41”; “1”, “1”}
So he’s taken the original 1D string,and split it apart into a 2D (i.e. two column) array with the help of the {1,2} bit. Both columns of the new array are almost the same, except all the elements in one of them is one character long, and all the elements in the other are two characters long.
If entered over a range, here’s what that looks like:
3. He displays any of dem digits dat looks like a digit as a digit, using a dash:
Currently Excel thinks that this array is just text – even the numbers in it. Putting a minus sign in the front of the array – or in fact doing any kind of mathematical operation on it – will force any numbers to wake up and act like numbers. (Unfortunately, it also freaks any actual text out…to the point that any bits of text whinges “I’m not a number, there’s been some kind of error”. Hence the #VALUE! errors below, along with the numbers.)
= -{ “N”, “Ne”; “e”, “ew”; “w”, “wZ”; “Z”, “Ze”; “e”, “ea”; “a”, “al”; “l”, “la”; “a”, “an”; “n”, “nd”; “d”, “d9”; “9”, “99”; “9”, “9A”; “A”, “Au”; “u”, “us”; “s”, “st”; “t”, “tr”; “r”, “ra”; “a”, “al”; “l”, “li”; “i”, “ia”; “a”, “a4”; “4”, “41”; “1”, “1”}
={ #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; -9, -99; -9, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; -4, -41; -1, -1}
Again, here’s how that looks if entered over an Excel range (with our original string split apart down the side by way of reference):
4. He hacked away at the resulting horrible herrors (err..errors) that happened due to this harsh handling:
He turns any of those errors to TRUE and anything else to FALSE with this:
=ISERR({ #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; -9, -99; -9, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; -4, -41; -1, -1}
={ TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; FALSE, FALSE; FALSE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; FALSE, FALSE; FALSE, FALSE}
So now we have TRUE where there was text, and FALSE where there was numbers. Which looks like this:
5. He butchered those Boolean values into bite-sized bits, by binding a brutal minus sign on at the beginning:
Just as putting a minus sign in the front in formula 3 above forced any numbers stored as text to act like numbers, putting a minus sign in front of an array of TRUE and FALSE values (otherwise known as Boolean values) or doing any other kind of mathematical operation on the array will turn those TRUE and FALSE values to numerical values – in this case because it is a minus sign they will turn to -1 and zero. Abracadabra.
=-{ TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; FALSE, FALSE; FALSE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; FALSE, FALSE; FALSE, FALSE}
={-1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; 0, 0; 0, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1 ,-1; -1, -1; -1, -1; -1, -1; 0, 0; 0, 0}
Note something interesting…the highlighted row above falls just before the string that we’re after. And it’s the only row in the entire array that has a zero in the first column and a negative one in the second column. So if we can work out where that combination of zero and negative one falls, we know exactly where our desired substring starts.
6. He magically multiplies the first array column by one, and the second array column by minus one, then maniacally mashes them together en masse:
He does this by using the MMULT function with a 2nd argument of {1;-1}.
=MMULT(={-1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; 0, 0; 0, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1 ,-1; -1, -1; -1, -1; -1, -1; 0, 0; 0, 0},{1;-1})
={0;0;0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;0;0;0}
MMULT is really hard to explain. It’s kind of like an 2-dimensional SUMPRODUCT. (Maybe it’s exactly like a 2-dimensional SUMPRODUCT). There’s a great visual explanation at Mike Girvin’s ExcelIsFun YouTube channel at http://www.youtube.com/watch?v=qJnL5hFfcYo
Given our array in this instance has two columns and our second argument of MMULT is {1;-1}, MMULT in this case effectively multiplies each number in the first column by positive one, multiplies each number in that 2nd column by negative one – and then adds the result together. Multiplying the first array by positive one does nothing to it. Multiplying the second array by negative one changes any existing negative values to positive. Adding the result together gets us back to a 1D array.
Now, as per the highlighted row above, the number that falls just before the string that we’re after is the only line that has 1 in it. So while Sajan might have ditched sorcery for sword-play, there’s still yet an element of the dark arts in his repertoire.
7. He secures his sharp scalpel for some exploratory surgery on our string:
Specifically, he uses the MATCH function to see what position that ‘1’ occurs at, and then adds 1.
=MATCH(1,{0;0;0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;0;0;0},)+1
=13
Quick aside: I never knew until another competitor in the challenge – Elias – posted a formula in the actual formula challenge thread that if the third argument of MATCH is a comma followed by nothing else, Excel interprets this the same as if that third argument was FALSE or Zero – that is, Excel is looking for an exact match, not an approximate one. So that’s a handy way to shorten a formula if you’re answering a challenge (although I’d actively put the FALSE in in a real-world situation).
So with a blank comma as the 3rd argument, all these are equivalent:
=MATCH(3,{1,2,4,5},)
=MATCH(3,{1,2,4,5},FALSE)
=MATCH(3,{1,2,4,5},0)
= #N/A
And without the comma, these are equivalent:
=MATCH(3,{1,2,4,5})
= MATCH(3,{1,2,4,5},TRUE)
= MATCH(3,{1,2,4,5},1)
= 2
Okay, back to the fight!
8. He culminates all this cutting with a callous coup de grâce to the condemned creature:
=MID(A1,13,6^6)
=Australia41
That 6^6 bit is just a short way to write a long number. We need a number in that argument that’s long enough so that all possible substring lengths are covered. Sajan could use LEN(A1), but that would take 7 characters, whereas 6^6 is only three characters. The maximum amount of characters that Excel 2007 or later will let you put in a string is 32,767 so given that 6^6 = 46,656 this will be more than enough.
Huzzah! How’s that for swordsmanship, eh? Hardly a fair fight…our poor string never really stood a chance against our mighty Samurai.
Well, that wraps up the joust for today. But stay tuned…there’s two super-short excellent approaches that have been proposed since this series of posts began, that definitely have to be seen to be believed. I’ll cover them off in a future post.
About the Author.
Jeff Weir – a local of Galactic North up there in Windy Wellington, New Zealand – is more volatile than INDIRECT and more random than RAND. In fact, his state of mind can be pretty much summed up by this:
=NOT(EVEN(PROPER(OR(RIGHT(TODAY())))))
That’s right, pure #VALUE!
Find out more at http:www.heavydutydecisions.co.nz
Hello Awesome...
My name is Chandoo. Thanks for dropping by. My mission is to make you awesome in Excel & your work. I live in Wellington, New Zealand. When I am not F9ing my formulas, I cycle, cook or play lego with my kids. Know more about me.
I hope you enjoyed this article. Visit Excel for Beginner or Advanced Excel pages to learn more or join my online video class to master Excel.
Thank you and see you around.
Related articles:
|
Leave a Reply
« Formula Challenge 001 – Return everything from a string after the first block of numbers (Part 3.) | Details about upcoming Power Pivot course (and a bonus tip on dashboards) » |
20 Responses to “Formula Challenge 001 – Return everything from a string after the first block of numbers (Part 4.)”
Assuming that Monaco7190Australia1484 is in A2, then in B2 we can write:
=MID(A2,MIN(IFERROR(SEARCH(CHAR(ROW($65:$90)),A2,MIN(IFERROR(FIND(CHAR(ROW($48:$57)),A2),””))+1),””)),LEN(A2))
Of course it’s an array formula, so CTRL + SHIFT + ENTER is required…
CHAR(48) to CHAR(57) returns the numbers 0 to 9.
Pure genius!
Well done Sajan, that is an amazing formula. Very impressed. 🙂
I can't compete on the formula side of things, but are the strings at the top the number of times each side has won in matches between the 2 in rugby (union)?
You got it, Ian. And guess what country I'm in!
Yay, one thing I can improve on is instead of 6^6 used 8^5
🙂
Bonus Question: Who can tell me what these numbers represent
Rugby Union wins between competing countries
EG New Zealand V Australia
of the 140 matches New Zealand have won 99
Truly awesome! There is beauty in the elegance of these solutions.
Great posts on this challenge!
Was space-delimited data a requirement? Adding a space to any string results in each of these solutions failing.
No, it wasn't a requirement.
So how about a solution that handles strings with spaces? 🙂
Although I am having hard enough time figuring out what the heck is going on with the existing solutions ...
Just use VBA. Make life easier.
" I’ll cover them off in a future post."
How far in the future? 🙂
I'm having problems getting the DeLorian up to 88 mph, David 😉
Just a question of time.
Forgot to say that that was a neat post at DDOE, David. Be sure to check out the challenge at http://chandoo.org/forums/topic/formula-challenge-016-unique-items-in-a-delimited-string that I posted as a result...already some innovative approaches there.
I have another approach "{=RIGHT(B3,LEN(B3)-MATCH("TRUETRUE",(IF(CODE(MID(B3,ROW(INDIRECT("1:"&LEN(B3))),1))64,TRUE,FALSE))),0))} "
Chetan, can you try reposting that.
@Chetan
WordPress doesn't like the use of > GT or < LT characters as they are assumed to be used for marking special code blocks
Try using GT, LT, GTE, LTE etc and re-post the formula
Came back to this site after a long time.... can this be a shorter answer? just modified Sajan's.... {=MID(A1,MATCH(1,--(ISNUMBER(-MID(A1,ROW(OFFSET(A1,,,LEN(A1))),2))),)+2,6^6)}
Hello all of you and thanks for this awesommmmme formulas,
Just for the sake of beauty, here is a UDF with Regular Expressions (7 lines of codes) :
Function test (rng As Range) As string
Const strPattern As String = "[A-Z]+\d+"
Dim Obj As Object
Dim temp As Variant
Dim result As String
Set Obj = CreateObject("vbscript.regexp")
With Obj
.Pattern = strPattern
.Global = True
.ignorecase = True
Set temp = .Execute(rng.Value)
End
result = temp.Item(1).Value
essai =
End Function
Thanks for opening this challenge and the awesome formula.
Can someone explain or give me some hints on the ROW(OFFSET(A$1,,,LEN(A19))) ?
I have no idea how it instructs the string extraction in the form of consecutive 1~2 characters.
Thanks very much, this challenge really opened my eyes.