Formula Challenge 001 – Return everything from a string after the first block of numbers (Part 4.)

Posted on July 22nd, 2013 in Formula Challenges, Posts by Jeff - 20 comments

Welcome back to another gripping episode of “When good formulas go GREAT”. Just like the immortal combatants in the classic 1986 movie Highlander, over the last three posts in this series our Excel nerds have been locked in an ages-old battle to decapitate. A text-string, that is. Not each other.

So far we have seen some formidable formulas from these fearsome foes:

Jeff: Formula Challenge 001 – Part 1

Sajan: Formula Challenge 001 – Part 2

Haseeb: Formula Challenge 001 – Part 3

But while we’ve witnessed a few intriguing battles, none of our defeated warriors ultimately had what it took to win the prize. So steel your nerves, grab a seat near the edge of the arena, and find out what brute force another contender can bring to bear on the problem in…

…Part 4: The Sorcerer Samurai

By way of a quick refresher, our formula challenge calls for a mighty Excel hero to slay a fearsome dragon – err, text string – and return triumphantly back to Court with its tail – err, substring – in time for the cook (that would be me) to serve it up to the King (that would be Chandoo) at the Feast of St. Christopher.

Those dragons and associated tails look a little something like this:

Dragon:                       Tail:
NewZealand99Australia41       Australia41
France12NewZealand41          NewZealand41
Australia23France17           France17
England53France37             France37
England7NewZealand27          NewZealand27
Australia16England24          England24
SouthAfrica21France11         France11
SouthAfrica42Australia33      Australia33
NewZealand48SouthAfrica34     SouthAfrica34
England12SouthAfrica22        SouthAfrica22

(Bonus Question: Who can tell me what these numbers represent?)

You can download the challenge and full dataset here: Formula Challenge 1.4

So our brave Knights must wield their mighty sword – Excelibur – and lop off the bit in bold at the end. The winner of the challenge is the combatant who can do that slicing with as short a sword – err, formula – as possible.

That winner receives his or her weight in gold. Fool’s gold. And as a wise Highlander once said about the winner of battles involving mass decapitations: “There can be only one”.

But this will be harder than it looks. Those numbers are of variable length (and getting longer each year), and the fact that we are searching for a block of text and numbers that occurs after another block of text and numbers makes it very tricky indeed. Mwah ha ha.

Well, the next contender for the prize is….SAJAN! Again!!

You might remember Sajan the Magician’s vaguely coherent incantation of a formula from a few posts ago:

=MID(A1,MODE(MMULT((N(ISNUMBER(-MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1)))={1,0})*(ROW(INDIRECT("1:"&LEN(A1)))-{0,1}),{1;1}))+1,LEN(A1))

Discontent with the mere Bronze that his wizardry brought him, Sajan’s kicked over his magic cauldron, drawn his trusty pen-knife of a formula from its dusty scabbard, and stabbed our string right through the heart. Dead. With this:

=MID(A1,MATCH(1,MMULT(-ISERR(-MID(A1,ROW(OFFSET(A$1,,,LEN(A1))),{1,2})),{1;-1}),)+1,6^6)

Wow! From 133 characters down to 88! I can still hear the chants of “Long Live Sajan the Samurai” ringing in my ears from when he dragged that severed substring into the Chandoo.org forum and dumped it victoriously at the formula challenge thread’s feet. Err, footer.

But you weren’t there to see it, were you? So let’s have our Court Jester (me) don his silly face, and re-enact the fight blow by blow, so to speak. En Garde!

 

 

1. Samurai Sajan sneaks up on the string and slices it into slender slivers:

He carves the string into one-character off-cuts, and casts them into a couple of columns with his crescent-shaped cutlass:

=MID(A1,ROW(OFFSET(A$1,,,LEN(A1))),{1,2})

{ "N", "Ne"; "e", "ew"; "w", "wZ"; "Z", "Ze"; "e", "ea"; "a", "al"; "l", "la"; "a", "an"; "n", "nd"; "d", "d9"; "9", "99"; "9", "9A"; "A", "Au"; "u", "us"; "s", "st"; "t", "tr"; "r", "ra"; "a", "al"; "l", "li"; "i", "ia"; "a", "a4"; "4", "41"; "1", "1"}

So he’s taken the original 1D string, and split it apart into a 2D (i.e. two column) array with the help of the {1,2} bit. Both columns of the new array are almost the same, except all the elements in one of them are one character long, and all the elements in the other are two characters long. (The very last row is the exception: there’s only one character left to grab, so both columns hold the same thing.)

If entered over a range, here’s what that looks like:
Part 4 Array 1
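
For anyone who would rather poke at this outside of Excel, here is a rough Python stand-in for that slicing step. It is only a sketch: mid and slice_pairs are made-up helper names, not anything Excel or Sajan provides, and mid only imitates the part of MID's behaviour that matters here (1-based start, quietly running out of characters at the end of the string).

```python
def mid(text, start, length):
    """Imitate Excel's MID: 1-based start, truncated at the end of the text."""
    return text[start - 1:start - 1 + length]


def slice_pairs(text):
    """One row per character position; the columns are the 1- and 2-character slices."""
    return [[mid(text, pos, n) for n in (1, 2)] for pos in range(1, len(text) + 1)]


rows = slice_pairs("NewZealand99Australia41")
print(rows[0])    # ['N', 'Ne']
print(rows[11])   # ['9', '9A']
print(rows[-1])   # ['1', '1']
```

The inner loop over (1, 2) plays the part of the {1,2} array constant, and the outer loop over positions plays the part of ROW(OFFSET(A$1,,,LEN(A1))).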

3. He displays any of dem digits dat looks like a digit as a digit, using a dash:

Currently Excel thinks that this array is just text – even the numbers in it. Putting a minus sign in front of the array – or in fact doing any kind of mathematical operation on it – will force any numbers to wake up and act like numbers. (Unfortunately, it also freaks any actual text out…to the point that every bit of text whinges “I’m not a number, there’s been some kind of error”. Hence the #VALUE! errors below, along with the numbers.)
= -{ "N", "Ne"; "e", "ew"; "w", "wZ"; "Z", "Ze"; "e", "ea"; "a", "al"; "l", "la"; "a", "an"; "n", "nd"; "d", "d9"; "9", "99"; "9", "9A"; "A", "Au"; "u", "us"; "s", "st"; "t", "tr"; "r", "ra"; "a", "al"; "l", "li"; "i", "ia"; "a", "a4"; "4", "41"; "1", "1"}

={ #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; -9, -99; -9, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; -4, -41; -1, -1}

Again, here’s how that looks if entered over an Excel range (with our original string split apart down the side by way of reference):
Part 4 Array 2
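
The same "wake up the numbers, upset the text" behaviour can be roughly imitated in Python. This is a sketch under assumptions: negate and VALUE_ERROR are hypothetical names, and Python's int() is not a faithful copy of Excel's text-to-number coercion (Excel will also coerce things like decimals and dates). For slices made only of letters and digits, as in this challenge, the two should agree.

```python
VALUE_ERROR = "#VALUE!"  # stand-in for Excel's error value, not a real Excel object


def negate(cell):
    """Roughly imitate Excel's unary minus applied to a text cell."""
    try:
        return -int(cell)
    except ValueError:
        return VALUE_ERROR


print([negate(c) for c in ["d", "d9"]])   # ['#VALUE!', '#VALUE!']
print([negate(c) for c in ["9", "99"]])   # [-9, -99]
print([negate(c) for c in ["9", "9A"]])   # [-9, '#VALUE!']
```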

 

4. He hacked away at the resulting horrible herrors (err..errors) that happened due to this harsh handling:

He turns any of those errors to TRUE and anything else to FALSE with this:
=ISERR({ #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; -9, -99; -9, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; -4, -41; -1, -1}

={ TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; FALSE, FALSE; FALSE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; FALSE, FALSE; FALSE, FALSE}

So now we have TRUE where there was text, and FALSE where there were numbers. Which looks like this:
Part 4 Array 3

 

5. He butchered those Boolean values into bite-sized bits, by binding a brutal minus sign on at the beginning:

Just as putting a minus sign in front of the array in step 3 above forced any numbers stored as text to act like numbers, putting a minus sign in front of an array of TRUE and FALSE values (otherwise known as Boolean values), or doing any other kind of mathematical operation on the array, will turn those TRUE and FALSE values into numerical values. In this case, because it is a minus sign, they turn into -1 and zero. Abracadabra.
=-{ TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; FALSE, FALSE; FALSE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; FALSE, FALSE; FALSE, FALSE}

={-1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; 0, 0; 0, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; 0, 0; 0, 0}

…which looks like this:
Part 4 Array 4

Note something interesting…the highlighted row above falls just before the string that we’re after. And it’s the only row in the entire array that has a zero in the first column and a negative one in the second column. So if we can work out where that combination of zero and negative one falls, we know exactly where our desired substring starts.
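
To check that claim without squinting at the array, here is a small Python sketch of the flag-building steps (ISERR, then the second minus sign). The names is_number_text and flags are invented for illustration, and str.isdigit() is only an assumption-laden substitute for "Excel could turn this slice into a number"; it is fine for plain letters and digits but not a general equivalent.

```python
def is_number_text(cell):
    """True when a slice is all digits, i.e. where -MID(...) would not error."""
    return cell.isdigit()


def flags(text):
    """Per position: [-ISERR(1-char slice), -ISERR(2-char slice)], each -1 or 0."""
    out = []
    for i in range(len(text)):
        one, two = text[i:i + 1], text[i:i + 2]
        out.append([-int(not is_number_text(one)), -int(not is_number_text(two))])
    return out


f = flags("NewZealand99Australia41")
print(f[9], f[10], f[11], f[12])   # [-1, -1] [0, 0] [0, -1] [-1, -1]
# 1-based positions of every row holding zero then negative one:
print([i + 1 for i, row in enumerate(f) if row == [0, -1]])   # [12]
```

A row of zero then negative one can only turn up where a digit is immediately followed by a non-digit, which is why the first such row marks the end of the first block of numbers.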

6. He magically multiplies the first array column by one, and the second array column by minus one, then maniacally mashes them together en masse:

He does this by using the MMULT function with a 2nd argument of {1;-1}.
=MMULT({-1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; 0, 0; 0, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; 0, 0; 0, 0},{1;-1})

={0;0;0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;0;0;0}

…which looks like this:
Part 4 Array 5

 

MMULT is really hard to explain. It’s kind of like a 2-dimensional SUMPRODUCT. (Maybe it’s exactly like a 2-dimensional SUMPRODUCT). There’s a great visual explanation at Mike Girvin’s ExcelIsFun YouTube channel at http://www.youtube.com/watch?v=qJnL5hFfcYo

Given our array in this instance has two columns and our second argument of MMULT is {1;-1}, MMULT in this case effectively multiplies each number in the first column by positive one, multiplies each number in the second column by negative one, and then adds the two results together row by row. Multiplying the first column by positive one does nothing to it. Multiplying the second column by negative one changes any existing negative values to positive. Adding the results together gets us back to a 1D array: a row of -1 and -1 becomes -1 + 1 = 0, a row of 0 and 0 stays 0, and our row of 0 and -1 becomes 0 + 1 = 1.

Now, as per the highlighted row above, the row that falls just before the string that we’re after is the only one that has a 1 in it. So while Sajan might have ditched sorcery for sword-play, there’s still yet an element of the dark arts in his repertoire.
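
If MMULT still feels like witchcraft, this tiny Python sketch may help. The mmult function here is a hand-rolled matrix product written for illustration (it is not an Excel or library call), applied to just four of the rows from the array above.

```python
def mmult(a, b):
    """Plain matrix product of two lists of rows, in the spirit of Excel's MMULT."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Flag rows for the characters "d", "9", "9", "A" around the first block of numbers.
sample = [[-1, -1], [0, 0], [0, -1], [-1, -1]]
print(mmult(sample, [[1], [-1]]))   # [[0], [0], [1], [0]]
```

With a second argument of one column holding 1 and -1, the whole thing boils down to "first column minus second column" for each row.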

 

7. He secures his sharp scalpel for some exploratory surgery on our string:

Specifically, he uses the MATCH function to see what position that ‘1’ occurs at, and then adds 1.
=MATCH(1,{0;0;0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;0;0;0},)+1

=13

Quick aside: until another competitor in the challenge – Elias – posted a formula in the actual formula challenge thread, I never knew that if the third argument of MATCH is a comma followed by nothing else, Excel interprets this the same as if that third argument was FALSE or zero – that is, Excel looks for an exact match, not an approximate one. So that’s a handy way to shorten a formula if you’re answering a challenge (although I’d explicitly put the FALSE in for a real-world situation).

So with a blank comma as the 3rd argument, all these are equivalent:
=MATCH(3,{1,2,4,5},)
=MATCH(3,{1,2,4,5},FALSE)
=MATCH(3,{1,2,4,5},0)
= #N/A

And without the comma, these are equivalent:
=MATCH(3,{1,2,4,5})
=MATCH(3,{1,2,4,5},TRUE)
=MATCH(3,{1,2,4,5},1)
= 2
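
Here is the same exact-versus-approximate distinction as a simplified Python sketch. The match function below is a made-up imitation that covers only match types 0 and 1, and for type 1 it assumes the values are sorted ascending (Excel's own lookup is cleverer than a straight scan, but on sorted data the answer should be the same).

```python
def match(lookup, values, match_type=1):
    """Simplified MATCH: 0 = exact, 1 = largest value <= lookup (sorted ascending)."""
    if match_type == 0:
        return values.index(lookup) + 1 if lookup in values else "#N/A"
    hits = [i + 1 for i, v in enumerate(values) if v <= lookup]
    return hits[-1] if hits else "#N/A"


print(match(3, [1, 2, 4, 5], 0))   # #N/A
print(match(3, [1, 2, 4, 5]))      # 2
# Sajan's step 7: exact match for the lone 1, then add 1.
print(match(1, [0] * 11 + [1] + [0] * 11, 0) + 1)   # 13
```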

Okay, back to the fight!

8. He culminates all this cutting with a callous coup de grâce to the condemned creature:

=MID(A1,13,6^6)

=Australia41

That 6^6 bit is just a short way to write a long number. We need a number in that argument that’s big enough so that all possible substring lengths are covered. Sajan could use LEN(A1), but that would take 7 characters, whereas 6^6 is only three characters. The maximum number of characters that Excel 2007 or later will let you put in a string is 32,767, so given that 6^6 = 46,656 this will be more than enough.
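
Putting all the blows together, here is an end-to-end Python sketch of the same idea. It follows the steps of Sajan's formula but is not his formula: tail is a hypothetical helper, str.isdigit() stands in for Excel's text-to-number coercion (an assumption that holds for plain letters and digits), and an open-ended slice does the job that the over-long 6^6 length does in MID.

```python
def tail(text):
    """Return everything after the first block of digits, step by step."""
    # Flags: -1 where the 1- or 2-character slice is not a number, 0 where it is.
    flag_rows = [[-int(not text[i:i + n].isdigit()) for n in (1, 2)]
                 for i in range(len(text))]
    # MMULT by {1;-1} amounts to first column minus second column.
    diffs = [a - b for a, b in flag_rows]
    # Exact MATCH for 1 (as a 1-based position), then add 1.
    start = diffs.index(1) + 1 + 1
    # MID with an over-long length simply runs to the end of the string.
    return text[start - 1:]


for dragon in ["NewZealand99Australia41", "England7NewZealand27", "SouthAfrica21France11"]:
    print(dragon, "->", tail(dragon))
```

If a string has no digit followed by a non-digit, diffs.index(1) raises a ValueError, which is roughly where the Excel formula would hand back #N/A.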

Huzzah! How’s that for swordsmanship, eh? Hardly a fair fight…our poor string never really stood a chance against our mighty Samurai.

Well, that wraps up the joust for today. But stay tuned…there are two super-short excellent approaches that have been proposed since this series of posts began, that definitely have to be seen to be believed. I’ll cover them off in a future post.

About the Author.

Jeff Weir – a local of Galactic North up there in Windy Wellington, New Zealand – is more volatile than INDIRECT and more random than RAND. In fact, his state of mind can be pretty much summed up by this:

=NOT(EVEN(PROPER(OR(RIGHT(TODAY())))))

That’s right, pure #VALUE!

Find out more at http://www.heavydutydecisions.co.nz

 

Written by Jeff Weir

20 Responses to “Formula Challenge 001 – Return everything from a string after the first block of numbers (Part 4.)”

  1. Assuming that Monaco7190Australia1484 is in A2, then in B2 we can write:

    =MID(A2,MIN(IFERROR(SEARCH(CHAR(ROW($65:$90)),A2,MIN(IFERROR(FIND(CHAR(ROW($48:$57)),A2),""))+1),"")),LEN(A2))

    Of course it’s an array formula, so CTRL + SHIFT + ENTER is required…

    CHAR(48) to CHAR(57) returns the numbers 0 to 9.

  2. ethoros says:

    Pure genius!
    Well done Sajan, that is an amazing formula. Very impressed. 🙂

  3. Ian says:

    I can't compete on the formula side of things, but are the strings at the top the number of times each side has won in matches between the 2 in rugby (union)?

  4. ethoros says:

    Yay, one thing I can improve on is instead of 6^6 used 8^5
    🙂

  5. ianamck says:

    Bonus Question: Who can tell me what these numbers represent

    Rugby Union wins between competing countries

    EG New Zealand V Australia

    of the 140 matches New Zealand have won 99

  6. Dave S says:

    Truly awesome! There is beauty in the elegance of these solutions.

  7. Jared says:

    Great posts on this challenge!

    Was space-delimited data a requirement? Adding a space to any string results in each of these solutions failing.

  8. Sulabh says:

    Just use VBA. Make life easier.

  9. David Hager says:

    " I’ll cover them off in a future post."
    How far in the future? 🙂

  10. Chetan says:

    I have another approach "{=RIGHT(B3,LEN(B3)-MATCH("TRUETRUE",(IF(CODE(MID(B3,ROW(INDIRECT("1:"&LEN(B3))),1))64,TRUE,FALSE))),0))} "

  11. Udit says:

    Came back to this site after a long time.... can this be a shorter answer? just modified Sajan's.... {=MID(A1,MATCH(1,--(ISNUMBER(-MID(A1,ROW(OFFSET(A1,,,LEN(A1))),2))),)+2,6^6)}

  12. Stephane says:

    Hello all of you and thanks for this awesommmmme formulas,

    Just for the sake of beauty, here is a UDF with Regular Expressions (7 lines of codes) :

    Function test(rng As Range) As String

    Const strPattern As String = "[A-Z]+\d+"
    Dim Obj As Object
    Dim temp As Variant
    Dim result As String

    ' Late-bound VBScript regular expression object
    Set Obj = CreateObject("vbscript.regexp")

    With Obj
    .Pattern = strPattern
    .Global = True
    .IgnoreCase = True
    Set temp = .Execute(rng.Value)
    End With
    ' Matches are zero-based, so Item(1) is the second letters-then-digits block
    result = temp.Item(1).Value
    test = result

    End Function

  13. Ray says:

    Thanks for opening this challenge and the awesome formula.

    Can someone explain or give me some hints on the ROW(OFFSET(A$1,,,LEN(A19))) ?

    I have no idea how it instructs the string extraction in the form of consecutive 1~2 characters.

    Thanks very much, this challenge really opened my eyes.
