• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Need help to understand formula

Linda Crasco

New Member
Hello,

I was searching with Google for a way to extract the numbers from a text string (different numbers of characters) and found a formula that works perfectly. As glad as I am that it solved my problem, I can't figure out how it works. Could someone explain step by step what the formula is doing? I would really appreciate it.

in this example, my text string is in cell A1

=LOOKUP(99^99,--("0"&MID(A1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789")),ROW($1:$10000))))

~Linda
 
Hi Linda ,

The formula can be split up into its constituents , as shown below :

=LOOKUP(99^99,--("0" & MID(A1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789")),ROW($1:$10000))))

First , let us take the inner-most portion , highlighted in blue.

This is a standard technique to look for a digit in a string of characters consisting of both letters of the alphabet and numbers.

Put the following formula in any unused cell :

=SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789")

Now try various possibilities in A1 ; what are they ?

1. A1 can be blank
2. A1 can contain 1 character which is not a digit
3. A1 can contain 1 digit
4. A1 can contain several characters , none of which is a digit
5. A1 can contain a random number , say 7304162958
6. A1 can contain a random text string consisting of a mix of alpha characters and digits.

You can see the results as follows :

1. The output will be 1

Why is this so ?

Because A1 is blank , when we concatenate A1 with the string "0123456789" , we get the same string , and since we are searching for the digits , starting with the digit 0 , we will find it in the first place.

Suppose we had changed the formula to :

=SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"9876543210")

In this case , since A1 is blank , we would be looking for the digits starting with 0 , but in this case , since 0 is at the end of the string , we would get 10.

Suppose we had changed the formula to :

=SEARCH({9,8,7,6,5,4,3,2,1,0},A1&"9876543210")

we would again get 1 since now we would be starting the search by looking for 9 first.

In a like manner , if we changed the formula to :

=SEARCH({9,8,7,6,5,4,3,2,1,0},A1&"0123456789")

we would get 10.

NOTE : Though the cell will show a single value each time , what the SEARCH function returns is actually an array of values , signifying the position where each digit is to be found in the string ; thus , in each of the above 4 cases , the actual value returned by the SEARCH function would be the following arrays :

{1,2,3,4,5,6,7,8,9,10}

{10,9,8,7,6,5,4,3,2,1}

{1,2,3,4,5,6,7,8,9,10}

{10,9,8,7,6,5,4,3,2,1}

---------------------------------------------------------------------------------------

2. When A1 contains a single character which is not a digit , the output of the SEARCH function will be :

2

and the array will be :

{2,3,4,5,6,7,8,9,10,11}

This is because the concatenation is from the second character onwards , and so when we look for the digits starting with 0 , we will find it in the second position.

---------------------------------------------------------------------------------------

3. When A1 contains a single character which is a digit , the output of the SEARCH function will be dependent on what the digit in A1 is :

0 - {1,3,4,5,6,7,8,9,10,11}
1 - {2,1,4,5,6,7,8,9,10,11}
2 - {2,3,1,5,6,7,8,9,10,11}
3 -
4 -
5 -
6 -
7 - {2,3,4,5,6,7,8,1,10,11}
8 - {2,3,4,5,6,7,8,9,1,11}
9 - {2,3,4,5,6,7,8,9,10,1}

You can see that if the digit entered is a 0 , then it is found in the first place , while if the entered digit is any other digit , it will be found it its appropriate location in the string , shifted by one place since the first digit is now 0.

---------------------------------------------------------------------------------------

4. When A1 contains several characters , none of which is a digit , you can see that the output will depend on the length of the entered string , since the digits will be offset from the first position by the length of the entered string ; if the entered string is Excel , the output will be :

{6,7,8,9,10,11,12,13,14,15}

---------------------------------------------------------------------------------------

5. When A1 contains a random number , say 7304162958 , the output will be :

{3,5,7,2,4,9,6,1,10,8}

which is to be read as follows :

0 is to be found in position 3
1 is to be found in position 5
2 is to be found in position 7

and so on.

---------------------------------------------------------------------------------------

6. When A1 consists of a random mix of alpha and numeric characters , as in the string Row1048576 , the output will be :

{5,4,13,14,6,8,10,9,7,20}

which is to be read as follows :

0 is to be found in position 5
1 is to be found in position 4
2 is to be found in position 13 , since it is not a part of the entered string , but is available in the concatenated portion.

You can easily decipher the remaining values.

Any doubts ?

Narayan

To be continued ...
 
Thank you Narayan,

You provided a very helpful explanation; I really appreciated the examples!

I'm now ready for part II of the explanation.

On a personal note, I am always amazed at how willing people are to share their knowledge. I have learned so much from this site, and have asked a few questions and always received informative replys. To all the genuises who take the time to throughly explain concepts to those of us less educated in Excel, I sincerly thank you.

~Linda
 
Hi Linda ,

Now that we understand how the SEARCH function works , the question is why do we concatenate the string of digits to the value in the input cell i.e. why do we use the construct :

A1&"0123456789"

Since the SEARCH function returns an error value if it does not find the search value ( which are the digits in our case ) , we will have to take care of the error values at the beginning stage , since if we allow them to remain as error values , they will affect the outputs of the other functions that we are going to use , such as the MIN and MID functions.

For example , suppose A1 contains the number 121 , and we use the formula :

=SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789")

The output that we will see will be :

{4,1,2,7,8,9,10,11,12,13}

The output if our formula were :

=SEARCH({0,1,2,3,4,5,6,7,8,9},A1)

would be :

{#VALUE!,1,2,#VALUE!,#VALUE!,#VALUE!,#VALUE!,#VALUE!,#VALUE!,#VALUE!}

The error values are because the digits 0 , 3 , 4 , 5 , 6 , 7 , 8 and 9 do not figure in the number.

We cannot now use the MIN function on this output directly , since the MIN function does not ignore error values.

Thus , the concatenation is a technique that should be used when ever you use the SEARCH function in conjunction with the MIN function.

Why do we use the MIN function ?

Because the concatenation is after the input data , if any digit occurs in the input data , the SEARCH function will return its position ; this result value will be smaller than the result value for a digit which does not occur in the input data , since those digits will always occur later ; in our above example , the digits 1 and 2 occur in the first and second position ; the earliest any other digit can occur is in position 4 since our input number is 3 digits in length.

Can we use the concatenation before the input data in conjunction with the MAX function ?

If we try the same example , with the number 121 in cell A1 , and the formula being :

=SEARCH({0,1,2,3,4,5,6,7,8,9},"0123456789" & A1)

the result is :

{1,2,3,4,5,6,7,8,9,10}

which is useless !

To summarise , we need to use the concatenation after the input data , and use the MIN function , which will give us the position of the first digit in the input data , what ever that digit may be.

Once we know where the first digit is , the next step is to extract all the digits thereafter , so that we end up with the number.

Let's use the input string as Office365.

Using the following formula :

=MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789"))

will return 7 , since this is the position where the first digit , 3 in this case , occurs.

We need to use this to return the number 365.

Would it be possible to look at sub-strings of the input string , which are progressively longer , starting with position 7 , so that we would have the strings 3 , 36 and 365 ?

If we could do this , we could look at the sub-string which gives us the maximum value , since the longest string of digits would be the largest number available.

This is what the combination of the LOOKUP function and the MID function do.

The MID function allows us to extract sub-strings from a parent string , specifying the position from where to start , and the length of the sub-string to be extracted.

Thus , using the same example ,

=MID(A1 , 7 , 1) would give us 3

=MID(A1 , 7 , 2) would give us 36

=MID(A1 , 7 , 3) would give us 365

and so on , as long as there were digits available to be added on to the number.

So , could we use something like :

=MID(A1 , 7 , {1,2,3})

to give us an array of sub-strings ?

Trying it out , gives us :

{"3","36","365"}

This looks good , but we would not know in advance how many digits are present in the input string ; could we try something like this ?

=MID(A1 , 7 , {1,2,3,4,5,6,7,8,9,10})

Trying it out , gives us :

{"3","36","365","365","365","365","365","365","365","365"}

So , we are nearly there ; all that remains is to extract the maximum value from the above array. The LOOKUP function is the ideal one for this.

Before we continue with the usage of LOOKUP in this example , let us digress a bit. Suppose we have an input array of random numbers in random order , as follows :

{33 ; 7 ; 13 ; 3 ; 4 ; 16 ; 66 ; 2 ; 5 ; 18 ; 34 ; 43}

What would happen if we used the LOOKUP function on this array ?

What could we pass as the first parameter to the LOOKUP function ?

1. We could pass a number smaller than the smallest number in the array e.g. 1
2. A number bigger than the biggest number in the array e.g. 100
3. An number which is in the array e.g. 5
4. A number not in the array , but within the bounds of the minimum and maximum of the array e.g. 37
5. A letter of the alphabet or any special character

1. =LOOKUP(1 , {33 ; 7 ; 13 ; 3 ; 4 ; 16 ; 66 ; 2 ; 5 ; 18 ; 34 ; 43})

gives us #N/A

2. =LOOKUP(100 , {33 ; 7 ; 13 ; 3 ; 4 ; 16 ; 66 ; 2 ; 5 ; 18 ; 34 ; 43})

gives us 43

3. =LOOKUP(5 , {33 ; 7 ; 13 ; 3 ; 4 ; 16 ; 66 ; 2 ; 5 ; 18 ; 34 ; 43})

gives us #N/A

4. =LOOKUP(37 , {33 ; 7 ; 13 ; 3 ; 4 ; 16 ; 66 ; 2 ; 5 ; 18 ; 34 ; 43})

gives us 34

5. =LOOKUP("Z" , {33 ; 7 ; 13 ; 3 ; 4 ; 16 ; 66 ; 2 ; 5 ; 18 ; 34 ; 43})

gives us #N/A

Results 2 and 4 are intriguing ; how and why do we not get error values in these 2 cases ?

We can try out various other values for the first parameter , such as :

8 , 14 , 2 , 15 , 65 , 6 , 32 , 35 , 40 , 44

I suggest you try these values out , and see if you can come to an understanding of how the results are arrived at ; if you still have doubts , lookup (!) this thread , which has explanations by Misra , Lori and Shrivallabha.

http://chandoo.org/forum/threads/how-vlookup-works.18378/

What is clear at the end of it all is that if we pass a value which is bigger than the biggest value in the data array , the LOOKUP function returns the last value in the data array ; this may or may not be the biggest value in the data array ; if the data array is sorted in ascending order , it will be , otherwise it may or may not be.

All that remain is to understand how we pass an array to the MID function which is large enough to return us all the progressively longer sub-strings.

One way is to assume that the input string will never be longer than a particular value , and then use this value ; this is what the portion :

ROW($A$1:$A$10000)

does.

It assumes that the input string will never be longer than 10000 characters ; a method which does not tax Excel to this extent is to use the length of the input string as the limit itself.

One such construct using this technique would be :

=MID(A1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789")),ROW(OFFSET($A$1,,,LEN(A1))))

where the portion highlighted in blue does the same job in both formulae.

=MID(A1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789")),ROW($A$1:$A$10000))

Just one small point remains !

There is a "0" concatenated to the MID function , which is highlighted in the formula :

=LOOKUP(99^99,--("0"&MID(A1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789")),ROW($1:$10000))))

What is the reason for this ?

Remove it and try !

When we remove the "0" concatenation , and try the formula with an input string which does not contain any digits , we get the #N/A error value.

With the "0" concatenation , trying the formula with a similar input , we get the output as 0.

However , this is also not a correct result , since if the input string does contain a zero , the output is still 0.

If you ask me , we can modify this slightly so that we get an accurate representation of the digits in the input string , rather than the number represented by those digits ; to do this , change the "0" to "1" , and wrap a MID function around the LOOKUP as follows :

=MID(LOOKUP(99^99,--("1"&MID(A1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789")),ROW($1:$10000)))),2,9999)

which will display a blank cell if the input string does not contain any digit , and displays all the digits as a text string if the input string does contain digits.

Lastly , why do we need the double minus -- ?

LOOKUP works equally well on text and numbers ; however , the two parameters to the LOOKUP function need to be compatible ; if the array of data is numeric , the first parameter should be numeric , and vice-versa.

Similarly , if the array of data is non-numeric , then the first parameter should be non-numeric.

To be absolutely strict , if the first parameter is numeric , then the data array should contain at least one numeric value ; similarly , if the first parameter is non-numeric , then the data array should contain at least one non-numeric value. Otherwise , the LOOKUP function will return the #N/A error value.

Any doubts ?

Narayan
 
Back
Top