Hi Linda ,
Now that we understand how the SEARCH function works , the question is why do we concatenate the string of digits to the value in the input cell i.e. why do we use the construct :
A1&"0123456789"
Since the SEARCH function returns an error value if it does not find the search value ( which are the digits in our case ) , we will have to take care of the error values at the beginning stage , since if we allow them to remain as error values , they will affect the outputs of the other functions that we are going to use , such as the MIN and MID functions.
For example , suppose A1 contains the number 121 , and we use the formula :
=SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789")
The output that we will see will be :
{4,1,2,7,8,9,10,11,12,13}
The output if our formula were :
=SEARCH({0,1,2,3,4,5,6,7,8,9},A1)
would be :
{#VALUE!,1,2,#VALUE!,#VALUE!,#VALUE!,#VALUE!,#VALUE!,#VALUE!,#VALUE!}
The error values are because the digits 0 , 3 , 4 , 5 , 6 , 7 , 8 and 9 do not figure in the number.
We cannot now use the MIN function on this output directly , since the MIN function does not ignore error values.
Thus , the concatenation is a technique that should be used when ever you use the SEARCH function in conjunction with the MIN function.
Why do we use the MIN function ?
Because the concatenation is after the input data , if any digit occurs in the input data , the SEARCH function will return its position ; this result value will be smaller than the result value for a digit which does not occur in the input data , since those digits will always occur later ; in our above example , the digits 1 and 2 occur in the first and second position ; the earliest any other digit can occur is in position 4 since our input number is 3 digits in length.
Can we use the concatenation before the input data in conjunction with the MAX function ?
If we try the same example , with the number 121 in cell A1 , and the formula being :
=SEARCH({0,1,2,3,4,5,6,7,8,9},"0123456789" & A1)
the result is :
{1,2,3,4,5,6,7,8,9,10}
which is useless !
To summarise , we need to use the concatenation after the input data , and use the MIN function ,
which will give us the position of the first digit in the input data , what ever that digit may be.
Once we know where the first digit is , the next step is to extract all the digits thereafter , so that we end up with the number.
Let's use the input string as
Office365.
Using the following formula :
=MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789"))
will return 7 , since this is the position where the first digit , 3 in this case , occurs.
We need to use this to return the number 365.
Would it be possible to look at sub-strings of the input string , which are progressively longer , starting with position 7 , so that we would have the strings 3 , 36 and 365 ?
If we could do this , we could look at the sub-string which gives us the maximum value , since the longest string of digits would be the largest number available.
This is what the combination of the LOOKUP function and the MID function do.
The MID function allows us to extract sub-strings from a parent string , specifying the position from where to start , and the length of the sub-string to be extracted.
Thus , using the same example ,
=MID(A1 , 7 , 1) would give us 3
=MID(A1 , 7 , 2) would give us 36
=MID(A1 , 7 , 3) would give us 365
and so on , as long as there were digits available to be added on to the number.
So , could we use something like :
=MID(A1 , 7 , {1,2,3})
to give us an array of sub-strings ?
Trying it out , gives us :
{"3","36","365"}
This looks good , but we would not know in advance how many digits are present in the input string ; could we try something like this ?
=MID(A1 , 7 , {1,2,3,4,5,6,7,8,9,10})
Trying it out , gives us :
{"3","36","365","365","365","365","365","365","365","365"}
So , we are nearly there ; all that remains is to extract the maximum value from the above array. The LOOKUP function is the ideal one for this.
Before we continue with the usage of LOOKUP in this example , let us digress a bit. Suppose we have an input array of random numbers in random order , as follows :
{33 ; 7 ; 13 ; 3 ; 4 ; 16 ; 66 ; 2 ; 5 ; 18 ; 34 ; 43}
What would happen if we used the LOOKUP function on this array ?
What could we pass as the first parameter to the LOOKUP function ?
1. We could pass a number smaller than the smallest number in the array e.g. 1
2. A number bigger than the biggest number in the array e.g. 100
3. An number which is in the array e.g. 5
4. A number not in the array , but within the bounds of the minimum and maximum of the array e.g. 37
5. A letter of the alphabet or any special character
1. =LOOKUP(
1 , {33 ; 7 ; 13 ; 3 ; 4 ; 16 ; 66 ; 2 ; 5 ; 18 ; 34 ; 43})
gives us #N/A
2. =LOOKUP(
100 , {33 ; 7 ; 13 ; 3 ; 4 ; 16 ; 66 ; 2 ; 5 ; 18 ; 34 ; 43})
gives us 43
3. =LOOKUP(
5 , {33 ; 7 ; 13 ; 3 ; 4 ; 16 ; 66 ; 2 ; 5 ; 18 ; 34 ; 43})
gives us #N/A
4. =LOOKUP(
37 , {33 ; 7 ; 13 ; 3 ; 4 ; 16 ; 66 ; 2 ; 5 ; 18 ; 34 ; 43})
gives us 34
5. =LOOKUP(
"Z" , {33 ; 7 ; 13 ; 3 ; 4 ; 16 ; 66 ; 2 ; 5 ; 18 ; 34 ; 43})
gives us #N/A
Results 2 and 4 are intriguing ; how and why do we not get error values in these 2 cases ?
We can try out various other values for the first parameter , such as :
8 , 14 , 2 , 15 , 65 , 6 , 32 , 35 , 40 , 44
I suggest you try these values out , and see if you can come to an understanding of how the results are arrived at ; if you still have doubts , lookup (!) this thread , which has explanations by Misra , Lori and Shrivallabha.
http://chandoo.org/forum/threads/how-vlookup-works.18378/
What is clear at the end of it all is that if we pass a value which is bigger than the biggest value in the data array , the LOOKUP function returns the
last value in the data array ; this may or may not be the biggest value in the data array ; if the data array is sorted in ascending order , it will be , otherwise it may or may not be.
All that remain is to understand how we pass an array to the MID function which is large enough to return us all the progressively longer sub-strings.
One way is to assume that the input string will never be longer than a particular value , and then use this value ; this is what the portion :
ROW($A$1:$A$10000)
does.
It assumes that the input string will never be longer than 10000 characters ; a method which does not tax Excel to this extent is to use the length of the input string as the limit itself.
One such construct using this technique would be :
=MID(A1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789")),
ROW(OFFSET($A$1,,,LEN(A1))))
where the portion highlighted in
blue does the same job in both formulae.
=MID(A1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789")),
ROW($A$1:$A$10000))
Just one small point remains !
There is a "0" concatenated to the MID function , which is highlighted in the formula :
=LOOKUP(99^99,--(
"0"&MID(A1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789")),ROW($1:$10000))))
What is the reason for this ?
Remove it and try !
When we remove the "0" concatenation , and try the formula with an input string which
does not contain any digits , we get the #N/A error value.
With the "0" concatenation , trying the formula with a similar input , we get the output as 0.
However , this is also not a correct result , since if the input string does contain a zero , the output is still 0.
If you ask me , we can modify this slightly so that we get an accurate representation of the digits in the input string , rather than the number represented by those digits ; to do this , change the "0" to "1" , and wrap a MID function around the LOOKUP as follows :
=MID(LOOKUP(99^99,--("1"&MID(A1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789")),ROW($1:$10000)))),2,9999)
which will display a blank cell if the input string does not contain any digit , and displays all the digits as a text string if the input string does contain digits.
Lastly , why do we need the double minus -- ?
LOOKUP works equally well on text and numbers ; however , the two parameters to the LOOKUP function need to be compatible ; if the array of data is numeric , the first parameter should be numeric , and vice-versa.
Similarly , if the array of data is non-numeric , then the first parameter should be non-numeric.
To be absolutely strict , if the first parameter is numeric , then the data array should contain at least one numeric value ; similarly , if the first parameter is non-numeric , then the data array should contain at least one non-numeric value. Otherwise , the LOOKUP function will return the #N/A error value.
Any doubts ?
Narayan