Hi Krishnakumar ,
First , let me apologize for unwittingly having generated a controversy. I took up Sajan’s formula because it appeared easier to deconstruct , and it fitted in with my theme of reusable components. I thought that once I had constructed the components for Sajan’s formula , it would be easier to deconstruct your formula , using those same components and a few others. Please let me know whether I have done well.
Just to recapitulate , there were three formulae which were used to arrive at the final formula created by Sajan ; these were :
Formula_1 : =LOOKUP(99,RIGHT(match_text,ROW(INDIRECT("1:"&LEN(match_text))))+0)
Formula_2 : =LEFT(match_text,FIND(Formula_1,match_text)-1)
Formula_3 : =VALUE(SUBSTITUTE(strlist,Formula_2,""))
Final Formula : =INDEX(strlist,MATCH(TRUE,Formula_3>=Formula_1,0))
Let us understand what each of the first 3 formulae do :
1. This extracts the numeric portion of the text string , since the text strings are in a standard format , with the alpha portion of the string appearing first , and the numeric portion appearing on the right e.g. maint1 , mainz49 , vaint10 etc.
The length of each of the portions is variable , which is why we cannot have a standard LEFT , MID or RIGHT function to isolate the numeric or the alpha portions.
The formula will isolate the 1 from maint1 , the 49 from mainz49 , or the 10 from vaint10.
2. The second formula extracts the alpha portion of the text string ; from the above examples , it will isolate the maint from maint1 , the mainz from main49 , or the vaint from vaint10.
3. The third formula retrieves an array of numeric values , matching the text string found using the second formula i.e. where the alpha portion of the text string matches the alpha portion found in (2) above , it will return the numeric portion ; everywhere else , it will return #N/A.
Thus , given an initial list as follows :
{“maint1”;”maint4”;”maint7”,”mainz2”;”mainz49”;”vaint4”;”vaint39”}
and a match text of maint2 , the above formula will return the following array :
{1;4;7;#N/A;#N/A;#N/A;#N/A}
The final formula merely found the first value in the above array greater than the numeric portion of match text ; for a match text of maint2 , this is 2. The first value in the above array greater than 2 is 4 , and the result would be a MATCH output of 2 , since 4 is the second item in the array. Using this to index into the list of text strings would return maint4.
The beauty of the above formulae is that they can extended to be used to form arrays , instead of single values !
Let us now go into how we can get the answer to the problem viz. what is the closest match in the lists of text strings , in strlist , to the given text string , in match_text.
The final formula will be somewhat like this :
=INDEX(strlist, get the closest match of the numeric portion of match_text , to the numeric portions in strlist , where the alpha portion of the match text matches the alpha portions in strlist)
In order to achieve the above , we need to do the following :
1. Isolate the numeric portion of match_text
2. Isolate the alpha portion of match_text
3. Isolate the numeric portions of strlist
4. Isolate the alpha portions of strlist
5. Create an array of the numeric portions found in (3) above , only where the alpha portions match the alpha portion of match_text
6. From this array , get the closest match of the numeric portion of match_text found in (1) above.
7. Use this match value to index into strlist.
We already know how to get (1) and (2) ; Formula_1 gives us 1 ; Formula_2 gives us 2.
Isolating the numeric portions of strlist is done by a new formula , Formula_4 :
=1*MID(strlist,MATCH(TRUE,ISNUMBER(1*MID(strlist,ROW(Sheet1!$1:$9),1)),0),COUNT(1*MID(strlist,ROW(Sheet1!$1:$9),1)))
Explaining this is a topic in itself , and has been done here :
http://office.microsoft.com/en-us/excel-help/extracting-numbers-from-alphanumeric-strings-HA001154901.aspx
Once we have this , isolating the alpha portions of strlist is done by using Formula_2 , and passing strlist to it as a parameter ; thus :
=LEFT(strlist,FIND(Formula_4,strlist)-1)
will return an array of the alpha portions of strlist ; this is the power of reusable components ! Let us name this Formula_5.
Taking care of point 5 is very simple :
=IF(Formula_5=Formula_2,1,999)*Formula_4
Will return an array of values , containing the numeric portions of strlist only where the alpha portion of strlist matches the alpha portion of match_text :
{1;4;7;1998;48951;3996;38961}
The values 1998 , 48951 , 3996 and 38961 are because the numeric portions of strlist have been multiplied by 999 where the alpha portions of strlist do not match the alpha portion of match_text. Why do we need 999 ? This will become clear once we come to the final step of getting the closest match.
Now for the final step ; how to get the closest match of the numeric portion of match_text ?
Suppose we have a list of numbers , which we will call numlist :
{1;4;7;2;49;4;39}
Suppose we have a number match_num , for which we would like to find the closest match within numlist. The formula to do this is :
=INDEX(numlist,MATCH(MIN(ABS(numlist-match_num)),ABS(numlist-match_num),0))
To see what we are doing here , let us consider the outputs if match_num is 44 :
1. Find out the difference between match_num and each number in numlist ; we take the absolute value of this difference since it does not matter whether the difference is positive or negative ; 3 and 5 are equally close to 4.
This will result in the following array : {43;40;37;42;5;40;5}
2. Take the lowest of the above differences , which is 5.
3. Match this value with the values in the array ; we will get the first array location where 5 occurs i.e. 5 ( this is incidental ! )
4. Index into numlist to get the 5th value from numlist ; we get 49.
Suppose we reduce match_num to 43 ; what happens now ?
In step 1 , the array of differences will be : {42;39;36;41;6;39;4}
In step 2 , the lowest of the differences is now 4.
In step 3 , matching 4 with the array values results in a location of 7
In step 4 , indexing into numlist with a value of 7 gives us 39.
How do we reuse this component in our application ?
What we need to realise is that numlist is an array ; we can pass any array to the above formula , and it will work correctly.
We had earlier taken care of point 5 through this formula :
=IF(Formula_5=Formula_2,1,999)*Formula_4
Just substitute numlist with this array , and we are done !
=INDEX(strlist,MATCH(MIN(ABS(IF(Formula_5=Formula_2,1,999)*Formula_4-Formula_1)),ABS(IF(Formula_5=Formula_2,1,999)*Formula_4-Formula_1),0))
Now the reason for 999 in the above IF function is clear ; such a high value ensures that none of these values will be returned as a close match e.g. suppose our match_text is maint48 ; now if we only looked at the numeric portions of strlist ({1;4;7;2;49;4;39} ) , then 48 is closest to 49 ; but the alpha portions are as follows :
{“maint”;”maint”;”maint”,”mainz”;”mainz”;”vaint;”vaint”}
and 49 is a part of mainz49 , and since mainz does not match maint , our array no longer returns 49 in that position ; the 999 will ensure that the number in that location is 999 x 49 = 48951 , which will ensure that the closest number to 48 returns the correct value of 7 , since the alpha portion of the text string maint7 matches the alpha portion of match_text maint48.
Please comment if you find any mistakes. I have done some amount of testing but I leave it to others to test it more thoroughly if you can.
Narayan