• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Finding closest text

jits50

New Member
Can this be done for a text field which is a combination of text and number.? For example I want to search maint5 within maint1 maint7 maint10 vaint2 vaint4.I should get the value maint7 which is closest to maint5. In excel we can assume maint5 is incell a1 and the other value are in cells b1 to b5. The alphanumeric value should match exactly (in this case "maint"). Only the numeric value should have the closest match. Which means mainz5 should not return any value but maint5 returns maint7.
 
Hi, jits50!

Asumming you have in your list in column B the values 'maint49', 'maint50', 'maint51" and 'maint7", which will be the closest?

Regards!

EDIT: the list in column B is supposed to be ordered or not?
 
maint7 will be closest. Column B is basically a concatenated field. It is ordered first on the alphanumeric part for example maint, mainz and vaint. Then it is further ordered on the numeric prt. So maint7 will be above maint49 which will b above maint50 which will be above mainz1 , which will be above vaint1.
 
Hi


There may be better ways


Code:
=INDEX(B1:B5,MATCH(MIN(ABS(REPLACE(A1,1,MIN(FIND({0,1,2,3,4,5,6,7,8,9},$A$1&"0123456789"))-1,"")+ 0-IFERROR(SUBSTITUTE($B$1:$B$5,LEFT($A$1, MIN(FIND({0,1,2,3,4,5,6,7,8,9},$A$1&"0123456789"))-1),"")+0,9.999E+307))), ABS(IFERROR(SUBSTITUTE($B$1:$B$5, LEFT($A$1,MIN(FIND({0,1,2,3,4,5,6,7,8,9},$A$1&"0123456789"))-1),"")+0,9.999E+307)- REPLACE(A1,1,MIN(FIND({0,1,2,3,4,5,6,7,8,9},$A$1&"0123456789"))-1,"")+0),0))


Array formula.


Kris
 
Hello @jits50,


I am assuming that this is more of an academic exercise since you would probably want to use the string and number before concatenation for efficiently finding the next match.


I have listed both options below.


Method 1: Using the strings (e.g. "maint") and numbers (e.g. 5) that are not concatenated, you could use something like the following:



=D12 & INDEX(numlist,MATCH(TRUE,IF(strlist=D12,numlist, 0)>=E12, 0))



confirm with Ctrl + Shift + Enter


In the formula, D12 refers to the string being sought (e.g. "maint")

E12 refers to the number being sought (e.g. 5)

strlist is the named range for the strings (e.g. maint, maint, vaint, etc.)

numlist is the named range for the numbers (e.g. 1,4,7,2, etc.) corresponding to the strings


Method 2: Working with the concatenated strings (such as "maint1", "vaint7", etc.)

You could use a formula such as the following:



=INDEX(List,MATCH(TRUE,VALUE(SUBSTITUTE(List,MID(C1,1,MATCH(TRUE, ISNUMBER(MID(C1,{1,2,3,4,5,6,7,8},1)+0), 0)-1),""))>=LOOKUP(999,RIGHT(C1,{1,2,3,4,5,6,7,8})+0),0))



Confirm with Ctrl + Shift + Enter


In the formula, List refers to the range of alphanumeric strings (e.g. maint1, vaint7, etc.)

C1 refers to the alphanumeric string being sought (e.g. maint5)

I have assumed that the largest number you would have is less than 999. Otherwise, adjust that to a suitably high number.

For simplicity, I have assumed that the max length of the strings is 8. You could make that more generic using something like COLUMN($A$1:INDEX($1:$1, MAX(LEN(List)))) instead of {1,2,3,4,5,6,7,8}


Both formulas will return an error if a suitable match is not found. Feel free to wrap the formulas in IFERROR(...)


Cheers,

Sajan.
 
@Sajan

Hi!

Yesterday I thought "Krisnakumar provided a solution, I'm done, don't have nothing to do but check if wanted to... and I did.". Now I promise not to do the same with your super simple and easy formula!

Regards!
 
Hi @SirJB7,

I think it is always good to get multiple choices for solutions. I am sure someone will offer a simpler solution yet. That is the great thing about this forum!


Regards,

Sajan.
 
Hi Sajan ,


Thanks for a great formula ; I have been able to taste it and enjoy it for quite some time !


I suggest the following simplification , based on what is called “reusable components”.


If we start to do some “reverse engineering” , what we need to do is the following :


Let us start with the following named ranges :


1. The list of text strings ( e.g. maint1 , maint7 , maint10 , maint49 , vaint4 , vaint9 ) is strlist according to your post.


2. The text string which is to be matched , say maint37 , is named match_text.


3. The following formula , named Formula_1 , will give us the numeric part of match_text , which in this case is 37 :

=LOOKUP(99,RIGHT(match_text,ROW(INDIRECT("1:"&LEN(match_text))))+0)


I have merely replaced your array construct {1,2,3,4,5,6,7,8} by the flexible ROW(INDIRECT(…)) construct.


4. Now that we have the numeric portion , we isolate the text portion by a simple FIND statement as follows :

=LEFT(match_text,FIND(Formula_1,match_text)-1)


This will return the text string maint ; suppose we name this as Formula_2.


5. Now for your magic ! Replace the string found in step 4 above , by the string "" , by using the SUBSTITUTE function :

=SUBSTITUTE(strlist,Formula_2,””)


What this does is create an array of string values {"1";"7";"10";"49";"vaint4";"vaint39"}


6. Now , we create an array of numeric values , by taking VALUE of the above , which returns the following array :

{1;7;10;49;#VALUE!;#VALUE!}


Let us name this formula as Formula_3


7. We compare this array with the value returned by Formula_1 ( 37 in our case ) thus :

=Formula_3>=Formula_1


This will return the following array :

{FALSE;FALSE;FALSE;TRUE;#VALUE!;#VALUE!}


8. Using the MATCH(TRUE,…,0) construct as follows :


=MATCH(TRUE,Formula_3>=Formula_1,0)


Will return 4 , since the 4th value above is TRUE.


9. Indexing this value into the list of input strings strlist , will give us maint49 , which is the desired output.


The final formula is :

Formula_1 : =LOOKUP(99,RIGHT(match_text,ROW(INDIRECT("1:"&LEN(match_text))))+0)

Formula_2 : =LEFT(match_text,FIND(Formula_1,match_text)-1)

Formula_3 : =VALUE(SUBSTITUTE(strlist,Formula_2,""))


All put together : =INDEX(strlist,MATCH(TRUE,Formula_3>=Formula_1,0))


If we remove all the named range references , and replace the named range match_text by the cell reference which you have used viz. C1 , we have :


=INDEX(strlist,MATCH(TRUE,VALUE(SUBSTITUTE(strlist,LEFT(C1,FIND(LOOKUP(99,RIGHT(C1,ROW(INDIRECT("1:"&LEN(C1))))+0),C1)-1),""))>=LOOKUP(99,RIGHT(C1,ROW(INDIRECT("1:"&LEN(C1))))+0),0))


Narayan
 
hats off to Sajan for the formula and Narayan for explanation. I saw this post and played sometime with a formula based answer, each time drawing a blank. So I bookmarked this thread and waited for others to enlighten me. :)
 
Since the OP is looking for closest match and if have a list like


maint1 , maint4, maint7 , maint10 , maint49 , vaint4 , vaint9


and the expecting answer is maint4, only my formula would work.


Kris
 
Hi Narayan, Chandoo,

Thanks for the kind words. I am glad you liked the technique.

And thanks for writing up the explanation, and reusing formula segments! I should remember to reuse more!! :)


Hi Kris,

I would encourage you to read Narayan's excellent explanation.


Regards,

Sajan.
 
@ Sajan, I do agree that Narayan did a great to explain the formula. What I'm saying that it doesn't give me maint4 from the following list


maint1 , maint4, maint7 , maint10 , maint49 , vaint4 , vaint9


and the search key being maint5


I think the closest match would be maint4 from this list unless I'm missing something.


I'm not saying that my formula is great, however just highlighted the point that the OP is looking for the closest match.


Kris
 
Thank you all particularly KrishnaKumar and Sajan for excellent solutions.

I was in a hurry for the solution and came up with this.


=IFERROR(INDEX($M$2:$M$2218,MATCH(SMALL(IF($L$2:$L$2218 = B2,ABS($M$2:$M$2218-C2)),1),IF($L$2:$L$2218 = B2,ABS($M$2:$M$2218-C2)),0))

INDEX($M$2:$M$2218,MATCH(SMALL(IF($L$2:$L$2218 = B2,ABS($M$2:$M$2218-C2)),1),IF($L$2:$L$2218 = B2,ABS($M$2:$M$2218-C2)),0)),"")


Column B has maint , column C has 5. Column L has multiple maint, multiple mainz, multiple vaint etc and Column M has the number 1,7, 10 for maint and so on.


Thanks again all for your help.
 
Hi @jits50,

Thanks for the feedback. Glad to hear that you were able to solve the problem yourself!


Regards,

Sajan.
 
Hi Krishnakumar ,


First , let me apologize for unwittingly having generated a controversy. I took up Sajan’s formula because it appeared easier to deconstruct , and it fitted in with my theme of reusable components. I thought that once I had constructed the components for Sajan’s formula , it would be easier to deconstruct your formula , using those same components and a few others. Please let me know whether I have done well.


Just to recapitulate , there were three formulae which were used to arrive at the final formula created by Sajan ; these were :


Formula_1 : =LOOKUP(99,RIGHT(match_text,ROW(INDIRECT("1:"&LEN(match_text))))+0)

Formula_2 : =LEFT(match_text,FIND(Formula_1,match_text)-1)

Formula_3 : =VALUE(SUBSTITUTE(strlist,Formula_2,""))


Final Formula : =INDEX(strlist,MATCH(TRUE,Formula_3>=Formula_1,0))


Let us understand what each of the first 3 formulae do :


1. This extracts the numeric portion of the text string , since the text strings are in a standard format , with the alpha portion of the string appearing first , and the numeric portion appearing on the right e.g. maint1 , mainz49 , vaint10 etc.


The length of each of the portions is variable , which is why we cannot have a standard LEFT , MID or RIGHT function to isolate the numeric or the alpha portions.


The formula will isolate the 1 from maint1 , the 49 from mainz49 , or the 10 from vaint10.


2. The second formula extracts the alpha portion of the text string ; from the above examples , it will isolate the maint from maint1 , the mainz from main49 , or the vaint from vaint10.


3. The third formula retrieves an array of numeric values , matching the text string found using the second formula i.e. where the alpha portion of the text string matches the alpha portion found in (2) above , it will return the numeric portion ; everywhere else , it will return #N/A.


Thus , given an initial list as follows :


{“maint1”;”maint4”;”maint7”,”mainz2”;”mainz49”;”vaint4”;”vaint39”}


and a match text of maint2 , the above formula will return the following array :


{1;4;7;#N/A;#N/A;#N/A;#N/A}


The final formula merely found the first value in the above array greater than the numeric portion of match text ; for a match text of maint2 , this is 2. The first value in the above array greater than 2 is 4 , and the result would be a MATCH output of 2 , since 4 is the second item in the array. Using this to index into the list of text strings would return maint4.


The beauty of the above formulae is that they can extended to be used to form arrays , instead of single values !


Let us now go into how we can get the answer to the problem viz. what is the closest match in the lists of text strings , in strlist , to the given text string , in match_text.


The final formula will be somewhat like this :


=INDEX(strlist, get the closest match of the numeric portion of match_text , to the numeric portions in strlist , where the alpha portion of the match text matches the alpha portions in strlist)


In order to achieve the above , we need to do the following :


1. Isolate the numeric portion of match_text

2. Isolate the alpha portion of match_text

3. Isolate the numeric portions of strlist

4. Isolate the alpha portions of strlist

5. Create an array of the numeric portions found in (3) above , only where the alpha portions match the alpha portion of match_text

6. From this array , get the closest match of the numeric portion of match_text found in (1) above.

7. Use this match value to index into strlist.

We already know how to get (1) and (2) ; Formula_1 gives us 1 ; Formula_2 gives us 2.

Isolating the numeric portions of strlist is done by a new formula , Formula_4 :


=1*MID(strlist,MATCH(TRUE,ISNUMBER(1*MID(strlist,ROW(Sheet1!$1:$9),1)),0),COUNT(1*MID(strlist,ROW(Sheet1!$1:$9),1)))


Explaining this is a topic in itself , and has been done here :


http://office.microsoft.com/en-us/excel-help/extracting-numbers-from-alphanumeric-strings-HA001154901.aspx


Once we have this , isolating the alpha portions of strlist is done by using Formula_2 , and passing strlist to it as a parameter ; thus :


=LEFT(strlist,FIND(Formula_4,strlist)-1)


will return an array of the alpha portions of strlist ; this is the power of reusable components ! Let us name this Formula_5.


Taking care of point 5 is very simple :


=IF(Formula_5=Formula_2,1,999)*Formula_4


Will return an array of values , containing the numeric portions of strlist only where the alpha portion of strlist matches the alpha portion of match_text :


{1;4;7;1998;48951;3996;38961}


The values 1998 , 48951 , 3996 and 38961 are because the numeric portions of strlist have been multiplied by 999 where the alpha portions of strlist do not match the alpha portion of match_text. Why do we need 999 ? This will become clear once we come to the final step of getting the closest match.


Now for the final step ; how to get the closest match of the numeric portion of match_text ?


Suppose we have a list of numbers , which we will call numlist :


{1;4;7;2;49;4;39}


Suppose we have a number match_num , for which we would like to find the closest match within numlist. The formula to do this is :


=INDEX(numlist,MATCH(MIN(ABS(numlist-match_num)),ABS(numlist-match_num),0))


To see what we are doing here , let us consider the outputs if match_num is 44 :


1. Find out the difference between match_num and each number in numlist ; we take the absolute value of this difference since it does not matter whether the difference is positive or negative ; 3 and 5 are equally close to 4.

This will result in the following array : {43;40;37;42;5;40;5}

2. Take the lowest of the above differences , which is 5.

3. Match this value with the values in the array ; we will get the first array location where 5 occurs i.e. 5 ( this is incidental ! )

4. Index into numlist to get the 5th value from numlist ; we get 49.


Suppose we reduce match_num to 43 ; what happens now ?


In step 1 , the array of differences will be : {42;39;36;41;6;39;4}

In step 2 , the lowest of the differences is now 4.

In step 3 , matching 4 with the array values results in a location of 7

In step 4 , indexing into numlist with a value of 7 gives us 39.


How do we reuse this component in our application ?


What we need to realise is that numlist is an array ; we can pass any array to the above formula , and it will work correctly.


We had earlier taken care of point 5 through this formula :


=IF(Formula_5=Formula_2,1,999)*Formula_4


Just substitute numlist with this array , and we are done !


=INDEX(strlist,MATCH(MIN(ABS(IF(Formula_5=Formula_2,1,999)*Formula_4-Formula_1)),ABS(IF(Formula_5=Formula_2,1,999)*Formula_4-Formula_1),0))


Now the reason for 999 in the above IF function is clear ; such a high value ensures that none of these values will be returned as a close match e.g. suppose our match_text is maint48 ; now if we only looked at the numeric portions of strlist ({1;4;7;2;49;4;39} ) , then 48 is closest to 49 ; but the alpha portions are as follows :

{“maint”;”maint”;”maint”,”mainz”;”mainz”;”vaint;”vaint”}

and 49 is a part of mainz49 , and since mainz does not match maint , our array no longer returns 49 in that position ; the 999 will ensure that the number in that location is 999 x 49 = 48951 , which will ensure that the closest number to 48 returns the correct value of 7 , since the alpha portion of the text string maint7 matches the alpha portion of match_text maint48.


Please comment if you find any mistakes. I have done some amount of testing but I leave it to others to test it more thoroughly if you can.


Narayan
 
Back
Top