Formula Forensics No. 34. Extract words from a cell, where they occur in a list of words.
At the Chandoo.org Forums user RMajare posted a request:
http://chandoo.org/forums/topic/find-a-list-of-words-in-a-cell
His request was to list all names that appeared in the sentence to be shown in one cell separated by commas.
While it is possible with a dash of VBA (Google for ACONCAT by Harlan Grove), Excel formulas straight away do not provide this flexibility using the CONCATENATE function.
So Shrivallabha suggested an alternative which was to list the words found across columns.
This post written by Shrivallabha will explain how his solution works.
As always at formula Forensics You can follow along in a sample file here: Download Sample File
Objective
Cells B3:B12 [Namelist Column] holds a list of names which are to be searched in sentences in the Cells C3:C7 [Sentence Column].
The results are then to be listed across starting from cell D3 [Formula Columns].
So we will write a formula in Cell D3 and then copy it down and across.
Note: We could have easily kept the Sentence column to the left of NameList but we may need free space to keep adding columns in Formula section and we don’t know how many columns we’ll need. This arrangement ensures that we don’t have to change layout to adjust.
Formula
Here’s the formula used which needs to be array entered in Cell D3:
=IFERROR(INDEX(NameList,SMALL(IF(ISNUMBER(FIND(NameList,$C3,1)),ROW(NameList)-2),COLUMNS($D$2:D2))),”-“) then press Ctrl+Shift+Enter
Where, NameList is a named formula which refers to the Range B3:B12.
Note: The formula above can be used for Excel Versions 2007 and above. For Excel Versions 2003 and previous we’ll need to resort to =IF(ISERROR(Formula),Formula,”-“)
or
=If(Iserror( INDEX(NameList,SMALL(IF(ISNUMBER(FIND(NameList,$C3,1)),ROW(NameList)-2),COLUMNS($D$2:D2))) ), INDEX(NameList,SMALL(IF(ISNUMBER(FIND(NameList,$C3,1)),ROW(NameList)-2),COLUMNS($D$2:D2))),”-“)
Now let’s understand how above first formula works
Always remember, any complex formula is made up of more than one formula similar to a big program built up using small sub-routines.
To begin with, it may seem daunting but as you delve deeper and break the formula down to basic functions then it is just becomes a matter of understanding the inter-dependency of these basic functions with one another.
We’ll do the same to understand this formula.
By doing away with IFERROR what remains is the core formula which gets us the results we are interested in:
=IFERROR(INDEX(NameList,SMALL(IF(ISNUMBER(FIND(NameList,$C3,1)),ROW(NameList)-2),COLUMNS($D$2:D2))),”-“)
We will examine the internal formula starting with Index
INDEX(NameList,SMALL(IF(ISNUMBER(FIND(NameList,$C3,1)),ROW(NameList)-2),COLUMNS($D$2:D2)))
We are using INDEX which is one of the frequently used functions in Excel which has following construct:
=INDEX(array, row_num, column_num)
This function returns an element based on the row_num, column_num arguments passed e.g. if we write formula as:
=INDEX(A1:E5, 2, 3)
Then it will return value in Cell C2.
We use this function to return the match found in Named Range “NameList”.
To find out the name that exists in given sentence we use following construct:
IF(ISNUMBER(FIND(NameList,$C3,1)),ROW(NameList)-2)
Here we need to use Excel’s “Evaluate Formula” functionality which shows the steps that Excel takes internally to arrive at the result.
“Evaluate Formula” is accessed by selecting part of the formula and pressing F9 and will show results as below for Cell D3:
- The NameList array will be searched up like IF(ISNUMBER(FIND({“Alka”;”Aniket”;”Anil”;….;”Vrunda”},$C3,1)),ROW(NameList)-2).
Note: SEARCH function shall be used in place of FIND if we do not need search to be case sensitive.
- And then it will be searched in sentence cell C3 which will result in array which has numerical results along with #value! errors like IF(ISNUMBER({1;#value!; #value!;….;10; #value!}),ROW(NameList)-2).
- ISNUMBER is another useful function which handles errors and numerical results (#value!) and creates BOOLEAN results as required by FIND function as IF({TRUE;FALSE;FALSE;….;TRUE; FALSE}),ROW(NameList)-2).
- The ROW(NameList) function returns results as {3;4;5;6;7;…;11;12} so when we are processing these results which come from if function. So the numerical results will be correlated with following TRUE results: {3, 11}.
- If we look at our NameList array then they will refer to 3rd and 11th row respectively. But our data has only 10 elements and 3rd element is “Anil” which is incorrect and then 11 will cause #REF error. What you’ll notice is the result is always offset by 2 rows. That is because our data range starts at 3rd row. So we adjust it by subtracting 2 from it. So it refers to correct results: {1, 9}.
I hope by now we have understood how we use above IF construct to get the results. At this point, you might be still wondering why we used SMALL function around IF.
SMALL(If_function, n)
- We have multiple results and we need to show one result at a time which is done by SMALL(If_formula, COLUMNS($D$3:D3)).
COLUMNS ($D$3:D3) results in 1 in cell D3 i.e. count of columns.
As we copy across $D$2 remains constant and D2 changes.
So in E2 the value becomes 2, in F2 it becomes 3 and so forth.
- So in cell D3 we get first result of SMALL formula i.e. 1 and therefore it returns “Alka” and in E3 “Vinay” and then there is no third match in F3 so INDEX formula errors out which is then handled by IFERROR function.
Refer the attached workbook and play with it.
I hope this formula helps you somewhere someday.
Shrivallabha
DOWNLOAD
You can download a copy of the above file and follow along, Download Sample File.
OTHER POSTS IN THIS SERIES
The Formula Foerensics Series contains a wealth of useful solutions and information.
You can learn more about how to pull Excel Formulas apart in the following posts: http://chandoo.org/wp/formula-forensics-homepage/
THANK-YOU and a CHALLENGE
Firstly, Congratulations to Shrivallabha on taking up the challenge and on your First Post at Chandoo.org.
Thank-you for explaining to us all how this formula, which has appeared a number of times on the Chandoo.org Forums, works.
Your Challenge?
If you have a clever formula and would like to become an author here at Chandoo.org please consider writing it up as Shrivallabha has done above.
You can submit it to Chandoo or Hui.
Hello Awesome...
My name is Chandoo. Thanks for dropping by. My mission is to make you awesome in Excel & your work. I live in Wellington, New Zealand. When I am not F9ing my formulas, I cycle, cook or play lego with my kids. Know more about me.
I hope you enjoyed this article. Visit Excel for Beginner or Advanced Excel pages to learn more or join my online video class to master Excel.
Thank you and see you around.
Related articles:
|
Leave a Reply
« Calculating average of every nth value [Formula tips] | Using Arrays To Update Table Columns » |
16 Responses to “Formula Forensics No. 34. Extract words from a cell, where they occur in a list of words.”
Different approach and same results, but different order. However, it doesn’t get mess if you insert rows before NameList range.
=IFERROR(LOOKUP(0,-FIND(NameList,$C3)/(COUNTIF($C$3:C3,NameList)=0),NameList),"-")
Regards
thanks, i used it today (with modification) and works great (haven't really used array formulas much in the past).
a slight modification, depending on what you're doing, to get around the case where rows above NameList may be added or deleted
=IFERROR(INDEX(B:B,SMALL(IF(ISNUMBER(FIND(NameList,$C3,1)),ROW(NameList)),COLUMNS($D$2:D2))),”-”)
("INDEX(NameList" >> "INDEX(B:B" .... "ROW(NameList)-2" >> drop the "-2")
If order is important here is another option,
=IFERROR(INDEX(NameList,MATCH(1,ISNUMBER(FIND(NameList,$C3))/(COUNTIF($C$3:C3,NameList)=0),0)),"-")
Regards
Good morning chando Sir, i am Hamza asghar s/o asghar ali i am 11 yrs old & 6th class student , i have my desktop computer but sometime i use my papa's laptop & and his e.mail ID too. i like to understand excel , as like as you hav command on its. can will you teach me ?? i like to know all formulas but easy . i am waiting for ur reply .
Best Regards
alternate method could be
DATA - TEXT TO COLUMNS -DELIMITED - COMMAS. (put symbol ( , )- FINISH
Priyanka,
Check out the workbook and requirement. Those words are interspersed in the sentence.
e.g.
Ram likes to visit Durgapur
Namelist has Ram, Shyam, Seeta.
Nice formula
Isn't it time for Excel to become simpler.
The same basic problems has existed for years, giving rise to ever more elaborate work around's.
Even though they can be fun to make/decipher. Isn't it time to find the missing main building blocks (formulas) that could make Excel simpler in the long run.
For example =concatenate() that still can not concatenate an array. Isn't Harlan Grove's Aconcat work around (which must be more than 10 years old) a prime example of a problem that has existed for years and still has no simple solution.
Or am I just getting old.
An ugly modification to the formula which will give you the names in the order they appear in the sentence (in cell c3).
=IFERROR(INDEX(NameList,MATCH(SMALL(IF(ISNUMBER(FIND(NameList,$C3,1)),FIND(NameList,$C3,1),FALSE),COLUMNS($D$2:D2)),IF(ISNUMBER(FIND(NameList,$C3,1)),FIND(NameList,$C3,1),FALSE),0)),"-")
Here is another one, which will extract in order as they appear in NameList. Also will ignore DUPLICATE, if there is.
=IFERROR(INDEX(NameList,MATCH(0,0*FIND(IF(ISNA(MATCH(NameList,$C3:C3,0)),NameList),$C3),0)),"-")
To compatibility with all versions,
=LOOKUP("zzz",IF({1,0},"-",INDEX(NameList,MATCH(0,0*FIND(IF(ISNA(MATCH(NameList,$C3:C3,0)),NameList),$C3),0))))
Hi,
I am having a slight problem with the formula shown.
The first time I use, it returns some incorrect values, but if I Press F2 and F9, the value returned is correct.
Can you help me with that?
Best Regards
@Renan
Did you press Ctrl+Shift+Enter instead of Enter after originally entering the formula ?
Pressing F2, then F9 is effectively doing that
Hi all, Im trying to follow all your amazing solution(really are great congrats), but Im not able to apply them on my case. I'll try to explain briefly. Have 3 columns Like these below
1a. City |1b.Data ||| 2. Universities
Rome | 20% ||| University of Padua blablabla
Pisa | 5% ||| University of Rome blablabla
Bologna | 17% ||| University of Palermo Blablabla
||| University of Pisa blablabla
||| University of Something
What im trying to do is something like:
If the first cell of Tab Universities contains one of the cities from table 1.a City, then write in another cell the corresponding value from 1b.Data. that corresponds to the found city.
Do u may have some suggestions?
Thank u all for your time
G
Just loved the sample file tutorial! With its step-by-step instructions & illustrations I was able to modify the formula to meet my particular needs! Thanks for the assistance you didn't know you were giving.
https://eileenslounge.com/viewtopic.php?f=27&t=30006
Interesting, this has been my case for many years. Macro code on above link can be an option.
Hi Chandoo, many thanks for your great tutorial. It helped me a lot and saved me a lot of time and effort. Great job!