Formula Forensics No. 34. Extract words from a cell, where they occur in a list of words.

Posted on September 12th, 2013 in Formula Forensics , Huis , Posts by Hui - 14 comments

At the Chandoo.org Forums user RMajare posted a request:

http://chandoo.org/forums/topic/find-a-list-of-words-in-a-cell

His request was to list all names that appeared in the sentence to be shown in one cell separated by commas.

While it is possible with a dash of VBA (Google for ACONCAT by Harlan Grove), Excel formulas straight away do not provide this flexibility using the CONCATENATE function.

So Shrivallabha suggested an alternative which was to list the words found across columns.

This post written by Shrivallabha will explain how his solution works.

As always at formula Forensics You can follow along in a sample file here: Download Sample File

 

Objective

Cells B3:B12 [Namelist Column] holds a list of names which are to be searched in sentences in the Cells C3:C7 [Sentence Column].

The results are then to be listed across starting from cell D3 [Formula Columns].

So we will write a formula in Cell D3 and then copy it down and across.

Note: We could have easily kept the Sentence column to the left of NameList but we may need free space to keep adding columns in Formula section and we don’t know how many columns we’ll need. This arrangement ensures that we don’t have to change layout to adjust.

 

Formula

Here’s the formula used which needs to be array entered in Cell D3:

=IFERROR(INDEX(NameList,SMALL(IF(ISNUMBER(FIND(NameList,$C3,1)),ROW(NameList)-2),COLUMNS($D$2:D2))),”-“) then press Ctrl+Shift+Enter

Where, NameList is a named formula which refers to the Range B3:B12.

Note: The formula above can be used for Excel Versions 2007 and above. For Excel Versions 2003 and previous we’ll need to resort to =IF(ISERROR(Formula),Formula,”-“)

or

=If(Iserror( INDEX(NameList,SMALL(IF(ISNUMBER(FIND(NameList,$C3,1)),ROW(NameList)-2),COLUMNS($D$2:D2))) ), INDEX(NameList,SMALL(IF(ISNUMBER(FIND(NameList,$C3,1)),ROW(NameList)-2),COLUMNS($D$2:D2))),”-“)

 

Now let’s understand how above first formula works

Always remember, any complex formula is made up of more than one formula similar to a big program built up using small sub-routines.

To begin with, it may seem daunting but as you delve deeper and break the formula down to basic functions then it is just becomes a matter of understanding the inter-dependency of these basic functions with one another.

We’ll do the same to understand this formula.

By doing away with IFERROR what remains is the core formula which gets us the results we are interested in:

=IFERROR(INDEX(NameList,SMALL(IF(ISNUMBER(FIND(NameList,$C3,1)),ROW(NameList)-2),COLUMNS($D$2:D2))),”-“)

We will examine the internal formula starting with Index

INDEX(NameList,SMALL(IF(ISNUMBER(FIND(NameList,$C3,1)),ROW(NameList)-2),COLUMNS($D$2:D2)))

We are using INDEX which is one of the frequently used functions in Excel which has following construct:

=INDEX(array, row_num, column_num)

This function returns an element based on the row_num, column_num arguments passed e.g. if we write formula as:

=INDEX(A1:E5, 2, 3)

Then it will return value in Cell C2.

We use this function to return the match found in Named Range “NameList”.

To find out the name that exists in given sentence we use following construct:

IF(ISNUMBER(FIND(NameList,$C3,1)),ROW(NameList)-2)

Here we need to use Excel’s “Evaluate Formula” functionality which shows the steps that Excel takes internally to arrive at the result.

“Evaluate Formula” is accessed by selecting part of the formula and pressing F9 and will show results as below for Cell D3:

  • The NameList array will be searched up like IF(ISNUMBER(FIND({“Alka”;”Aniket”;”Anil”;….;”Vrunda”},$C3,1)),ROW(NameList)-2).

Note: SEARCH function shall be used in place of FIND if we do not need search to be case sensitive.

  • And then it will be searched in sentence cell C3 which will result in array which has numerical results along with #value! errors like IF(ISNUMBER({1;#value!; #value!;….;10; #value!}),ROW(NameList)-2).
  • ISNUMBER is another useful function which handles errors and numerical results (#value!) and creates BOOLEAN results as required by FIND function as IF({TRUE;FALSE;FALSE;….;TRUE; FALSE}),ROW(NameList)-2).
  • The ROW(NameList) function returns results as {3;4;5;6;7;…;11;12} so when we are processing these results which come from if function. So the numerical results will be correlated with following TRUE results: {3, 11}.
  • If we look at our NameList array then they will refer to 3rd and 11th row respectively. But our data has only 10 elements and 3rd element is “Anil” which is incorrect and then 11 will cause #REF error. What you’ll notice is the result is always offset by 2 rows. That is because our data range starts at 3rd row. So we adjust it by subtracting 2 from it. So it refers to correct results: {1, 9}.

I hope by now we have understood how we use above IF construct to get the results. At this point, you might be still wondering why we used SMALL function around IF.

SMALL(If_function, n)

  • We have multiple results and we need to show one result at a time which is done by SMALL(If_formula, COLUMNS($D$3:D3)).

COLUMNS ($D$3:D3) results in 1 in cell D3 i.e. count of columns.

As we copy across $D$2 remains constant and D2 changes.

So in E2 the value becomes 2, in F2 it becomes 3 and so forth.

  • So in cell D3 we get first result of SMALL formula i.e. 1 and therefore it returns “Alka” and in E3 “Vinay” and then there is no third match in F3 so INDEX formula errors out which is then handled by IFERROR function.

Refer the attached workbook and play with it.

 

I hope this formula helps you somewhere someday.

Shrivallabha

 

DOWNLOAD

You can download a copy of the above file and follow along, Download Sample File.

 

OTHER POSTS IN THIS SERIES

The Formula Foerensics Series contains a wealth of useful solutions and information.

You can learn more about how to pull Excel Formulas apart in the following posts: http://chandoo.org/wp/formula-forensics-homepage/

 

THANK-YOU and a CHALLENGE

Firstly, Congratulations to Shrivallabha on taking up the challenge and on your First Post at Chandoo.org.

Thank-you for explaining to us all how this formula, which has appeared a number of times on the Chandoo.org Forums, works.

Your Challenge?

If you have a clever formula and would like to become an author here at Chandoo.org please consider writing it up as Shrivallabha has done above.

You can submit it to Chandoo or Hui.

 

Written by Hui...
Tags: , , , ,
Home: Chandoo.org Main Page
? Doubt: Ask an Excel Question

14 Responses to “Formula Forensics No. 34. Extract words from a cell, where they occur in a list of words.”

  1. Elias says:

    Different approach and same results, but different order. However, it doesn’t get mess if you insert rows before NameList range.

    =IFERROR(LOOKUP(0,-FIND(NameList,$C3)/(COUNTIF($C$3:C3,NameList)=0),NameList),"-")

    Regards

  2. Robert says:

    thanks, i used it today (with modification) and works great (haven't really used array formulas much in the past).

    a slight modification, depending on what you're doing, to get around the case where rows above NameList may be added or deleted
    =IFERROR(INDEX(B:B,SMALL(IF(ISNUMBER(FIND(NameList,$C3,1)),ROW(NameList)),COLUMNS($D$2:D2))),”-”)
    ("INDEX(NameList" >> "INDEX(B:B" .... "ROW(NameList)-2" >> drop the "-2")

  3. Elias says:

    If order is important here is another option,

    =IFERROR(INDEX(NameList,MATCH(1,ISNUMBER(FIND(NameList,$C3))/(COUNTIF($C$3:C3,NameList)=0),0)),"-")

    Regards

  4. Hamza asghar says:

    Good morning chando Sir, i am Hamza asghar s/o asghar ali i am 11 yrs old & 6th class student , i have my desktop computer but sometime i use my papa's laptop & and his e.mail ID too. i like to understand excel , as like as you hav command on its. can will you teach me ?? i like to know all formulas but easy . i am waiting for ur reply .
    Best Regards

  5. Priyanka Poojari says:

    alternate method could be
    DATA - TEXT TO COLUMNS -DELIMITED - COMMAS. (put symbol ( , )- FINISH

    • shrivallabha says:

      Priyanka,

      Check out the workbook and requirement. Those words are interspersed in the sentence.

      e.g.
      Ram likes to visit Durgapur

      Namelist has Ram, Shyam, Seeta.

  6. Aditya says:

    Nice formula

  7. Ola says:

    Isn't it time for Excel to become simpler.
    The same basic problems has existed for years, giving rise to ever more elaborate work around's.

    Even though they can be fun to make/decipher. Isn't it time to find the missing main building blocks (formulas) that could make Excel simpler in the long run.

    For example =concatenate() that still can not concatenate an array. Isn't Harlan Grove's Aconcat work around (which must be more than 10 years old) a prime example of a problem that has existed for years and still has no simple solution.
    Or am I just getting old.

    • Anon says:

      An ugly modification to the formula which will give you the names in the order they appear in the sentence (in cell c3).

      =IFERROR(INDEX(NameList,MATCH(SMALL(IF(ISNUMBER(FIND(NameList,$C3,1)),FIND(NameList,$C3,1),FALSE),COLUMNS($D$2:D2)),IF(ISNUMBER(FIND(NameList,$C3,1)),FIND(NameList,$C3,1),FALSE),0)),"-")

  8. Haseeb A says:

    Here is another one, which will extract in order as they appear in NameList. Also will ignore DUPLICATE, if there is.

    =IFERROR(INDEX(NameList,MATCH(0,0*FIND(IF(ISNA(MATCH(NameList,$C3:C3,0)),NameList),$C3),0)),"-")

    To compatibility with all versions,

    =LOOKUP("zzz",IF({1,0},"-",INDEX(NameList,MATCH(0,0*FIND(IF(ISNA(MATCH(NameList,$C3:C3,0)),NameList),$C3),0))))

  9. Renan says:

    Hi,
    I am having a slight problem with the formula shown.
    The first time I use, it returns some incorrect values, but if I Press F2 and F9, the value returned is correct.

    Can you help me with that?
    Best Regards

  10. Giuseppe says:

    Hi all, Im trying to follow all your amazing solution(really are great congrats), but Im not able to apply them on my case. I'll try to explain briefly. Have 3 columns Like these below

    1a. City |1b.Data ||| 2. Universities
    Rome | 20% ||| University of Padua blablabla
    Pisa | 5% ||| University of Rome blablabla
    Bologna | 17% ||| University of Palermo Blablabla
    ||| University of Pisa blablabla
    ||| University of Something

    What im trying to do is something like:

    If the first cell of Tab Universities contains one of the cities from table 1.a City, then write in another cell the corresponding value from 1b.Data. that corresponds to the found city.

    Do u may have some suggestions?

    Thank u all for your time

    G

  11. Jenny says:

    Just loved the sample file tutorial! With its step-by-step instructions & illustrations I was able to modify the formula to meet my particular needs! Thanks for the assistance you didn't know you were giving.

Leave a Reply