Formula Forensics No. 002 – Joyces Question

Posted on November 7th, 2011 in Excel Howtos , Formula Forensics , Huis , Posts by Hui - 19 comments

Last week Joyce asked a question on the Chandoo.org, Comment 24.

I’m wondering if there’s a way to count the number of occurrences of words when they’re all in a cell? Like this:
A1: “Windows NT, Networking, Firewalls, Security, TL, Training”
A2: “Networking, Networking, Training, Security, TL, Training”
A3: “Security, TL, Firewalls, Security, Networking, Windows NT”

Joyce

Hui responded with an Array Formula:

=SUM(LEN(A1:A3)-LEN(SUBSTITUTE(A1:A3,C10,””)))/LEN(C10)

As the formula is an Array Formula it is entered with Ctrl Shift Enter.

 

Setup the Problem

Copy the Data Above into Cells A1:A3 or download the example file here: Example File (all Excel versions)

Enter the text string Security into cell C10

And array enter the formula

D10: =SUM(LEN(A1:A3)-LEN(SUBSTITUTE(A1:A3,C10,””)))/LEN(C10)

Cell D10 should now display the value 4, which is the number of times the Word Security, appears in the Range A1:A3.

 

Pull The Formula Apart

Lets take a look inside this and see how it works

We will break this formula apart and look at each section independently and then put the answers back together.

=SUM(LEN(A1:A3)-LEN(SUBSTITUTE(A1:A3,C10,””)))/LEN(C10)

In a cell below the data

D13: =LEN(A1:A3) but don’t press Enter, Press F9

Excel displays ={57,56,57}

This is the number of characters in each cell A1:A3

ie: A1 has 57 characters, A2 has 56 characters, A3 has 57 characters,

You can check this manually by typing =Len(A1) into any spare cell

 

=SUM(LEN(A1:A3)-LEN(SUBSTITUTE(A1:A3,C10,””)))/LEN(C10)

In another cell below the data

D15: =LEN(SUBSTITUTE(A1:A3,C10,””)) but don’t press Enter, Press F9

Excel displays ={49,48,41}

What this section does is measure the length of each cell in A1:A3 but only after substituting the word being searched for from C10 with ””, which is a zero length string.

So the second array is shorter than the first Array, by X times the length of the word in C10

 

=SUM(LEN(A1:A3) – LEN(SUBSTITUTE(A1:A3,C10,””)))/LEN(C10)

Next we add up the difference between the two arrays

So you can see we have two arrays of numbers

Array 1 = {57,56,57}

Array 2 = {49,48,41}

If we subtract Array 2 from Array 1

= {57-49, 56-48, 57-41}

= {8, 8, 16}

We can do this in Excel to Check

In Cell D17 enter

=LEN(A1:A3)-LEN(SUBSTITUTE(A1:A3,C10,””)) and press F9

Excel displays: = {8, 8, 16}

 

=SUM(LEN(A1:A3) – LEN(SUBSTITUTE(A1:A3,C10,””)))/LEN(C10)

The next part is to sum these up

Obviously the sum of 8, 8 & 16 is 32

We can check that

D21: =SUM(LEN(A1:A3)-LEN(SUBSTITUTE(A1:A3,C10,””))) and press F9

Excel displays: 32

 

=SUM(LEN(A1:A3) – LEN(SUBSTITUTE(A1:A3,C10,””)))/LEN(C10)

The final part of this is to divide the sum (32 in this case) by the length of the text in C10 “Security” = 8 Characters

=32 / 8

= 4

Correct – The number of times Security appears in the cells A1:A3 is 4.

 

OTHER POSTS IN THIS SERIES:

You can learn more about how to pull Excel Formulas apart in the following posts

Formula Forensic 001 – Tarun’s Problem

 

WHAT FORMULAS WOULD YOU LIKE EXAMINED ?

If you have any formulas you would like explained please feel free to leave a post here or send me an email:

If the formula is already on Chandoo.org or Chandoo.org/Forums or even forbid another web site, simply send the link to the post and a Comment or ID No. number if appropriate.

If sending emails please attach an Excel file with the formula and data

 

Written by Hui...
Tags: , , ,
Home: Chandoo.org Main Page
? Doubt: Ask an Excel Question

19 Responses to “Formula Forensics No. 002 – Joyces Question”

  1. Stef@n says:

    Hey Chandoo
    brilliant article !!!
    including the relevant notes
    THANKS !
    regards Stef@n

  2. Jon says:

    Is there a reason to substitute the target word with "", rather than extending the target word by one letter (an X added to the end). In this example, instead of replacing the word 'Security' with "", why not replace it with "SecurityX", or SecuXrity? This would remove the need for the division - since the change in length triggered when the target word is found is constant (1), rather than dependent on the length of the word. It's only a minor point, but I'm wondering if I am missing something, and that there is an advantage to removing the trigger word altogether in the substitution, rather than just substituting in an additional letter?

    (You would have to do the subtraction the other way round, since the new strings have got longer, and you want a positive sum rather than a negative sum).

    Hope that makes sense,

    Jon

  3. Affonso says:

    Hui,
    I always stop by to learn some new tricks and understand the logics behind them, but this one... man! that was great! Opened my mind totally! congrats! keep teaching us!

  4. Rich Scorer says:

    Surely text to columns woul be easier for this? and use "," as the symbol for the column. Then you could use the real power of excel?

  5. Hui... says:

    @Jon, Rich
    There are, as with nearly everything in Excel, several methods to tackle this particular problem. Including Additive (as Jon suggested), Subtractive (as I used) and Splitting (as Rich suggested) as well as a few variations on these that I can think of.
    .
    However, the purpose of this series of posts is to document how formulas work, so that people can see how the problem has been presented and thought through to a solution.
    I've chosen this technique as if people can see the problem and how it has been solved and then understand how the solution works in Excel I think people will learn these techniques quicker.

  6. 3G says:

    I would love to see the use of Luke's handy-dandy "ROWS" subtraction. He's done this on the forum numerous times, and, it still sorta confuses me. He seems to use it when a user needs to find the NEXT occurence of data in a list. Maybe use that as an example?

    Thanks for the awesme post. So far I love this series!

  7. SyedGJ says:

    sweet!

  8. Greg says:

    If someone could please explain, if the formula only "counts" characters, then how does it know the difference or even recognize what word your are seaching for instead of counting all words with 8 characters and giving you a count of occurances for that particular word vs all 8 character words?

  9. Hui... says:

    @Greg
    .
    The formula is counting characters but it is counting the Total of All Charcters in all the cells in the range.
    .
    The existing 3 cells A1:A3 have a total of 170 Characters (57 + 56 + 57)
    .
    The Formula then replaces the Choosen Word "Security" with "" (Zero Characters) and recounts.
    The modified 3 cells A1:A3 have a total of 138 Characters (49 + 48 + 41)

    The difference is 32 Characters
    Now the chosen word "Security" had 8 characters
    and so it must have been in the original range 4 times
    .
    I Hope that helps

  10. 3G says:

    Hui-
    I absolutely love this explanation for Greg. AWESOME

    Thx!!!
    3G

  11. Greg says:

    Your Awesome!, thanks

  12. eef says:

    Please be advised that the equations shown in the article contain a double quote using a Unicode Right Double Quotation Mark (U+201D) and a Quotation Mark (U+0022) instead of two Quotation Marks. In my copy of Excel 2007 (I don't know if this is effected by internal settings), if you cut-and-paste the equations into your sheet, you get a "The formula you typed contains an error" message. [FYI, if you use Left (U+201C) and Right (U+201D) Double Quotation Marks you get a "Name?" error]

    Thank you for your excellent article. I have learned more from this site than any other.

  13. Hui... says:

    @Eef
    Thanx for pointing that out
    I have fixed those odd " 's in the post

  14. Shair says:

    Excellent article!! My only comment would be to add UPPER (or LOWER) to the formula so it is no longer case sensitive ...
    =SUM(LEN(A2:A4)-LEN(SUBSTITUTE(UPPER(A2:A4),UPPER(C7),"")))/LEN(C7) Thanks!!

  15. Hui... says:

    @Shair
    Thanx for comments
    Great idea

  16. Hareesh says:

    Hi,

    we can use sumproduct, rather than using Ctrl Shift Enter.

    =SUMPRODUCT(LEN(A1:A3)-(LEN(SUBSTITUTE(A1:A3,C10,""))))/LEN(C10)

Leave a Reply