Last week Joyce asked a question on the Chandoo.org, Comment 24.
I’m wondering if there’s a way to count the number of occurrences of words when they’re all in a cell? Like this:
A1: “Windows NT, Networking, Firewalls, Security, TL, Training”
A2: “Networking, Networking, Training, Security, TL, Training”
A3: “Security, TL, Firewalls, Security, Networking, Windows NT”
Joyce
Hui responded with an Array Formula:
=SUM(LEN(A1:A3)-LEN(SUBSTITUTE(A1:A3,C10,””)))/LEN(C10)
As the formula is an Array Formula it is entered with Ctrl Shift Enter.
Setup the Problem
Copy the Data Above into Cells A1:A3 or download the example file here: Example File (all Excel versions)
Enter the text string Security into cell C10
And array enter the formula
D10: =SUM(LEN(A1:A3)-LEN(SUBSTITUTE(A1:A3,C10,””)))/LEN(C10)
Cell D10 should now display the value 4, which is the number of times the Word Security, appears in the Range A1:A3.
Pull The Formula Apart
Lets take a look inside this and see how it works
We will break this formula apart and look at each section independently and then put the answers back together.
=SUM(LEN(A1:A3)-LEN(SUBSTITUTE(A1:A3,C10,””)))/LEN(C10)
In a cell below the data
D13: =LEN(A1:A3) but don’t press Enter, Press F9
Excel displays ={57,56,57}
This is the number of characters in each cell A1:A3
ie: A1 has 57 characters, A2 has 56 characters, A3 has 57 characters,
You can check this manually by typing =Len(A1) into any spare cell
=SUM(LEN(A1:A3)-LEN(SUBSTITUTE(A1:A3,C10,””)))/LEN(C10)
In another cell below the data
D15: =LEN(SUBSTITUTE(A1:A3,C10,””)) but don’t press Enter, Press F9
Excel displays ={49,48,41}
What this section does is measure the length of each cell in A1:A3 but only after substituting the word being searched for from C10 with ””, which is a zero length string.
So the second array is shorter than the first Array, by X times the length of the word in C10
=SUM(LEN(A1:A3) – LEN(SUBSTITUTE(A1:A3,C10,””)))/LEN(C10)
Next we add up the difference between the two arrays
So you can see we have two arrays of numbers
Array 1 = {57,56,57}
Array 2 = {49,48,41}
If we subtract Array 2 from Array 1
= {57-49, 56-48, 57-41}
= {8, 8, 16}
We can do this in Excel to Check
In Cell D17 enter
=LEN(A1:A3)-LEN(SUBSTITUTE(A1:A3,C10,””)) and press F9
Excel displays: = {8, 8, 16}
=SUM(LEN(A1:A3) – LEN(SUBSTITUTE(A1:A3,C10,””)))/LEN(C10)
The next part is to sum these up
Obviously the sum of 8, 8 & 16 is 32
We can check that
D21: =SUM(LEN(A1:A3)-LEN(SUBSTITUTE(A1:A3,C10,””))) and press F9
Excel displays: 32
=SUM(LEN(A1:A3) – LEN(SUBSTITUTE(A1:A3,C10,””)))/LEN(C10)
The final part of this is to divide the sum (32 in this case) by the length of the text in C10 “Security” = 8 Characters
=32 / 8
= 4
Correct – The number of times Security appears in the cells A1:A3 is 4.
OTHER POSTS IN THIS SERIES:
You can learn more about how to pull Excel Formulas apart in the following posts
Formula Forensic 001 – Tarun’s Problem
WHAT FORMULAS WOULD YOU LIKE EXAMINED ?
If you have any formulas you would like explained please feel free to leave a post here or send me an email:
If the formula is already on Chandoo.org or Chandoo.org/Forums or even forbid another web site, simply send the link to the post and a Comment or ID No. number if appropriate.
If sending emails please attach an Excel file with the formula and data
20 Responses to “Formula Forensics No. 002 – Joyces Question”
Hey Chandoo
brilliant article !!!
including the relevant notes
THANKS !
regards Stef@n
Is there a reason to substitute the target word with "", rather than extending the target word by one letter (an X added to the end). In this example, instead of replacing the word 'Security' with "", why not replace it with "SecurityX", or SecuXrity? This would remove the need for the division - since the change in length triggered when the target word is found is constant (1), rather than dependent on the length of the word. It's only a minor point, but I'm wondering if I am missing something, and that there is an advantage to removing the trigger word altogether in the substitution, rather than just substituting in an additional letter?
(You would have to do the subtraction the other way round, since the new strings have got longer, and you want a positive sum rather than a negative sum).
Hope that makes sense,
Jon
Awesome! Seems to make a lot of sense!
Hui,
I always stop by to learn some new tricks and understand the logics behind them, but this one... man! that was great! Opened my mind totally! congrats! keep teaching us!
Surely text to columns woul be easier for this? and use "," as the symbol for the column. Then you could use the real power of excel?
@Jon, Rich
There are, as with nearly everything in Excel, several methods to tackle this particular problem. Including Additive (as Jon suggested), Subtractive (as I used) and Splitting (as Rich suggested) as well as a few variations on these that I can think of.
.
However, the purpose of this series of posts is to document how formulas work, so that people can see how the problem has been presented and thought through to a solution.
I've chosen this technique as if people can see the problem and how it has been solved and then understand how the solution works in Excel I think people will learn these techniques quicker.
I would love to see the use of Luke's handy-dandy "ROWS" subtraction. He's done this on the forum numerous times, and, it still sorta confuses me. He seems to use it when a user needs to find the NEXT occurence of data in a list. Maybe use that as an example?
Thanks for the awesme post. So far I love this series!
sweet!
If someone could please explain, if the formula only "counts" characters, then how does it know the difference or even recognize what word your are seaching for instead of counting all words with 8 characters and giving you a count of occurances for that particular word vs all 8 character words?
@Greg
.
The formula is counting characters but it is counting the Total of All Charcters in all the cells in the range.
.
The existing 3 cells A1:A3 have a total of 170 Characters (57 + 56 + 57)
.
The Formula then replaces the Choosen Word "Security" with "" (Zero Characters) and recounts.
The modified 3 cells A1:A3 have a total of 138 Characters (49 + 48 + 41)
The difference is 32 Characters
Now the chosen word "Security" had 8 characters
and so it must have been in the original range 4 times
.
I Hope that helps
Hui-
I absolutely love this explanation for Greg. AWESOME
Thx!!!
3G
Your Awesome!, thanks
[...] Formula Forensic 002 – Joyce’s Question [...]
[...] Formula Forensics 002 – Joyce’s Question [...]
Please be advised that the equations shown in the article contain a double quote using a Unicode Right Double Quotation Mark (U+201D) and a Quotation Mark (U+0022) instead of two Quotation Marks. In my copy of Excel 2007 (I don't know if this is effected by internal settings), if you cut-and-paste the equations into your sheet, you get a "The formula you typed contains an error" message. [FYI, if you use Left (U+201C) and Right (U+201D) Double Quotation Marks you get a "Name?" error]
Thank you for your excellent article. I have learned more from this site than any other.
@Eef
Thanx for pointing that out
I have fixed those odd " 's in the post
Excellent article!! My only comment would be to add UPPER (or LOWER) to the formula so it is no longer case sensitive ...
=SUM(LEN(A2:A4)-LEN(SUBSTITUTE(UPPER(A2:A4),UPPER(C7),"")))/LEN(C7) Thanks!!
@Shair
Thanx for comments
Great idea
Hi,
we can use sumproduct, rather than using Ctrl Shift Enter.
=SUMPRODUCT(LEN(A1:A3)-(LEN(SUBSTITUTE(A1:A3,C10,""))))/LEN(C10)
I'm wondering if there’s a way to count the number of occurrences of words when they’re all in a cell? Like this:
A1: “Windows NT, Networking, Firewalls, Security, TL, Training”
A2: “Networking, Networking, Training, Security, TL, Training”
A3: “Security, TL, Firewalls, Security, Networking, Windows NT”
Joyce
ans=SUMPRODUCT(--ISNUMBER(FIND("Security",XEH2:XEH3)))