A few weeks ago Gewilson asked on the Chandoo.org Forums, "Can I simplify my formula?"
=176 - (SUMIF($B10:$AF10,"PD",$B11:$AF11) +SUMIF($B10:$AF10,"FA",$B11:$AF11) +SUMIF($B10:$AF10,"PS",$B11:$AF11) +SUMIF($B10:$AF10,"PN",$B11:$AF11) +SUMIF($B10:$AF10,"F1",$B11:$AF11) +SUMIF($B10:$AF10,"P1",$B11:$AF11) +SUMIF($B10:$AF10,"F7",$B11:$AF11))
SirJB7 responded with a nice Sumproduct solution:
=176 - SUMPRODUCT(((B$10:AF$10)=({"PD";"FA";"PS";"PN";"F1";"F7"})) *(B$11:AF$11))
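To see why one Sumproduct can stand in for a chain of SUMIFs, here is a rough Python sketch of the two approaches. It only illustrates the arithmetic and is not how Excel works internally: the `codes` and `scores` lists are made-up stand-ins for rows 10 and 11, `sumif` is a hypothetical helper, and `wanted` holds the seven codes used in the SUMIF chain.

```python
# Made-up stand-ins for row 10 (codes) and row 11 (scores).
# "XX" is a placeholder for any code that is not in the wanted list.
codes = ["PD", "XX", "FA", "P1", "XX", "F7"]
scores = [8, 8, 4, 6, 8, 3]
wanted = ["PD", "FA", "PS", "PN", "F1", "P1", "F7"]


def sumif(criteria_range, criterion, sum_range):
    """Hypothetical helper: add sum_range entries whose criteria cell equals criterion."""
    return sum(s for c, s in zip(criteria_range, sum_range) if c == criterion)


# Approach 1: one SUMIF per code, all added together.
chained = sum(sumif(codes, w, scores) for w in wanted)

# Approach 2: compare every code with every cell, multiply by the score, sum.
single = sum((c == w) * s for w in wanted for c, s in zip(codes, scores))

print(176 - chained, 176 - single)
```

Both totals agree because each cell holds a single code, so it can match at most one entry in the list and is never counted twice.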
So today we will pull this apart to see what is inside. I think what we find may surprise you.
SirJB7’s Formula
=176 - SUMPRODUCT(((B$10:AF$10)=({"PD";"FA";"PS";"PN";"F1";"F7"})) *(B$11:AF$11))
To simplify things I am going to use a truncated set of data and adjust the formula accordingly.
We will examine:
=176 - SUMPRODUCT(((B$10:I$10)=({"PD";"FA";"PS";"PN"})) *(B$11:I$11))
This problem has a smaller range, B10:I10 instead of B10:AF10,
as well as 2 fewer possible solutions, {"PD";"FA";"PS";"PN"} instead of {"PD";"FA";"PS";"PN";"F1";"F7"}.
The reason for this will soon become evident.
As usual you can download a Sample File to follow along with. Download Here.
Let's go:
=176 - SUMPRODUCT(((B$10:I$10)=({"PD";"FA";"PS";"PN"})) *(B$11:I$11))
We can see above that the formula subtracts the result of a Sumproduct from a fixed number, 176, so we really only need to focus on the Sumproduct part of the formula.
As we saw in Formula Forensics 007 – Sumproduct, Sumproduct adds up the products of the constituent arrays.
In this case
SUMPRODUCT(((B$10:I$10)=({"PD";"FA";"PS";"PN"})) *(B$11:I$11))
has only 1 constituent array, but that array consists of 2 components.
There is a Logical component ((B$10:I$10)=({"PD";"FA";"PS";"PN"}))
and a Numerical component (B$11:I$11)
which are then multiplied together.
Looking at the Logical Component first
((B$10:I$10)=({"PD";"FA";"PS";"PN"}))
The formula is checking the range B10:I10 against an array of possible solutions {"PD";"FA";"PS";"PN"}
That is, it is checking each value in our list {"PD";"FA";"PS";"PN"} against each cell in the range B10:I10.
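As a rough illustration of that check, the Python sketch below compares every code in the list with every cell. The contents of `row10` are an assumption: only the positions of "FA" and "PD" can be worked out from the result Excel shows in this post, and "XX" is a made-up placeholder for cells holding any other code.

```python
# Assumed contents of B10:I10. Only C10 = "FA" and F10 = "PD" can be
# inferred from the post; "XX" marks cells whose values are not shown.
row10 = ["XX", "FA", "XX", "XX", "PD", "XX", "XX", "XX"]
wanted = ["PD", "FA", "PS", "PN"]

# One row of answers per wanted code, one column per cell in B10:I10.
logic = [[cell == code for cell in row10] for code in wanted]

for code, row in zip(wanted, logic):
    print(code, row)
```

Each code produces one row of 8 answers, one per cell, so 4 codes against 8 cells give 32 answers in all.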
If we type the above equation =((B$10:I$10)=({"PD";"FA";"PS";"PN"})) into a spare cell, say C14, and press F9
Excel returns ={FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE;FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE}
What the …
If we look closely at the above array we will see that it contains a lot of True/Falses separated by ,’s and a few ;’s
={FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE;FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE}
Specifically, there are 4 blocks of 8 True/Falses separated by ,’s, and each block is separated from the next by a ;
In total, 4 x 8 = 32 values
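The block structure can be checked by splitting the text Excel returned, first on ; and then on , as in this small Python sketch. The string is simply the array shown above, and the variable names are only for illustration.

```python
# The text Excel shows after pressing F9, copied from the post.
literal = (
    "{FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE;"
    "FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;"
    "FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;"
    "FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE}"
)

# ";" separates the blocks, "," separates the values inside a block.
blocks = [block.split(",") for block in literal.strip("{}").split(";")]
grid = [[value == "TRUE" for value in block] for block in blocks]

print(len(grid), "blocks of", len(grid[0]), "values")
print(sum(len(block) for block in grid), "values in total")
```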
What this is, is an array representing the comparison of the 8 cells in the range B10:I10 with each element of the possible solution array
Each row of the array holds the results for one possible solution and is separated from the next by a ;
Each element in each row is separated by a ,
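This is best displayed as a grid, with one row for each possible solution and one column for each cell:
      B10    C10    D10    E10    F10    G10    H10    I10
"PD"  FALSE  FALSE  FALSE  FALSE  TRUE   FALSE  FALSE  FALSE
"FA"  FALSE  TRUE   FALSE  FALSE  FALSE  FALSE  FALSE  FALSE
"PS"  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE
"PN"  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE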
You can see why I simplified the size of the original problem.
So we have an Array of True/Falses ={FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE;FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE}
Which is now multiplied by the next component of the Sumproduct
({FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE;FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE}) *(B$11:I$11)
In a spare cell, say C23, enter
=({FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE;FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE}) *(B$11:I$11) and press F9
Excel returns
={0,0,0,0,10,0,0,0;0,10,0,0,0,0,0,0;0,0,0,0,0,0,0,0;0,0,0,0,0,0,0,0}
Note that this array has the same 8 column x 4 row layout as above, except that each TRUE has been replaced by the value in the matching Score cell of B11:I11 and each FALSE by 0
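The multiplication can be sketched in Python as below. The scores in `row11` are partly assumed: only the two 10s can be read off the result above, and the 5s are made-up values for cells whose scores the post does not show. Python, like Excel, treats TRUE as 1 and FALSE as 0 when multiplying.

```python
# The TRUE/FALSE grid from the logical component (4 codes x 8 cells).
logic = [
    [False, False, False, False, True, False, False, False],
    [False, True, False, False, False, False, False, False],
    [False] * 8,
    [False] * 8,
]

# Assumed contents of B11:I11. Only C11 = 10 and F11 = 10 can be read off
# the post's result; the 5s are hypothetical.
row11 = [5, 10, 5, 5, 10, 5, 5, 5]

# The single row of scores is reused for every row of the grid.
# True * score gives the score, False * score gives 0.
products = [[flag * score for flag, score in zip(row, row11)] for row in logic]

for row in products:
    print(row)
```

Whatever the unshown scores are, they are multiplied by FALSE and so drop out as 0.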
Sumproduct now kicks in and adds these up
=Sumproduct({0,0,0,0,10,0,0,0;0,10,0,0,0,0,0,0;0,0,0,0,0,0,0,0;0,0,0,0,0,0,0,0})
To get 20
Which is subtracted from our original number
=176 - SUMPRODUCT(((B$10:I$10)=({"PD";"FA";"PS";"PN"})) *(B$11:I$11))
= 176 - Sumproduct({0,0,0,0,10,0,0,0;0,10,0,0,0,0,0,0;0,0,0,0,0,0,0,0;0,0,0,0,0,0,0,0})
= 176 - 20
= 156
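The last two steps are plain arithmetic, sketched here in Python using the array of products shown above. The variable names are only for illustration.

```python
# The array of products returned by Excel, one inner list per row.
products = [
    [0, 0, 0, 0, 10, 0, 0, 0],
    [0, 10, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

# With a single array, Sumproduct simply adds up every element.
total = sum(sum(row) for row in products)

# The total is then subtracted from the fixed number.
result = 176 - total

print(total, result)  # prints: 20 156
```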
Download
You can download a copy of the above file and follow along, Download Here.
Other Posts In This Series
You can learn more about how to pull Excel Formulas apart in the following posts
We Need Your Help
I have received a few more ideas since last week and these will feature in coming weeks.
I do need more ideas though and so I need your help.
If you have a neat formula that you would like to share and explain, try putting pen to paper and draft up a post like the one above; or,
if you have a formula that you would like explained but don’t want to write a post, send it in to Chandoo or Hui.

12 Responses to “Analyzing Search Keywords using Excel : Array Formulas in Real Life”
Very interesting Chandoo, as always. Personally I find endless uses for formulae such as {=SUM(IF(B$2:B$5=$A2,$C$2:$C$5))}; the flexibility in absolute and relative referencing and multiple conditions gives it the edge over DSUM and other methods.
I've added to my blog a piece on SQL in VBA that I think might be of interest to you http://aviatormonkey.wordpress.com/2009/02/10/lesson-one-sql-in-vba/ . It's a bit techie, but I think you might like it.
Keep up the good work, aviatormonkey
Hi Chandoo,
You might find this coded solution I posted on a forum interesting.
http://www.excelforum.com/excel-programming/680810-create-tag-cloud-in-vba-possible.html
@Aviatormonkey: Thanks for sharing the url. I found it a bit technical.. but very interesting.
@Andy: Looks like Jarad, the person who emailed me this problem has posted the same in excelforum too. Very good solution btw...
Really great article
"You can take this basic model and extend it to include parameters like number of searches each key phrase has, how long the users stay on the site etc. to enhance the way tag cloud is generated and colored."
How would you go about doing this? I think it would need some VB
Hi,
I found the usage very interesting, but it is giving me a hard time because the LEN formulas that use ranges are not considering the full range. In other words, the LEN formula only brings back results from the cell on its own row.
Using the example, when I enter the formula to calculate the frequency for "windows" it brings me only 1 result, not 11 as displayed in the example. It seems that the LEN formula using ranges is considering only its own row within the range, not the full range.
Any hint?
@Thiago
You have to enter the formula as an Array Formula
Enter the Formula and press Ctrl+Shift+Enter
Not just Enter
Thank you, Hui! I couldn't work out how this didn't work
Is there a limit to the number of lines it can analyse?
I.e. I am trying to get this to work on a list of sentences 1500 long.
@Gary
In Excel 2010/2013 Excel is only limited by available memory,
So just give it a go
As always try on a copy of the file first if you have any doubts
Apologies if I am missing something, but couldn't getting the frequency be easier with a Countif formula? Something like this - COUNTIF(Range with text,"*"&_cell with keyword_&"*")
Apologies if I missed, but what is the Array Formula to:
1. Analyze a list of URL's or a list of word phrases to understand frequency;
2. List in a nearby column from most used words to least used words;
3. Next to the list of words the count of occurrences.