Few days back, I ran my first ever webinar, on a topic called, “How to be a BETTER Analyst?” (here is the replay link, in case you missed it). It was a huge success. More than 1,100 people attended the live webinar and hundreds more watched the replay. As part of the webinar, we had interactive Q&A. Viewers posted their questions and I replied to as many of them as I can.
After the webinar, I wanted to make sure I covered all the questions. So I downloaded the chat history. There were more than 700 messages in it. And I am not in the mood to read line by line to find-out the questions. A good portion of chat messages were not questions but stuff like ‘hello everyone, I am from Idaho’, ‘Wow, Chandoo has beard!”, “Enjoying a beer in Belgium while watching webinar” etc. So I wanted a quick way to flag the messages as question or not.
I did what any sensible Excel analyst would do.
I made myself a hot cup of coffee, started playing games on my iPhone while sipping it.
Of course after a cup of coffee and a bout of Candy Crush, I wrote simple Excel formula to find-out if the text in a cell is question or not. Let me share the formula & logic with you.
Let’s take a look at the data
This is how the downloaded chat history looked like when imported to Excel. The column on right is where we need to write formula to find out if the comment is a question or not.
For the sake of simplicity, assume this data is in column B, starting with cell B5.
The logic for identifying questions
In real life, finding if the other person is asking a question or just saying something can be tricky. For example, last evening my wife said, “Shall we go shopping?” and I assumed it was a question and said “no”. Apparently, it wasn’t a question. You can guess the rest.
Unlike real life, in Excel, we can come up with good enough approximation to nail down questions.
For example, if a cell contains any of the below words, we can say it is a question.
What, why, how, who, when, where, is it, can I, can you, which, is this, are you, can we, are we, am I
This is a pretty good way to separate questions from non-questions.
Let’s assume all the question words are maintained in a named range called q.words
Writing the formula
So here is the formula to check if a cell contains question or not.
=SUMPRODUCT(COUNTIFS(B5,"*"&q.words&"*"))>0
How does this formula work?
Remember, B5 is the cell in question (no pun).
We need to see anywhere in B5, one of the question words occur.
This is where COUNTIFS formula helps. It can count how many times a value has occurred in a range.
In our case, if B5 contains any of the question words. Note that B5 can contain other text too (apart from question words).
The formula COUNTIFS(B5,"*"&q.words&"*")
will return an array of size 15 (as q.words contains 15 question words).
Let’s assume B5 has the text – “Why didn’t you take your wife to shopping?”
So, our COUNTIFS(B5...)
will return the array {0;1;0;0;0;0;0;0;0;0;0;0;0;0;0}
The second item is 1 because second question word is Why.
If B5 has this text – “How I wish I took my wife to shopping. Can I take her now?”
COUNTIFS(..)
will return this array {0;0;1;0;0;0;0;1;0;0;0;0;0;0;0}
because it found the question words How and Can I.
But we don’t want the array…
You are right. We don’t need the array of 15 elements. We just want to know if any of the questions are present in the B5 cell.
So, we pass this array to SUMPRODUCT, which sums up all the numbers and tells us single value.
We then check if this value is >0 or not.
So there you have it. A formula to find out if a cell has question or not.
A question for you…
Do you conduct text analysis using Excel? What techniques do you use? Please share your approach & formulas in the comments.
Bonus material for text analysis using Excel
If you deal with lots of text data, you will find below resources very useful.
- Finding patterns in Text – case study problem
- Extracting file name from full path
- Analyzing search keywords & finding word frequency
- Analyzing 20,000 comments – Case study
- Sentiment analysis of text using Excel
- More on text processing & analysis using Excel
Now if you excuse me, I need to take my wife for shopping. 🙂
14 Responses to “How to find out if a text contains question? [Excel formulas]”
Can this be used to create an exclusion list?
Anything within this list comes back as TRUE so we know to exclude?
I have codes (w10, w11, w12) that I want to exclude; but if I enter W19 it comes back as TRUE when it should be FALSE
any help with this or what the limitations are with this.
Great post and has sparked a few ideas !
Hi Chandoo,
I got the first "*", what's the second "*" about (towards the end)? Please explain about both the "*" actually.
The '*' is a wildcard in Excel text searches. It will match with any text.
By having the stars around q.words, you are asking Excel to look for any text surrounding the words in the range.
If you only used the first star, you would only get 'TRUE' values if the question /ended/ with the words from q.words. Only using the second one would only get you TRUE values if the question /started/ with those words.
Thanks a ton xwingace. I almost forgot - wilcard characters.
Hi Chandoo,
It's great to see so you posting so often! Please keep them coming...
In terms of this exercise, is it oversimplifying to simply create a formula that searches for question marks?
Also, as "Question Words" can often be used even when a question isn't being asked (e.g. "That's what I was thinking." or "This reminds me of where we got married."), any tips on how to potentially account for those instances?
Keep up the great work!
One could also just look for a question mark in each cell. i.e. =IF(ISNUMBER(FIND("?",A1)),"1","") Then conditional format the cell red text if the test cell = 1. Questions can be worded many different ways (i.e. "Would you...", "Can one..." etc.).
Brad, you can use shortest formula DIRECTLY as conditional format rule: =ISNUMBER(FIND(“?”,A1))
That's all, no need of test cell.
Shouldn't you also look for "?" in addition to question words? Since that question itself would not be caught by your question words filter.
Great post though, I didn't know about this ability to search for multiple items in a countifs
What? No-one asked "Chandoo, will you marry me?" Un freakin believable.
How to convert numbers into text format. Like there are numbers in one column such as 100, 200, & so on & in the next colum what formula should be inputted to convert 100 into Hundred or 102 as One hundred & two and so on. Please help.
1. Open MS Excel Workbook
2. Click Alt + F11
Then, Microsoft Visual Basic Window opens
3. Click Alt + I, M (or Click Module from Insert Menu Bar)
Then, Module Window opens
4. Copy the below Code (Short Cut: Ctrl + C (or) Alt + E, C (or) Right Click, C (or) Ctrl + Insert)
'-----------
Option Explicit
'Main Function
Function NumWords(ByVal MyNumber)
Dim Rupees, Paises, Temp
Dim DecimalPlace, Count
ReDim Place(9) As String
Place(2) = " Thousand "
Place(3) = " Million "
Place(4) = " Billion "
Place(5) = " Trillion "
' String representation of amount.
MyNumber = Trim(Str(MyNumber))
' Position of decimal place 0 if none.
DecimalPlace = InStr(MyNumber, ".")
' Convert Paises and set MyNumber to Rupee amount.
If DecimalPlace > 0 Then
Paises = GetTens(Left(Mid(MyNumber, DecimalPlace + 1) & _
"00", 2))
MyNumber = Trim(Left(MyNumber, DecimalPlace - 1))
End If
Count = 1
Do While MyNumber ""
Temp = GetHundreds(Right(MyNumber, 3))
If Temp "" Then Rupees = Temp & Place(Count) & Rupees
If Len(MyNumber) > 3 Then
MyNumber = Left(MyNumber, Len(MyNumber) - 3)
Else
MyNumber = ""
End If
Count = Count + 1
Loop
Select Case Rupees
Case ""
Rupees = "No Rupees"
Case "One"
Rupees = "One Rupee"
Case Else
Rupees = Rupees & " Rupees"
End Select
Select Case Paises
Case ""
Paises = " and No Paises"
Case "One"
Paises = " and One Paise"
Case Else
Paises = " and " & Paises & " Paises"
End Select
NumWords = Rupees & Paises
End Function
' Converts a number from 100-999 into text
Function GetHundreds(ByVal MyNumber)
Dim Result As String
If Val(MyNumber) = 0 Then Exit Function
MyNumber = Right("000" & MyNumber, 3)
' Convert the hundreds place.
If Mid(MyNumber, 1, 1) "0" Then
Result = GetDigit(Mid(MyNumber, 1, 1)) & " Hundred "
End If
' Convert the tens and ones place.
If Mid(MyNumber, 2, 1) "0" Then
Result = Result & GetTens(Mid(MyNumber, 2))
Else
Result = Result & GetDigit(Mid(MyNumber, 3))
End If
GetHundreds = Result
End Function
' Converts a number from 10 to 99 into text.
Function GetTens(TensText)
Dim Result As String
Result = "" ' Null out the temporary function value.
If Val(Left(TensText, 1)) = 1 Then ' If value between 10-19...
Select Case Val(TensText)
Case 10: Result = "Ten"
Case 11: Result = "Eleven"
Case 12: Result = "Twelve"
Case 13: Result = "Thirteen"
Case 14: Result = "Fourteen"
Case 15: Result = "Fifteen"
Case 16: Result = "Sixteen"
Case 17: Result = "Seventeen"
Case 18: Result = "Eighteen"
Case 19: Result = "Nineteen"
Case Else
End Select
Else ' If value between 20-99...
Select Case Val(Left(TensText, 1))
Case 2: Result = "Twenty "
Case 3: Result = "Thirty "
Case 4: Result = "Forty "
Case 5: Result = "Fifty "
Case 6: Result = "Sixty "
Case 7: Result = "Seventy "
Case 8: Result = "Eighty "
Case 9: Result = "Ninety "
Case Else
End Select
Result = Result & GetDigit _
(Right(TensText, 1)) ' Retrieve ones place.
End If
GetTens = Result
End Function
' Converts a number from 1 to 9 into text.
Function GetDigit(Digit)
Select Case Val(Digit)
Case 1: GetDigit = "One"
Case 2: GetDigit = "Two"
Case 3: GetDigit = "Three"
Case 4: GetDigit = "Four"
Case 5: GetDigit = "Five"
Case 6: GetDigit = "Six"
Case 7: GetDigit = "Seven"
Case 8: GetDigit = "Eight"
Case 9: GetDigit = "Nine"
Case Else: GetDigit = ""
End Select
End Function
'----------
5. Paste into the Module Window (Short Cut: Ctrl + V (or) Alt + E, V (or) Right Click, V)
6. Close Microsoft Visual Basic Window
Why SUMPRODUCT and not just SUM?
You can use either. SUM would require entering the formulas as array formula (CTRL+Shift+Enter). SUMPRODUCT can process arrays by default.
Is there a way to see the array that the COUNTIFS (without the SUMPRODUCT) would return?