• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

COUNTIFS. Counting survey responses that include "< 5" or "> 25" as text.

Peter Bartholomew

Well-Known Member
I was recently trying to analyse the results of a survey in which the questions included such as
"What is the typical life of critical spreadsheets within your organisation?"

The table on the left and the option list to the right are returned to the sheet from PowerQuery (the question number being an input parameter) but then I used formulas to return the count of each reply. For some questions the totals did not tally and it turned out that "< 5" was being treated as a numeric criterion rather than a text equality condition. Are there any standard ways of making the COUNTIFS function treat the string as a literal?

69525

p.s. I finished up by editing the data to replace "> 20" by "More than 20 yrs" but I would have preferred to avoid touching the raw data.
 

Attachments

Peter Bartholomew

Well-Known Member
@AliGW
Depending on the question selected, the user may have from 2 options up to in the region of 12. I found it convenient to have the result as a dynamic array rather than a filled ranges so that the various forms of analysis (e.g. breaking the responses down by demographics as well as performing simple counts, in case there was value in such analysis) did not require manual intervention when moving from question to question.

Example result with demographics.
Most supply chain companies would use a critical spreadsheet for 5-10 yrs.
This increases to 10-20 yrs for OEMs.
For software vendors the typical time is less, 1-5 yrs.

That put paid to the idea that spreadsheets were simply temporary, ad-hoc solutions that do not warrant lifecycle management effort.
 

AliGW

Active Member
Sorry - that didn't answer my question. The formula in Count2 seems to be counting correctly, so why not just use that?

You might want to try an escape character - something like this (untested on your data), which I often use when counting A* grades:

SUBSTITUTE(value,"*","~*")
 

Peter Bartholomew

Well-Known Member
I have found another solution, namely to replace "<" and ">" by left and right arrowheads, "˂" and "˃", Unicode characters 02C2 and 02C3.
Still open for ideas though.
 

Peter Bartholomew

Well-Known Member
@AliGW
The intersection operator "@" means that the 2nd formula does not spill.

If the formula were part of a Table rather than a separate calculation the automatic fill down could be used in place of an array I guess. Provided it survives the PQ refresh that is.
 

AliGW

Active Member
I have lots that I update daily with columns to their right (part of the table) containing formulae - works fine. As a precaution, right-click the table, choose external data properties, then untick the 'Preserve column sort/filter/layout' option.
 

GraH - Guido

Well-Known Member
Peter friend, why use formula when the table is already generated by PQ? Make a reference to that table, group by "Options" and Count Rows.
 

Peter Bartholomew

Well-Known Member
@GraH - Guido
I spend ages agonising over what was the best approach to the data analysis. The options I considered were
1. Stay within my comfort zone and base the entire analysis on dynamic array formulas
2. To use PQ for the initial filtering and transforming and then use formulas to create multiple chart types
3. To use PQ to load the data and Power Pivot for the charting. That takes me deep into unknown territory.
6955669557
The problem was that I can only determine the best chart types in hindsight when I see the trends in the results. This might be best exposed by analysing the result by the industry sector or by organisation type? Should all the options be shown (including don't know) and how should they be sequenced. I have always struggled to control pivot charts so I elected to follow a hybrid strategy.

Perhaps I should have consulted before committing to an approach; who knows?
 

AliGW

Active Member
If you don't include "don't know", even though it was a possible answer, then aren't you misrepresenting the data???
 

Peter Bartholomew

Well-Known Member
I plan to present the raw data in as compact a form as possible within an appendix to the report. As for the conclusions I draw within the body of the report, it is simply a case of expressing the findings accurately. For example, "of the organisations that have a policy on spreadsheet usage, 8 times as many were seeking to increase spreadsheet usage than decrease it". The statement is correct though, in that case it should probably be qualified by the statement that 2/3 do not have any policy regarding usage.
The problem with "don't know" is where to put it, both less than zero and above 25 distort one's perception of the curves.
 

GraH - Guido

Well-Known Member
I spend ages agonising over what was the best approach to the data analysis. The options I considered were
1. Stay within my comfort zone and base the entire analysis on dynamic array formulas
2. To use PQ for the initial filtering and transforming and then use formulas to create multiple chart types
3. To use PQ to load the data and Power Pivot for the charting. That takes me deep into unknown territory.
...
I have always struggled to control pivot charts so I elected to follow a hybrid strategy.

Perhaps I should have consulted before committing to an approach; who knows?
I hear you. I rarely do not use ordinary charts too.

It somehow surprises me you say PowerPivot is unknown territory. I reckon, since you are a master in structured references, you should rapidly feel much comfort with it and DAX. But it is an investment: it takes time to learn and practice, dixit Marco Russo/Alberto Ferrari. I'll second that a zillion times. Never could I get passed the "basics". I'm in total awe with the @Chihiro's and @John Jairo V's of this world. It's intimidating ;-).
 

Chihiro

Excel Ninja
Since you are doing data discovery along with analysis, then finally reporting. I'd recommend use of PQ along with DAX.

PQ for any necessary data transformation and clean up. Then DAX for quick measures.
Slicers to control segmentation. I.E. slice and dice data to find interesting data story.

Charting, will depend on need. But PivotChart is inflexible in many cases, due to how object layers need to be connected, in order to update with pivot table change (slicer, filter, expand/collapse etc).

So, if the chart does not need to be interactive... you can just copy pivot as value table or use PQ to generate data source for chart.

DAX is very simple in many ways, but you need departure from traditional Excel formula. Which makes it hard for many to transition. It took me a while too. But DAX is transferrable across multiple MS BI platform (SSAS, PowerBI, Azure Analysis Services, Power Pivot) and is very useful skill to have.
And with appropriate data model structure (Star Schema, Fact Constellation Schema), DAX is often very simple to construct.

It's when you force DAX on data model that isn't structured for analysis, it gets complex. For an example, consider fact table with multiple date columns. If you either flatten data table, or use individual dimension for each of date column DAX is pretty simple. But if not, you need complex DAX that accounts for different relationship for each of the column (i.e. USERRELATIONSHIP).
Ex:
https://chandoo.org/forum/threads/power-pivot-measure-with-multiple-date-criteria-identical-year.41046/
 

Peter Bartholomew

Well-Known Member
@GraH - Guido , @Chihiro
Thank you each for your input. I guess your recommendations serve to extend my 'to do' list of learning opportunities somewhat. I have read large sections of an ebook on Power Pivot and DAX but reading about 'measures' etc. is not the same as hands-on experience. I suspect I already have the making of a Star Schema in that the demographics (industry sector and positioning within the supply chain) are linked to the respondent ID whereas full-text for questions is linked by an inserted question number.

I think it is only human to believe that the most natural way of addressing a problem is to repeat the methods one always uses. For me, that means Names and Array formulas. I gave up the use of direct cell referencing and non-array formulas about 8 years ago after a disagreement with the proponents of the financial modelling FAST standard focussed my mind. Initially it was experimental, and led me into defining most formulas as named formulas to avoid implicit intersection. This led Mike Girvin to comment:

"As for your COUNTIFS, I have never, in almost 3 decades, seen spreadsheets like yours, with ALL the calculations entered as arrays!!!
Now I see why your are so excited by the new Excel Calculation Engine
".

For the survey analysis, I did pull myself away to treat the task as primarily one of data management but the temptation to revert to the familiar was always strong. This tendency is reinforced by the fact that I have access to Charles Williams's FastExcel SpeedTools, a library of functions that outperform the native function for speed as well as functionality.

I have ACCUMULATE to create a row showing the question number, based on irregularly spaced and lengthy text questions. I can use FILTER to display results for a selected question. I can use Charles's UNPIVOT function to normalise the data table before using COUNTIFS to aggregate and create the equivalent of pivot tables. String manipulation is, of course, a strength of PQ but regular expression functions like rgx.MID and rgx.SUBSTITUTE are also powerful.

Note: I demonstrated regular expressions in a YouTube video I recorded
Just trying to put a video together increased my respect for those (including this site) who manage to put out polished, professional material.

The common feature of these methods is that they look nothing like traditional spreadsheet development; nor is the thought process that leads to a solution design similar to traditional approaches. In fact, it may be that DAX and array methods are closer to one another than either is to the spirit of traditional spreadsheet design.
 
Top