• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Need help getting frequency of values in a huge asymmetrical range/ table

sauronbaggins

New Member
Hi guys,

I have tried googling and the things that I know. However, I am not getting the solution. I have an asymmetrical data set of names. It has approx 20000 rows. Column A is never empty for any of the rows. The farthest column that holds a value in it is "HZ" (not all rows have values until HZ, meaning a LOT of the columns are empty. I made it a "Table" and tried to summarize with a pivot table but that doesn't work. What's the best way to find frequency of all the names in such a huge range?

To give you an idea of how the data looks, please see this table.

GameColumn 1Column 2Column 3
Tennis​
Badminton​
Chess​
Golf​
Tennis​
Baseball​
Basketball​
Table Tennis​
Tennis​
Golf
Hockey

The output I want is this:

Game <Heading doesn't really matter; alphabetically ordered doesn't matter, I can sort them later>Count
Tennis3
Golf2
Hockey1
Badminton1
Baseball1
Basketball1
Chess1
Table Tennis1

If there is a way to consolidate/ transpose all the values across this range in a single column, that might also work.
 

Peter Bartholomew

Well-Known Member
I think the answer may be simpler than you expect. Using dynamic arrays
= COUNTIFS(Table, sport)
gives the result. Without dynamic arrays it might be better to use a table for the output list as well, so the name of each game is referenced separately
= COUNTIFS(Table, [@Sport])
and the formula copies down the output table automatically.

Getting the list of distinct sports is harder but you may already have that.
Your idea of consolidation into a single column is also possible but it represents very much more work.
 

sauronbaggins

New Member
I think the answer may be simpler than you expect. Using dynamic arrays
= COUNTIFS(Table, sport)
gives the result. Without dynamic arrays it might be better to use a table for the output list as well, so the name of each game is referenced separately
= COUNTIFS(Table, [@Sport])
and the formula copies down the output table automatically.

Getting the list of distinct sports is harder but you may already have that.
Your idea of consolidation into a single column is also possible but it represents very much more work.
Thanks for your reply, but I do not have the list of all the unique "sport" values. The unique values run in a few thousands, so it is not possible to do it manually. I first need to create a list of unique values and then find the frequency of each of those across the range. Sorry, if that wasn't clear in the first post.
 

Peter Bartholomew

Well-Known Member
If you have a dynamic array version of Excel, I could provide a formula-based solution. If not, I would suggest the use of PowerQuery.

Code:
let
    Source = Excel.CurrentWorkbook(){[Name="Table"]}[Content],
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(Source, {}, "Attribute", "Value"),
    #"Removed Columns" = Table.RemoveColumns(#"Unpivoted Columns",{"Attribute"}),
    #"Removed Duplicates" = Table.Distinct(#"Removed Columns"),
    #"Filtered Rows" = Table.SelectRows(#"Removed Duplicates", each ([Value] <> "")),
    #"Sorted Rows" = Table.Sort(#"Filtered Rows",{{"Value", Order.Ascending}})
in
    #"Sorted Rows"
Guido might have some improvements and could possibly even return the count of instances from PQ. Otherwise, extend the PQ output table with the formula from above and it will refresh along with the table.
 
Last edited:

GraH - Guido

Well-Known Member
Hi Peter, sauronbaggins

What about this one?
Code:
let
    Source = Excel.CurrentWorkbook(){[Name="tGames"]}[Content],
    GroupAllRows = Table.Group(Source, {}, {{"Games", each _, type table [Game=text, Column1=text, Column2=text, Column3=text]}}),
    UnpivotAllColumns = Table.AddColumn(GroupAllRows, "GamesUnpivot", each Table.Unpivot([Games],Table.ColumnNames([Games]),"Headers", "Games")),
    RemoveAllTable = Table.RemoveColumns(UnpivotAllColumns,{"Games"}),
    ExpandGamesUnpivot = Table.ExpandTableColumn(RemoveAllTable, "GamesUnpivot", {"Games"}, {"Games"}),
    GroupGames_CountRows = Table.Group(ExpandGamesUnpivot, {"Games"}, {{"Count", each Table.RowCount(_), type number}}),
    SortCountDesc_GamesAsc = Table.Sort(GroupGames_CountRows,{{"Count", Order.Descending}, {"Games", Order.Ascending}})
in
    SortCountDesc_GamesAsc
 

Attachments

p45cal

Well-Known Member
another:
Code:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    AddedCustom = Table.AddColumn(Source, "x", each Record.ToTable(_)),
    RemovedOtherColumns = Table.SelectColumns(AddedCustom,{"x"}),
    ExpandedCustom = Table.ExpandTableColumn(RemovedOtherColumns, "x", {"Value"}, {"Value"}),
    FilteredRows = Table.SelectRows(ExpandedCustom, each ([Value] <> null)),
    GroupedRows = Table.Group(FilteredRows, {"Value"}, {{"Count", each Table.RowCount(_), type number}}),
    SortedRows = Table.Sort(GroupedRows,{{"Count", Order.Descending}})
in
    SortedRows
although an adaptation of @Peter Bartholomew 's get's there quite slickly:
Code:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    UnpivotedColumns = Table.UnpivotOtherColumns(Source, {}, "Attribute", "Value"),
    RemovedColumns = Table.RemoveColumns(UnpivotedColumns,{"Attribute"}),
    GroupedRows = Table.Group(RemovedColumns, {"Value"}, {{"Count", each Table.RowCount(_), type number}}),
    SortedRows = Table.Sort(GroupedRows,{{"Count", Order.Descending}, {"Value", Order.Ascending}})
in
    SortedRows
 
Last edited:

GraH - Guido

Well-Known Member
Haha, in my effort to avoid hard coded column names, I forced a step where they are hard coded :rolleyes:. Really like your step Record.ToTable. It seems I never think in records, but only table and lists. So that's a good reminder not to skip the obvious. On larger data it would mean a significant reduction in running time. Good stuff p45cal.
 

sauronbaggins

New Member
Thanks everyone. I tried this solution at another board and it worked perfectly (sharing as it can be helpful to other people in similar situation). Link
 

vletm

Excel Ninja
sauronbaggins
As a new member, You should reread Forum Rules
You have skipped few basic steps.
 
Top