• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Remove not used values, in dimensions

Lolo

Member
Hello,

Common problem I would want an advice...

let says we have a dimension table 'DimCity' and a table 'Fact',
linked together by cityID.

* City contains 100 000 cities (city id, city name)
* Fact only use 2 000 cities.

When I put all of that in the model, and create a pivot table, and a slicer on city, I want to only see my 2 000 cities, not the 100 000 in the slicer...

How could I reduce the dimension data to only the relevant used cities ?
I know I can use power query to transform the' DimCity', but in most complex cases, it is an extra effort that could be major...
To compare, I manage easily to do that in Qlikview with just 1 instruction in the load script.
So I'm wondering if there is any trick I could do in the model or DAX or whereever, to reduce the dimension based on values really used in the fact table.

Thank you !
 
OK, thank you. Was aware of that indeed, but it is just a workaround according to me.
Indeed, if you use the same field filter but on the pivot table, the list will be the 100 000. :(

I would really want to reduce the list in the model, not in the interface.

So solution would only be using power query ?
 
You should really do this sort of transformation in PowerQuery. All the necessary transformation should be done prior to loading it to model.

It's quite simple really.
Do List.Distinct on Fact table column in blank query. Join dimension to it and expand. Leave original dimension table out of data model (connection only) to avoid unnecessarily loading 100k records, where 2k is sufficient.

You could do it in model as well... though not in Excel's model or SSAS tabular. Calculated tables are only available to PowerBI as far as I'm aware. But then, you'd be adding entire 100k records to model before it's filtered. So, I don't see any point of doing this in model.

Basically you add calculated column to Dimension table (don't add relationship yet)... Using iterator function (SUMX) to iterate over each row of Table1 to check if it exists in dimension table.

=IF(SUMX(FactTable, FIND(FactTable[Key],DimTable[Key],,0))>0,1,0)

Then use following to create new table and base relationship between Fact and this new table.
=FILTER(Table2,Table2[Flag]=1)


Alternately, if the data is brought in using query. Just filter out unnecessary Dimensions using List.Distinct(FactTable[Key]) to construct SQL statement (i.e. Where column in (list))
 
@Lolo... another option is if you are using measures in the pivot table, then you can create the measure such a way that it would return BLANK() if the count / sum etc is 0 (ie there is no value for the city). Then when you put this measure in pivot (and nothing else other than city), pivot should automatically exclude the blank() cities.

As @Chihiro suggested, if possible, get rid of these cities in PQ later. But if you cannot do that, I do not recommend SUMXing on the fact table as this is an expensive operation.

all the best.
 
Thank you for for both of you. I will use power query (and I already do that, but it is just that in Qlikview/QlikSense, you can do it more efficiently (according to me), by joining tables and telling during the join, to remove the unused value ('LEFT KEEP JOIN' instruction). So I'm wondering if there was some kind of equivalent in power query/power pivot)
 
Hmm? Do left join and drop distinct column from fact table. It's basically doing the same thing Qlik's Left Keep Join is doing. Just not the built in function.

Or as I've mentioned, use ODBC and sql statement. You can use Where... IN like I mentioned, or use nested select to do inner join and drop distinct column in outer select.

You could write custom function in PQ, but this is quick enough step with single application, that it provides little benefit.

@r2c2 Yes, I agree. Iterator function is useful, but in many cases, there are other more efficient alternatives.
 
Ok, by directly using List.Combine and List.Distinct functions, to get only the relevant data from fact tables, and then joining the result to the full data in the dimension, I'm able to quickly "reduce" the dimenston data :).

However, I'm obliged to load my dimensions AFTER the fact tables, but this is not an issue.

Thank you.
 
Back
Top