• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Reduce the size of Very Large Excel Data Model

Dokat

Member
Hi,

I created a data model with 295K rows and 27 columns which comes out to be 17 MB when i ran a power pivot table. After adding another table with 227K row to the data model all sudden data size jumps to over 42 mb? Can anyone advise me on how to reduce the size of the data model...

Thanks
 
It really depends on what your doing with it?

Are you saving it as an Excel Binary file ? (*.xlsb file type)

Do you need to keep it all in Excel?
Can it be offloaded to Access or a CSV table and accessed via a Power tool ?

There are techniques for summarising large tables in Excel that are blindingly fast, so don't let the actual file size worry you, it is the speed that is generally an issue
 
So what are you trying to achieve ?

I am trying to update a excel dashboard....source data is in another excel workbook with data model over 184mb. I am trying to see if there is a way to reduce the size of the excel file so when i update the dashboard file stays under 20 mb.

Thanks
 
How often does the data change? Or is it just added to?
Are there any fields that aren't required?
 
Why can't the dashboard be setup to extract the data from the other file using SQL or other linkages?
 
It's a excel workbook with 2 sheets. One sheet has 197K rows and 27 columns, the other one has 266K rows and 20 columns. Total size of the workbook is 176 mb. I have a vba code in the spreadsheet thus i have to save the file in /xlsm format. I have a dashboard (another workbook) that uses this workbook as a source file. When i ran pivot table using power pivot data model dashboard size reduces to 46 MB but it is still very large file to open and use. I am open to any recommendations or suggestions you may have. Thank you
 
Last edited:
A few suggestion.

1. Reduce data size by performing aggregation prior to loading to data model (PowerQuery, ADODB, MS Query etc)
2. Use XLSB format instead of XLSM. It will further reduce size.
3. Check PivotTable Cache and that you don't have unnecessary duplicates
 
Hi,

1) How can i perform aggregation prior to loading the data.

2) XLSB reduces the size of the file. I use external connections to link to database and run pivot table. However when i do that size of the dashboard increases exponentially.

2) How can i check the PivotTable Cache that it doesnt have unnecessary duplicates?

Thanks
 
1) Depends on method and type of aggregation. As well as your source data structure.

You can find some example for PowerQuery below:
Custom sumifs function
http://forum.chandoo.org/threads/powerquiery-to-allocate-numbers-if-stock-is-available.32443/

Using SQL statement in PowerQuery (connecting to MS SQL Server)
  • Aggregation performed on hourly IVR data to daily level and also aggregated on market level (Original data was broken out by city and Phone # called), joined with other lookup/related tables.
  • Data also snipped to relevant date range (Previous month & Current month to date)
  • Similar operation can be done in MS Query/ADODB (slightly different syntax) for workbook/sheets
Code:
Select
    t1.reportdate as rDate,
    t1.Product,
    (Case When t1.Product='Product1-E' or t1.Product='Product1-F' Then
    'Product1' When t1.Product='Product2-E' or t1.Product='Product2-F' Then
    'Product2' Else t1.Product End) As pType,
    t3.regionName As Region,
    t2.marketName,
    SUM(t1.SalesCalls) As SalesCalls,
    SUM(t1.InfoCalls) As InfoCalls,
    SUM(t1.CallDuration) As CallDuration,
    SUM(t1.HoldDuration) As HoldDuration,
    SUM(t1.NewAni) As NewAni,
    SUM(t1.ExistingAni) As ExistingAni,
    SUM(t1.amount) As SalesAmount,
    SUM(t1.Support) As SupportCalls
From
    View_IVR_Sales_Data as t1
    Left Outer Join View_MarketCodeMap as t2
    ON t1.cityCode = t2.cityCode
    Left Outer Join MarketRegions as t3
    On t1.cityCode = t3.cityCode
Where
    t1.reportdate >= DATEADD(month, -1, Dateadd(month, datediff(month, 0, Getdate()),0))
Group by
    t1.reportdate as rDate,
    t1.Product,
    (Case When t1.Product='Product1-E' or t1.Product='Product1-F' Then
    'Product1' When t1.Product='Product2-E' or t1.Product='Product2-F' Then
    'Product2' Else t1.Product End) As pType,
    t3.regionName As Region,
    t2.marketName

2) Your set up isn't optimal then. Likely duplicating pivot caches, unnecessary formatting etc etc. Also consider splitting dashboard workbook into multiple workbooks. Organized by related data category etc. Any case, use XLSB over other formats for better performance.

3) See link.
http://ramblings.mcpher.com/Home/excelquirks/snippets/pivotcache
 
Back
Top