How can you analyze 1 million+ rows of data? – Excel Interview Question – 02

As part of our Excel Interview Questions series, today let’s look at another interesting challenge: how do you handle more than a million rows in Excel?

You may know that an Excel worksheet has a hard limit of about 1 million rows (1,048,576 rows, to be exact). But that doesn’t mean you can’t analyze more than a million rows in Excel.
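
Before reaching for the Data Model, it helps to know whether your file really exceeds the worksheet limit. Here is a minimal sketch in Python (not part of Excel) that counts the rows of a CSV without opening it in a spreadsheet. The function names and the in-memory sample are made up for illustration; in practice you would pass a file object opened from your own CSV.

```python
import csv
import io

EXCEL_MAX_ROWS = 1_048_576  # 2**20 rows per worksheet


def count_csv_rows(lines):
    """Count records in an iterable of CSV lines (a file object or a list)."""
    return sum(1 for _ in csv.reader(lines))


def fits_on_worksheet(row_count):
    """True if that many rows (header included) fit on a single worksheet."""
    return row_count <= EXCEL_MAX_ROWS


# Tiny in-memory stand-in for a real file, e.g. open("data.csv", newline="")
sample = io.StringIO("a,b\n1,2\n3,4\n")
rows = count_csv_rows(sample)
print(rows, fits_on_worksheet(rows))
```

Because the reader streams the file one record at a time, this works even when the file is far too large to open on a worksheet.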

The trick is to use the Data Model.

The Excel Data Model can hold far more than a million rows

Introduced in Excel 2013, the Excel Data Model lets you store and analyze data without having to look at it all the time. Think of the Data Model as a black box where you store data and Excel quickly provides answers to you.

Because the Data Model is held in your computer’s memory rather than in spreadsheet cells, it doesn’t have the one-million-row limitation. How much data you can store in the model, and how fast it performs, depends on your computer’s processor and memory.

How to load large data sets into the Data Model?

Let’s say you have a large data set that you want to load into Excel.

If you don’t have something handy, here is a list of 18 million random numbers, split into 6 columns, 3 million rows.
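
If you would rather generate a similar test file yourself, here is a rough Python sketch that writes random numbers to a CSV. The function name `write_random_csv`, the column names, the value range and the file name `random_numbers.csv` are all assumptions made for this example; they are not taken from the sample file mentioned above.

```python
import csv
import random


def write_random_csv(path, rows=3_000_000, cols=6, seed=42):
    """Write `rows` lines of `cols` random integers, plus a header row.

    Returns the number of random values written.
    """
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([f"Col{i + 1}" for i in range(cols)])
        for _ in range(rows):
            writer.writerow([rng.randint(1, 1000) for _ in range(cols)])
    return rows * cols


# Example call (hypothetical file name). With the defaults this writes
# 18 million values, so expect it to take a while:
# write_random_csv("random_numbers.csv")
```

The rows are written one at a time, so the script does not need to hold the whole data set in memory.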

Step 1 – Connect to your data through Power Query

Go to the Data ribbon and click on “Get Data”. Point to the source where your data is (CSV file, SQL query, SSAS cube, etc.).

Get data > Get & Transform Data options

Step 2 – Load data to Data Model

In Power Query Editor, apply any transformations you need. Once you are ready to load, click on the “Close & Load To...” button.

Close & Load to... options in Power Query

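In the dialog that appears, choose “Only Create Connection” and tick “Add this data to the Data Model”. This creates a connection and loads the data into the model rather than onto a worksheet.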

Load data to Data Model in Excel

Now your data model is buzzing with more than a million rows.

Step 3 – Analyze the data with Pivot Tables

Go and insert a pivot table (Insert > Pivot Table).

Excel automatically picks the workbook’s Data Model as the source. You can now see all the fields in your data and analyze them by calculating totals, averages, etc.

You can also build measures through Power Pivot, another powerful feature of Excel.
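
If you want an independent check on the totals and averages your pivot table reports, one option is to compute them outside Excel. The Python sketch below streams a CSV and returns the total and average of each column. It assumes every column is numeric and the first row is a header; the function name and the small in-memory sample are illustrative only.

```python
import csv
import io


def column_totals_and_averages(lines):
    """Stream CSV lines and return {column: (total, average)}.

    Only running totals are kept, so the whole file is never in memory.
    """
    reader = csv.reader(lines)
    header = next(reader)
    totals = [0.0] * len(header)
    count = 0
    for row in reader:
        for i, value in enumerate(row):
            totals[i] += float(value)
        count += 1
    return {
        name: (total, total / count if count else 0.0)
        for name, total in zip(header, totals)
    }


# Small in-memory stand-in for a real file object
sample = io.StringIO("Col1,Col2\n1,10\n2,20\n3,30\n")
print(column_totals_and_averages(sample))
```

The numbers should match a pivot table that shows Sum and Average of the same fields with no filters applied.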

How to view & manage the data model

Once you have a data model set up, you can use:

  • Data > Queries & Connections: to view and adjust connection settings
  • Relationships: to set up and manage relationships between multiple tables in your data model
  • Manage Data Model: to manage the data model using Power Pivot
How to manage Data model in Excel - various options

Alternative answer – Can I not use Excel…

Of course, Excel is not built for analyzing such large volumes of data. So, if possible, you should try to analyze such data with tools like Power BI [What is Power BI?]. These give you more flexibility, processing power and options.

Watch the answer & demo of Excel Data Model

I made a video explaining the interview question, answer and a quick demo of Excel data model with 2 million rows. Check it out below or on my YouTube Channel.

How do you analyze large volumes of data in Excel?

What about you? Do you use the data model option to analyze large volumes of data? What other methods do you rely on? Please post your tips & ideas in the comments section.

12 Responses to “Speeding up & Optimizing Excel – Tips for Charting & Formatting [Speedy Spreadsheet Week]”

  1. Greg says:

    Usually when I dump data into my files to update values, the formatting sometimes goes to all rows or columns. So what I typically do is go to the last row and then the last column, use Ctrl + Shift + End, and then delete the highlighted cells. This removes all the stray formats in the worksheet. Also, after you have done this, you won't see the benefit until you save the document. Sometimes I even have to close and reopen it. The direct sign that this has worked is the size of the scroll bar and range.

  2. I have some comments on a couple of the points.

    1. Camera objects

    Tip: I use defined names in conjunction with camera tool objects.
    Each camera object gets a name like so:
    CameraItem01
    Referring to: =IF(PicsOn=1,Sheet1!$C$2:$S$5,"")
    By setting the PicsOn name to 1, the camera objects become "live", by setting the PicsOn name to 0, they become static. That improves performance enormously.

    4: Conditional formatting

    Lots of CF rules can slow down your workbook a lot. And it does not show the calc progress a "normal" recalc does on slow workbooks.

    5. Format whole columns/rows

    As far as I know, there is no problem with formatting entire columns/rows performance-wise. On the contrary, Excel is more efficient when you format an entire column than when you format a couple of hundred rows of a column.

    6. Styles.

    Here I wholeheartedly disagree. I say: use styles. And use them religiously.

    I mean: if you have applied a (custom) style and you need to change a small piece of formatting to make that one cell look right, force yourself to create a new style just for that cell. It forces you to really think about your spreadsheet design and try and streamline it. It also makes it much, much easier to change your sheet's appearance later on. See http://www.jkp-ads.com/articles/styles00.asp

    • Chandoo says:

      Very good insights Jan..

      Camera objects: I often use similar technique to turn off images in my dashboards.

      Formats: Thanks for clearing this. Do you think formatting larger ranges has any impact on macro speeds or it does not matter?

      Styles: Thanks for telling us about this. As I mentioned, I am not sure about styles, but I am under the impression that excessive use of styles can bloat the file size.

      • @Chandoo:
        If you stick to formatting entire rows/columns I don't expect macro speed is affected. Better: try it!

        If you use styles properly AND as a replacement of ad-hoc cell formatting, I expect you'll see that the file actually is smaller in size.

        This is because the cells now only have a reference to a single style instead of a reference to a custom cell formatting style.

        Many cell formatting combinations get created if you format your cells in an ad-hoc manner, which was responsible for the dreaded "Too many different cell formats" error in Excel 2003 and older. Excel 2007 and 2010 have a higher limit there, but it does slow down your file with many of them.

        Style bloat in my point of view is what you get by copying and pasting a lot from various other files and thus get Normal 1, Normal 1 1, Normal 1 1 1, ... I have seen workbooks with as many as 6000 styles, all caused by copying and pasting from various differently formatted workbooks.

        Excel 2007 and 2010 have fixed a number of issues regarding copying of styles, but for workbooks with a long editing history, the trouble is already in the workbooks.

  3. PremSivakanthan says:

    Can't emphasise enough the importance of reducing the amount of formatting in a workbook; this has a surprising impact on workbook size. I've always kept to one font, and no more than three colours, and this has worked well for me. Keeping things clean and simple should be the motto when designing any type of report/dashboard that is going to be distributed around the organisation.

    You can also save a few MBs by saving as an xlsb file.

  4. Ron says:

    Has anyone else mentioned that only the first item in the "more ..." section is hyperlinked?

    Prem, have you confirmed by trial that XLSB file size is smaller than same XLSX file? Sorry, I just tried it with a small, simple XLSM file. I was surprised to see you are correct. File went from 40 KB to 37 KB. I thought that the compression of the new file would make the new file smaller.

    • Hui... says:

      @Ron
      All Excel files have a minimum overhead of around 8 KB, just to store a simple number or letter.
      So with a small file of 40 KB you will not see a huge improvement in file size.
      With files greater than 10 MB you will see large improvements in size.
      The compression gained also depends on what the file contains. Straight numbers, text and formulas can be greatly compressed, whereas files that contain a lot of objects, especially pictures, gain very little from using *.xlsb files.

    • Chandoo says:

      @Ron.. the other articles are yet to be published. All the links will be updated by Tuesday (27th March).

  5. Mil says:

    Hi,

    I need an x,y scatter chart with around 30 data series,
    like this:
    http://i65.tinypic.com/jra8lc.jpg
    I also have several such charts in one Excel file.

    Is there any way to make Excel faster? It is irritatingly slow
    (though my PC config. is quite on the level).

    Thanks in advance!!!

    • Hui... says:

      @Mil
      30 series won't be the issue.
      It is the number of points in each series.
      Also remove all fancy modifications, like shadows, fancy fills etc.

      I'd suggest asking the question in the Chandoo.org Forums http://forum.chandoo.org/
      Attach a sample file with an example of what you are after

      • Mil says:

        @Hui

        I've already removed all fancy mod. The problem is there are also a lot of data points in one series.
        Thanks for the advice!

        • Hui... says:

          @Mil

          Do you really need every data point?

          Where is the chart being presented: on screen or in a report?

          On a screen you are unlikely to use more than 800 pixels for the chart area,
          so using any more than about 250 points is not adding value.

          On an A4 chart in landscape, let's say the chart area is 6" long; at 300 dpi that is 1,800 pixels.
          Once again, using more than 800-1000 points will not add any value.

          I have seen charts with 30,000+ points, and when this is explained and a workaround shown, people appreciate the speed-up.

          For a workaround, try setting up an area where you select, say, every x'th point using an OFFSET or INDEX function.
          Then plot that data.

          I'd suggest asking the question in the Chandoo.org Forums http://forum.chandoo.org/
          Attach a sample file with an example of what you are after
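
The "every x'th point" idea from the thread above can also be sketched outside Excel. The Python snippet below is only an illustration of the sampling logic, not the OFFSET/INDEX formula itself; the function name and the default of 250 points (the on-screen figure quoted above) are assumptions for this example.

```python
def every_nth_point(xs, ys, max_points=250):
    """Keep roughly max_points evenly spaced points from two equal-length series."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    # Ceiling division: the smallest step that keeps at most max_points points
    step = max(1, -(-len(xs) // max_points))
    return xs[::step], ys[::step]


# A 30,000-point series thinned for on-screen display
xs = list(range(30_000))
ys = [x * 2 for x in xs]
thin_xs, thin_ys = every_nth_point(xs, ys)
print(len(thin_xs))
```

Plain every-n'th sampling can skip short spikes between the kept points, so check the thinned chart against the full data at least once before relying on it.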
