Flu Trends Chart in Excel [Yes, we can edition]

Share

Facebook
Twitter
LinkedIn

Last week I have reviewed Google’s flu trends chart and told you why it is an awesome chart. This week, I am going to show you how such a chart can be constructed in Excel.

First let me show you what I am able to do in Excel:

Flu Trends Chart in Excel

(compare this with actual chart on Google)

How I made the flu-trends chart in excel?

  1. Data, Data, Data: Data plays an important role in complex charts like these. The source data is thankfully available for download from Google.  Flu incidence data is available by week (Sunday to Saturday) for every week since 28th Sep, 2003. For each week the data if given for all regions in various columns. But I was not able to use the data “as-is” to construct this chart. I had to massage and rearrange it a bit.
  2. The main issues is how flu season is classified (it starts on July and ends in June) and how the data is (we got weekly flu incident data, starting from Sunday to Saturday). The main issue here is each year, the weeks start on different dates. For eg. first Sunday in 2010 was on 3rd Jan where as in 2009 it was on 4th Jan. I tried using WEEKNUM() formula (examples), but it didn’t work well with the flu season (Jul to Jun). So I did some basic date math and ended up mapping weeks uniformly across years.
  3. The next issue is taking one big table of data with dates in rows and regions in columns and transform it to weeks in rows, years and columns and actual flu data for the selected region in the cells.
  4. Then I set up 2 cells, one where user would specify “region” and other where a comparison “year” can be selected. I have used data validation to control the valid inputs.
  5. I used the MATCH, INDEX formulas to fetch corresponding weekly values for all years for selected region. Thanks to MATCH, INDEX and HLOOKUP formulas, this is not such a big task either. And if the optional comparison year is specified, we repeat that years values in another column. Otherwise that column is NA().
  6. Using these columns, I made a line chart. Then I cleaned up the chart and formatted the 2009-2010 series in thick blue and rest all in thin light blues. The optional comparison series was colored in red (for contrast). [related: line chart examples]
  7. The only remaining piece is to show the heat map of flu intensities below the chart. For this I have used the very useful 3 color scale conditional formatting setting in Excel 2007. (of course, I had to setup some extra calculations so that the intensities are normalized across the region / years and change when user selects a new region, but you already guessed it.)
    Excel Conditional Formatting - Heatmaps
  8. I choose to drop the colorful legend as it adds little value.
  9. The rest is some formatting and presentation.

What I learned from this experience?

  • When I looked at Google’s chart, I doubted if it can be created in Excel. But I was wrong. It can be done in excel, and it takes no more than 2 hours.
  • Data and structure of it play extremely important role in any visualization.We should understand the data and know how to arrange / transform / massage it, to make better charts.
  • Date formulas are a flu in the nose.
  • Excel 2007 conditional formatting is just awesome. [more examples]
  • INDEX, MATCH, LOOKUP formulas are very powerful. I *respect* them. [here is a tutorial]

Download flu trends chart and play with it

Download the file (Excel 2007 only). The file is locked, but there is no password. Play with it and tell me if you like it.

Do you like this chart?

Have you done something similar in Excel? What was your experience like? Do you like this chart? How would you improve / change it?

More visualizations using Excel:

Olympic Medals by Country | Survey Results Dashboard | Test Cricket Statistics | Dynamic Charts

PS: After a looong time this post had many “I”s

PPS: Have a good weekend.

Facebook
Twitter
LinkedIn

Share this tip with your colleagues

Excel and Power BI tips - Chandoo.org Newsletter

Get FREE Excel + Power BI Tips

Simple, fun and useful emails, once per week.

Learn & be awesome.

Welcome to Chandoo.org

Thank you so much for visiting. My aim is to make you awesome in Excel & Power BI. I do this by sharing videos, tips, examples and downloads on this website. There are more than 1,000 pages with all things Excel, Power BI, Dashboards & VBA here. Go ahead and spend few minutes to be AWESOME.

Read my storyFREE Excel tips book

Overall I learned a lot and I thought you did a great job of explaining how to do things. This will definitely elevate my reporting in the future.
Rebekah S
Reporting Analyst
Excel formula list - 100+ examples and howto guide for you

From simple to complex, there is a formula for every occasion. Check out the list now.

Calendars, invoices, trackers and much more. All free, fun and fantastic.

Advanced Pivot Table tricks

Power Query, Data model, DAX, Filters, Slicers, Conditional formats and beautiful charts. It's all here.

Still on fence about Power BI? In this getting started guide, learn what is Power BI, how to get it and how to create your first report from scratch.

40 Responses to “Looking up when the data won’t co-operate (case study)”

  1. Sumit Bansal says:

    Nice Trick.. Clever use of cell references

    Here is a formula I tried to create:
    =SUMPRODUCT(((NOT(ISERROR(SEARCH(L5,B4:H14))))*1),(B5:H15))

    It takes care of Caveat #1 (can handle text), but Caveat #2 remains.

  2. PeterB says:

    In situations like this, I will often use VBA to restructure the data (2 columns: dates and values) on to a new worksheet. I can then use this 'clean' source for data analysis (formula or pivot table).

    =SUMPRODUCT(((NOT(ISERROR(SEARCH(L5,B4:H14))))*1),(B5:H15)) and complex formulae in general are all very well but when you come back to them in a few weeks / months time, it is not at all easy to see what they do and what the limitations are.

  3. Somendra Misra says:

    Hi Chandoo,

    I had used this type of cell ref. various times while calculating average.
    But for the situation here try below formula . Note this is an array formula and must be confirmed with Ctrl+Shift+Enter.

    =SMALL(IF(MMULT((L5=B4:H14)*IF(ISNUMBER(B5:H15),B5:H15),{1;1;1;1;1;1;1}),MMULT((L5=B4:H14)*IF(ISNUMBER(B5:H15),B5:H15),{1;1;1;1;1;1;1})),1)

    Regards,

  4. Somendra Misra says:

    Hi, I think Using SEARCH in here will create a problem say there is a text like SUN and another text SUNLIGHT both result will be added by SUMPRODUCT.

    Regards,

  5. Elias says:

    Array option.

    =SUM(IF(MOD(ROW(B4:B14),2)=MOD(ROW(B4),2),IF(B4:H14=L5,B5:H15)))

    Regards

    • Michael (Micky) Avidan says:

      @Elias,
      Nice approach.
      Although not requested - the formula I suggested closes all possibilities.
      Criteria: _____ Value to retrieve:
      Date__________ Numeric
      Date__________Textual
      Textual________Numeric
      Textual________Textual
      While your formula copes with only the 3 first combinations.
      Michael (Micky) Avidan
      “Microsoft® Answer” – Wiki author & Forums Moderator
      “Microsoft®” MVP – Excel (2009-2015)
      ISRAEL

      • Elias says:

        @Michael,

        Sorry but I don’t understand your point. I believe the challenge was to return the summary of a given date. What is your really volatile formula doing that mine is not?

        Regards

        • Michael (Micky) Avidan says:

          @Elias,
          I didn't say that the challenge differs from what you just mentioned/aimed to nor that your formula doesn't provide the requested result.
          Please read my previous comment again and focus on the last combination (TEXT / TEXT).
          I, myself, always try to provide a global Formula that is capable to handle all sorts of data.
          Michael (Micky) Avidan
          “Microsoft® Answer” – Wiki author & Forums Moderator
          “Microsoft®” MVP – Excel (2009-2015)
          ISRAEL

          • Elias says:

            @Michael,
            I see your point, but you are missing the below points if you are trying to cover all sorts of data.

            What happened if the lookup value does not exist?

            Do you want the first, second, summary, concatenation of the values if the look value is repeated?

            See they are too many possibilities to be cover with just one formula.

            Regards

  6. Michael (Micky) Avidan says:

    1) The range: B4:H15 was named: RNG.
    2) The following Array Formula was "retrieved from my sleeve" and I hope it can be shorten.
    3) The formula seems to take care of BOTH(!) caveats.
    -------------------------------------------------------------------------------
    =OFFSET(INDIRECT(ADDRESS(SMALL(IF(RNG=L5,ROW(RNG),""),1),MOD(SMALL(IF(RNG=L5,(ROW(RNG))+COLUMN(RNG)/10),1),1)*10)),1,)
    -------------------------------------------------------------------------------
    Michael (Micky) Avidan
    “Microsoft® Answer” – Wiki author & Forums Moderator
    “Microsoft®” MVP – Excel (2009-2015)
    ISRAEL

    • Elias says:

      @Michael,

      Check what happened with your result if you type 41927 in D5.

      Regards

      • Michael (Micky) Avidan says:

        Correct. Didn't predict that.
        Will find time to work something out.
        Michael (Micky) Avidan
        “Microsoft® Answer” – Wiki author & Forums Moderator
        “Microsoft®” MVP – Excel (2009-2015)
        ISRAEL

      • Michael (Micky) Avidan says:

        @Elias,
        Let's hope the following Array Formula "closes all open doors".
        Again - it has nothing to do with your formula which works fine as long as the 3 first mentioned combinations are concerned.
        -------------------------------------------------------------------------------
        =INDEX(RNG,LARGE(IF(RNG=L5,MOD(ROW(RNG)-1,2)*(ROW(RNG)),""),1)-2,(MOD(SMALL(IF(RNG=L5,(ROW(RNG))+COLUMN(RNG)/10),1),1)*10)-1)
        -------------------------------------------------------------------------------
        Michael (Micky) Avidan
        “Microsoft® Answer” – Wiki author & Forums Moderator
        “Microsoft®” MVP – Excel (2009-2015)
        ISRAEL

        • Elias says:

          Ok, if you insist. The following will cover all the scenarios you listed. However, I’ll never recommend/use such of formula.

          Defined names:
          rDat = $B$4:$H$15
          rRow =ROW(rDat)-MIN(ROW(rDat))+1
          rCol =COLUMN(rDat)-MIN(COLUMN(rDat))+1
          rInc =MOD(rRow,2)=MOD(MIN(rRow),2)

          L6=INDEX(rDat,MAX(IF(rInc,IF(rDat=L5,rRow)))+1,MAX(IF(rInc,IF(rDat=L5,rCol))))
          Array Enter

          Regards

        • Frank says:

          @Michael,

          unfortunately, your array formula still seems to return wrong results (eg 3-Nov).

  7. Magda says:

    If data are organized like in the example, ie. looks like a calendar, the INDEX formula seems quite simple:
    =INDEX($B$4:$H$15,ROUNDDOWN((L5-B4)/7,0)*2+2,MOD((L5-B4),7)+1)

    • Marc says:

      Yours is effectively the same as what I just came up with, and I believe this is the optimal answer to this particular problem.

      My solution, before I saw yours:
      =OFFSET(B5,QUOTIENT(L5-B4,7)*2,MOD(L5-B4,7))

      OFFSET will work for an arbitrary list size, but INDEX might be easier to read.
      QUOTIENT does the round and division in a single step.

  8. XOR LX says:

    If there's an improvement over Elias's solution then I for one can't see it.

    Perhaps a non-CSE version which would also mean that only two references (B4:H14 and B5:H15), as opposed to three (B4, B4:B14 and and B5:H15), would require manually amending should the data range change, i.e.:

    =SUMPRODUCT((ISEVEN(ROW(B4:H14)-MIN(ROW(B4:H14)))*(B4:H14=L5)*B5:H15))

    I suppose we could make it a single, uniform range reference:

    =SUMPRODUCT((ISEVEN(ROW(B4:H14)-MIN(ROW(B4:H14)))*(B4:H14=L5)*OFFSET(B4:H14,1,,,)))

    which might be more appropriate should we e.g. wish to use a Defined Name for our range, i.e.:

    =SUMPRODUCT((ISEVEN(ROW(Rng)-MIN(ROW(Rng)))*(Rng=L5)*OFFSET(Rng,1,,,)))

    though whether that compensates for the extra, volatile function call is something to be debated.

    Regards

  9. Mahir says:

    I have tried something and then my Excel workbooks got shut down. Maybe that was too much?

    Anyway here is what I've tried:

    =SUMPRODUCT(INDEX(B5:H15;IF(ISEVEN(ROW(B5:H15));ROW(B5:B15)-ROW(B5)+1);{1\2\3\4\5\6\7}))

    Guess that was wrong? Would this approach work anyway?

    Looking forward to learn something from you Excel Experts.

  10. XOR LX says:

    @Michael Avidan

    As it stands that is not a very rigorous construction.

    You say "I, myself, always try to provide a global Formula that is capable to handle all sorts of data", which is a wonderful philosophy, but isn't it at least as important that we ensure that our formulas are independent of the row and column references of the data range in question, so that, should that range change, we do not have to re-work our solution?

    What happens with your formula, for example, if RNG is instead re-located one row down, from B4:H15 to B5:H16?

    When a formula is reliant upon the addition/subtraction of certain constants within the formula, which themselves are necessarily dependent upon the specific rows/columns in which the data lies at any given time (e.g. the -1 in MOD(ROW(RNG)-1,2)), then that formula is not a very flexible one.

    Hence the reason for my choice of a slightly longer construction:

    ROW(B4:H14)-MIN(ROW(B4:H14))

    which ensures that this part of the calculation is not dependent upon the precise location of the data range within the worksheet, and so will give correct answers even if that range is re-located.

    Regards

  11. Haz says:

    {=OFFSET(B4,MAX((B4:H15=L5)*ISODD(ROW(1:12))*ROW(1:12)),MAX((B4:H15=L5)*ISODD(ROW(1:12))*COLUMN(A:G))-1)}

    • Haz says:

      Non-array formula:
      =INDEX(B4:H15,SUMPRODUCT((B4:H15=L5)*(ROW(B4:H15)-ROW(B4)+1)*ISODD(ROW(B4:H15)-ROW(B4)+1))+1,SUMPRODUCT((B4:H15=L5)*(COLUMN(B4:H15)-COLUMN(B4)+1)*ISODD(ROW(B4:H15)-ROW(B4)+1)))

      Using one range (B4:H15), one reference (B4), one lookup value (L5) and no INDIRECT or OFFSET.

  12. MF says:

    My trial with defined names:

    DateRange
    =$B$4:$H$4,$B$6:$H$6,$B$8:$H$8,$B$10:$H$10,$B$12:$H$12,$B$14:$H$14

    Position
    =RANK('lookup problem'!$L$5,DateRange,1)

    L6
    =OFFSET(B4,ROUNDUP(Position/7,0)*2- 1,IF(MOD(Position,7)=0,6,MOD(Position,7)-1))

  13. Jeff Weir says:

    I'd probably just run with something like:
    =SUMPRODUCT((B4:H14=L5)*(MOD(ROW(B4:H14),2)=MOD(ROW(B4),2))*B5:H15)
    ...which is basically the same as Elias' but without the IFs

  14. Bermir says:

    The opposite of elegant but it works...

    =INDEX(B4:H15,IFERROR(MATCH(L5,B4:B14,0),0)+IFERROR(MATCH(L5,C4:C14,0),0)+IFERROR(MATCH(L5,D4:D14,0),0)+IFERROR(MATCH(L5,E4:E14,0),0)+IFERROR(MATCH(L5,F4:F14,0),0)+IFERROR(MATCH(L5,G4:G14,0),0)+IFERROR(MATCH(L5,H4:H14,0),0)+1,IFERROR(MATCH(L5,B4:H4,0),0)+IFERROR(MATCH(L5,B6:H6,0),0)+IFERROR(MATCH(L5,B8:H8,0),0)+IFERROR(MATCH(L5,B10:H10,0),0)+IFERROR(MATCH(L5,B12:H12,0),0)+IFERROR(MATCH(L5,B14:H14,0),0))

    • Bermir says:

      =INDEX(B4:H15,
      IFERROR(MATCH(L5,B4:B14,0),0)+
      IFERROR(MATCH(L5,C4:C14,0),0)+
      IFERROR(MATCH(L5,D4:D14,0),0)+
      IFERROR(MATCH(L5,E4:E14,0),0)+
      IFERROR(MATCH(L5,F4:F14,0),0)+
      IFERROR(MATCH(L5,G4:G14,0),0)+
      IFERROR(MATCH(L5,H4:H14,0),0)+1,
      IFERROR(MATCH(L5,B4:H4,0),0)+
      IFERROR(MATCH(L5,B6:H6,0),0)+
      IFERROR(MATCH(L5,B8:H8,0),0)+
      IFERROR(MATCH(L5,B10:H10,0),0)+
      IFERROR(MATCH(L5,B12:H12,0),0)+
      IFERROR(MATCH(L5,B14:H14,0),0))

  15. HF says:

    Named Range
    rownum = SUMPRODUCT(('lookup problem'!$B$4:$H$14='lookup problem'!$L$5)*ROW('lookup problem'!$B$4:$H$14)*ISEVEN(ROW('lookup problem'!$B$4:$H$14)))

    Formula
    =OFFSET($A$1,rownum,MATCH(L5,INDIRECT("$B"&rownum&":$H"&rownum),0))

  16. Venky says:

    How about SUM(IF(B4:H14=L5,B5:H15)) with array..it should work

  17. xen says:

    Sorry, Chandoo, you can't find stuff this way in every possible scenario.
    What if 2014-10-01 sales would equal 41.927 ? Which is serial number for 2014-10-15 ? SUMIF would fail to retrive correct answer. And your example data suggest that such number is possible in your table.

    It's better not to search through dates and numbers at the same time.

    If I'd solve a problem like this, it'd reformat table first so I get one column with dates and the other with numbers.

    In this case, formula to form date column would be:
    =INDIRECT(ADDRESS((INT((ROW()-4)/COUNT($B$4:$H$4))+1)*2+2;MOD(ROW()-4;COUNT($B$4:$H$4))+2;4;1))
    and numbers would be the same formula with sight adjustment (+3 instead of +2 at the end of first argument):
    =INDIRECT(ADDRESS((INT((ROW()-4)/COUNT($B$4:$H$4))+1)*2+3;MOD(ROW()-4;COUNT($B$4:$H$4))+2;4;1))

    And now you got two columns that you can safely use for searching!

  18. xen says:

    Oops, sorry, you actually mentioned that it doesn't work if number=date! I missed that part 🙁

  19. Vad says:

    ={OFFSET(A1,SUM((B4:H14=L5)*ROW((B4:H14))),SUM((B4:H14=L5)*COLUMN((B4:H14)))-1)}

    Works for all data... the solution I got for indirect looks little lengthy

  20. Rajesh says:

    I want to count last 20 records of a person, whose marks is greater than 2 and grade "manager". ....

    Assume A1 has got names (James, John...etc...)
    A2 "Manager"

    A3 "2"

    Someone please reply

  21. Rajesh says:

    I want to count last 20 records of a person, whose marks is greater than "2" and grade "Manager"

    Assume A1 "geroge" A2 "Michael" A3 "George" etc...name can found anywhere in the rows

    B1 "Manager" B2" clerk"

    C1 "2" C2, "4"

    please reply

  22. Marc says:

    Simplest I can come up with. No limitations for either 1 or 2. This does assume dates are an ordered list with 7 per row, and 2 rows per set. Assuming this is always true this will work for an arbitrary long list of dates.

    =OFFSET(B5,QUOTIENT($L$5-$B$4,7)*2,MOD($L$5-$B$4,7))

    • Michael (Micky) Avidan says:

      @Marc,
      Nice approach - however, as there are no "Negative Dates" - try:
      =OFFSET(B5,INT(L5-B4)/7)*2,MOD(L5-B4,7))
      ——————————————————————————-
      Michael (Micky) Avidan
      “Microsoft® Answer” – Wiki author & Forums Moderator
      “Microsoft®” MVP – Excel (2009-2015)
      ISRAEL

  23. Vijaykumar Shetye says:

    =OFFSET(B4,ROUNDUP((L5-41911+1)/7,0)*2-1,MOD(L5-41911,7))

    B4 has been used as reference cell for OFFSET().

    FOR ROWS:
    ROUNDUP(....,0) gives the integer value of a division. In case of presence of a remainder, ROUNDUP will add 1 to the Quotient.
    As opposed to ROUNDUP(), the INT() or QUOTIENT() functions eliminate the remainder.

    41911 = 01-Sept-2014, the first date in the data.

    *2 has been used because there are 2 columns per set of data.

    /7 has been used because there are 7 columns per set of data.

    For columns
    MOD(L5-41911,7))

    Vijaykumar Shetye,
    Panaji, Goa, India

  24. Awais says:

    This is how i did it

    {=INDEX(B4:H15, MAX((L5=B4:H15)*ROW(B4:H15))-2, MAX((L5=B4:H15)*COLUMN(B4:H15))-1 )}

  25. Paul says:

    Here's my solution:

    =INDEX(B4:H15,MATCH(1,MMULT(--(B4:H15=L5),TRANSPOSE(COLUMN(B4:H15)^0)),0)+1,MATCH(1,MMULT(TRANSPOSE(--(B4:H15=L5)),ROW(B4:H15)^0),0))

Leave a Reply