Get Stock Quotes using Excel Macros [and a Crash Course in VBA]

This is a guest post by Daniel Ferry of Excelhero.com.

Excel Stock Quotes - using VBA macros to fetch live stock quotes from Yahoo Finance to Excel

Have you ever wanted to fetch live stock quotes into Excel? In this post we will learn how to get stock quotes for specified symbols using macros.

One method that has worked well for my clients can be implemented with just a few lines of VBA code. I call it the ActiveRange.

An ActiveRange is an area on a worksheet that you define by simply entering the range address in a configuration sheet. Once enabled, that range becomes live in the sense that if you add or change a stock symbol in the first column of the range, the range will automatically (and almost instantly) update. You can specify any of 84 information attributes to include as columns in the ActiveRange. This includes things such as Last Trade Price, EBITDA, Ask, Bid, P/E Ratio, etc. Whenever you add or change one of these attributes in the first row of the ActiveRange, the range will automatically update as well.

Sounds interesting and useful?

In this post, you will learn how to use Excel macros to fetch live stock quotes from the Yahoo! Finance website. It is also going to be a crash course in VBA, for the express purpose of learning how the ActiveRange method works so that you can use it yourself.

Download Excel Stock Quotes Macro:

Click here to download the excel stock quotes macro workbook. It will be much easier to follow this tutorial if you refer to the workbook.

Background – Understanding The Stock Quotes Problem:

The stock information for the ActiveRange will come from Yahoo Finance. A number of years ago, Yahoo created a useful interface to their stock data that allows anyone, at any time, to enter a URL into a web browser and receive a CSV file containing current data on the stocks specified in the URL. That’s neat and simple.

But it gets a little more complicated when you get down to specifying which attributes you want to retrieve [information here]. Remember, there are 84 discrete attributes available. Under the Yahoo system, each attribute has a short string Tag Code. All we need to do is concatenate the string codes for each attribute we want and add the resulting string to the URL. We then need to figure out what to do with the CSV file that comes back.
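For example, suppose we wanted just the Symbol, Last Trade Price and Name for two stocks. Assuming the tag codes for those three attributes are s, l1 and n (the authoritative list lives on the YF_Attribs sheet in the workbook), the URL and a response along these lines (prices invented purely for illustration) would look like:

```
http://finance.yahoo.com/d/quotes.csv?s=MSFT+GOOG&f=sl1n

"MSFT",30.45,"Microsoft Corporation"
"GOOG",589.11,"Google Inc."
```

Note how the symbols are joined with a plus sign and the tag codes are simply concatenated with no delimiter at all. Our VBA will build exactly this kind of URL.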

Our VBA will take care of that and manage the ActiveRange. Excel includes the QueryTable as one of its core objects, and it is fully addressable from VBA. We will utilize it to retrieve the data we want and to write those data to the ActiveRange.

Before we start the coding we need to include two support sheets for the ActiveRange. The first is called “YF_Attribs” and, as the name implies, is a list of the 84 attributes available on Yahoo Finance along with their Yahoo Finance Tag Codes. The second sheet is called “arConfig_xxxx”, where xxxx is the name of the sheet where the ActiveRange will reside. It contains some configurable information about the ActiveRange which our VBA will use.

All of the VBA code for this project will reside inside the worksheet module for the sheet where we want our ActiveRange to be. For this tutorial, I called the sheet “DEMO”.

Writing the Macros to Fetch Stock Quotes:

Adding VBA Code to Worksheets - Excel Stock Quotes

Press ALT+F11 on your keyboard, which will open the VBE. Double click on the DEMO sheet in the left pane. We will enter our code on the right. To begin with, enter these lines:

Option Explicit
Private rnAR_Dest As Range
Private rnAR_Table As Range
Private stAR_ConfigSheetName As String

Always start a module with Option Explicit. It forces you to define your variable types, and will save you untold grief at debugging time. In VBA each variable can be one of a number of variable types, such as a Long or a String or a Double or a Range, etc. For right now, don’t worry too much about this – just follow along.

Sidebar on Variable Naming Conventions

Variable names must begin with a letter. Everyone and their brother seems to have a different method for naming variables. I like to prefix mine with context. The first couple of letters are in lower case and represent the type of the variable. This allows me to look at the variable anywhere it’s used and immediately know its type. In this project I’ve also prefaced the variables with “AR_” so that I know the variable is related to the ActiveRange implementation. In larger projects this would be useful. After the underscore, I include a description of what the variable is used for. That’s my method.

In the above code we have defined three variables and their types. Since these are defined at the top of a worksheet module, they will be available to each procedure that we define in this module. This is known as scope. In VBA, variables can have scope restricted to a procedure, to a module (as we have done above), or they can be global in scope and hence available to the entire program, regardless of module.

Again, we are putting all of the code for this project in the code module of the DEMO worksheet. Every worksheet has a code module. Code modules can also be added to a workbook that are not associated with any worksheet. UserForms can be added, and they have code modules as well. Finally, a special type of code module, called a class module, can also be added. Any global variables would be available to procedures in all of these. However, it is good practice to always limit the scope of your variables to the level where you need them.

In that vein, notice that the three variables above are defined with the word Private. This specifically restricts their scope to this module.

Every worksheet module has the built-in capability of firing off a bit of code in response to a change in any of the sheet’s cell values. This is called the Worksheet_Change event. If we select Worksheet from the combo box at the top and Change in the other combo box, the VBE will kindly define for us a new procedure in this module. It will look like this:

Adding Worksheet_Change Event

Private Sub Worksheet_Change(ByVal Target As Range)
End Sub

Notice that by default this procedure is defined as Private. This is good and as a result the procedure will not show up as a macro. Notice the word Target near the end of the first line. This represents the range that has been changed. Place code between these two lines so that the entire procedure now looks like this:

The Heart of our Excel Stock Quotes Code – Worksheet_Change()

Private Sub Worksheet_Change(ByVal Target As Range)

    ActivateRange

    If Worksheets(stAR_ConfigSheetName).[ar_enabled] Then

        If Intersect(Target, rnAR_Dest) Is Nothing Then Exit Sub

        If Target.Column <> rnAR_Dest.Column And Target.Row <> rnAR_Dest.Row Then
            PostProcessActiveRange
            Exit Sub
        End If

        ActiveRangeResponse

    End If

End Sub

That may look like a handful but it’s really rather simple. Let’s step through it. The first line is ActivateRange. This is the name of another sub-procedure that will be defined in a moment. This line just directs the program to run that sub, which provides values to the three variables we defined at the top. Again, since those variables were defined at the top of the module, their values will be available to all procedures in the module. The ActivateRange procedure gives them values.

Next we see this odd-looking fellow:

If Intersect(Target, rnAR_Dest) Is Nothing Then Exit Sub

All this does is check whether the Target (the cell that was changed on the worksheet) is part of our ActiveRange. If it is, the procedure continues; if it is not, the procedure exits.

The next line checks whether the cell that was changed is in the first column or first row of the ActiveRange. If it is, the post-processing is skipped and execution falls through to the next step. If the change is in any other part of the ActiveRange, another sub-procedure (defined below) runs to do some post-processing of the retrieved data, and then this procedure exits.

If the cell that changed was in the first column or the first row, the program runs another sub-procedure, called ActiveRangeResponse, which is also defined below. ActiveRangeResponse builds the URL for YF, deletes any previous QueryTables related to the ActiveRange, and creates a new QueryTable as specified in our configuration sheet.

That’s it. The heart of the whole program resides here in the Worksheet_Change event procedure. It relies on a number of other subprocedures, but this is the whole program. When a change is made in the ActiveRange’s first column (stock symbols) or its first row (stock attributes), ActiveRangeResponse runs and our ActiveRange is updated.

Understanding other sub-procedures that help us get the stock quotes:

So let’s look at those supporting subprocedures. The first is ActivateRange:

Private Sub ActivateRange()

    stAR_ConfigSheetName = "arConfig_" & Me.Name

    Set rnAR_Dest = Me.Range(Worksheets(stAR_ConfigSheetName).[ar_range].Value)

    Set rnAR_Table = rnAR_Dest.Resize(1, 1).Offset(1, 1)

    Worksheets(stAR_ConfigSheetName).[ar_YFAttributes] = GetCurrentYahooFinancialAttributeTags

End Sub

Again, all this does is give values to our three module level variables. In addition it builds the concatenated string of YF Tag Codes required for the URL. It does this by calling a function that I’ve defined at the very bottom of the module, called GetCurrentYahooFinancialAttributeTags.
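The actual GetCurrentYahooFinancialAttributeTags function is in the downloadable workbook; as a minimal sketch of the idea, it could look something like this, assuming the YF_Attribs sheet holds attribute names in column A and their Tag Codes in column B (that column layout is an assumption for illustration):

```vb
Private Function GetCurrentYahooFinancialAttributeTags() As String
    Dim i As Long
    Dim stTags As String
    ' Walk the attribute headers in the first row of the ActiveRange,
    ' skipping the first column (which holds the stock symbols)...
    For i = 2 To rnAR_Dest.Columns.Count
        ' ...and look up each header's Yahoo Tag Code on the YF_Attribs sheet
        stTags = stTags & WorksheetFunction.VLookup( _
            rnAR_Dest.Cells(1, i).Value, _
            Worksheets("YF_Attribs").Range("A:B"), 2, False)
    Next i
    GetCurrentYahooFinancialAttributeTags = stTags
End Function
```

The result is one concatenated string of Tag Codes, ready to be dropped into the f= portion of the Yahoo Finance URL.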

The next subprocedure is ActiveRangeResponse:

Private Sub ActiveRangeResponse()

    Dim vArr As Variant
    Dim stCnx As String

    Const YAHOO_FINANCE_URL = "http://finance.yahoo.com/d/quotes.csv?s=[SYMBOLS]&f=[ATTRIBUTES]"

    vArr = Application.Transpose(rnAR_Dest.Resize(rnAR_Dest.Rows.Count - 1, 1).Offset(1))

    stCnx = Replace(YAHOO_FINANCE_URL, "[SYMBOLS]", Replace(WorksheetFunction.Trim(Join(vArr)), " ", "+"))

    stCnx = Replace(stCnx, "[ATTRIBUTES]", Worksheets(stAR_ConfigSheetName).[ar_YFAttributes])

    AddQueryTable rnAR_Table.Resize(UBound(vArr)), "URL;" & stCnx

End Sub

Notice that here we have variables defined at the top of this procedure and consequently their scope is limited to this procedure only. This means that we could have the same variable names defined in other procedures but those variables would not be related to these and would have completely different values.

Next, notice that we have defined a constant. This is good practice, as naming the constant documents what the value is for. I could have just used the literal value where the constant is later used, but then the question arises as to what that value is and where it came from. Here I have named the value YAHOO_FINANCE_URL, removing all doubt as to its purpose.

The next line is this:

vArr = Application.Transpose(rnAR_Dest.Resize(rnAR_Dest.Rows.Count - 1, 1).Offset(1))

and it deserves some explanation. Let me back up by saying that whenever we write or read multiple cells from a worksheet, we should always try to do it in one go, rather than one cell at a time. The more cells involved, the more important this is. Otherwise we pay a massive penalty in processing time. One of the best optimization techniques available is to replace code that loops through cell reads/writes with code that reads/writes all the cells at once. It can literally be hundreds to thousands of times faster.
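As a quick illustration of the difference (the range address and cell count here are invented for illustration), compare these two ways of summing a column:

```vb
Sub BulkReadDemo()
    Dim v As Variant, i As Long, total As Double

    ' Slow: one round-trip to the worksheet per cell
    For i = 1 To 10000
        total = total + Range("A" & i).Value
    Next i

    ' Fast: a single read pulls all 10,000 values into an array,
    ' and the loop then runs entirely in memory
    v = Range("A1:A10000").Value
    total = 0
    For i = 1 To UBound(v, 1)
        total = total + v(i, 1)
    Next i
End Sub
```

Both loops produce the same total; only the number of trips to the worksheet differs, and that is where the time goes.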

Here we are interested in getting the list of all of the stock symbols in the first column of the ActiveRange. So how do we get them in one shot? We use something called a variant array. Notice that we defined vArr at the top of this procedure. A variant array is a special kind of variable that holds a list of values and it DOES NOT CARE what variable types those values are. This is important when retrieving data from a sheet because the data could be numbers, text, Boolean (True or False), etc. Variants are powerful, but they are much slower than other variable types, such as a Long for numeric data for example. However, in the case of retrieving or writing large chunks of data from/to a sheet the slight penalty of the variant is dwarfed by the massive increase in the speed of data transfer.

It’s very simple to retrieve range data (regardless of the size) into a variant array. All you do is:

v = range

where v is defined as a Variant and range is any VBA reference to a worksheet range. And magically, all of the values in that range are now in v. Note that v is not connected to the range. A change in any of v’s values does not propagate back to the range, and likewise a change to the range does not make its way to v all by itself. v will ALWAYS be a two-dimensional array. The first dimension is the index of the rows, the second dimension is the index of the columns. So v(1, 1) will refer to the value that came from the top-left cell in the range. v(6, 9) will hold the value that came from the cell in the range at row 6 and column 9.

For most circumstances this two-dimensional format is fine. But we are only retrieving one column of stock symbols. The procedure will still give us a two-dimensional array, with the column dimension being only one element wide. This is a shame, because VBA has a wonderful function called Join that allows you to concatenate every element of an array into a string in one step (no loop). You can even specify a custom string to delimit (go in-between) each element in the output string. The problem is that Join only works on one-dimensional arrays 🙁

But there’s always a way, right? We can use the Application.Transpose method on the 2-D array and presto we get a 1-D array. The rest of the line just specifies what range (the stock symbols) to grab.
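To see the Transpose trick in isolation, here is a tiny sketch you could run in any workbook (the range address and its contents are just for illustration):

```vb
Sub TransposeJoinDemo()
    Dim v As Variant
    ' Reading a one-column range yields a 2-D array: v(1 To n, 1 To 1).
    ' Application.Transpose flattens it to a 1-D array: v(1 To n).
    v = Application.Transpose(Range("A2:A5").Value)
    ' Join now works, gluing the elements together with "+" between them,
    ' e.g. producing something like "MSFT+GOOG+AAPL+YHOO"
    Debug.Print Join(v, "+")
End Sub
```

This is exactly the shape we need for the s= portion of the Yahoo Finance URL.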

The next two lines are:

stCnx = Replace(YAHOO_FINANCE_URL, "[SYMBOLS]", Replace(WorksheetFunction.Trim(Join(vArr)), " ", "+"))

stCnx = Replace(stCnx, "[ATTRIBUTES]", Worksheets(stAR_ConfigSheetName).[ar_YFAttributes])

Again a handful, but all we are doing here is replacing the monikers, [SYMBOLS] and [ATTRIBUTES] in the YAHOO_FINANCE_URL constant with the list of stock symbols (delimited by a plus sign) and the string of attributes.

In the final line of the procedure:

AddQueryTable rnAR_Table.Resize(UBound(vArr)), "URL;" & stCnx

we are running another subprocedure called, AddQueryTable and we are telling it where to place the new QueryTable and providing the connection string for the QueryTable, which in this case is the YF URL that we just built.

Nothing unusual happens in the AddQueryTable sub. It just deletes any existing AR related QueryTables and adds the new one according to the options in the configuration sheet.
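The real AddQueryTable sub is in the workbook; a minimal sketch of the idea might look like the following. The QueryTable naming pattern and the specific refresh options here are assumptions for illustration, as the actual version reads its options from the configuration sheet:

```vb
Private Sub AddQueryTable(rnDest As Range, stConnection As String)
    Dim qt As QueryTable
    ' Delete any QueryTables left over from a previous ActiveRange update
    For Each qt In Me.QueryTables
        If qt.Name Like "AR_*" Then qt.Delete
    Next qt
    ' Create the new QueryTable pointed at the Yahoo Finance URL
    Set qt = Me.QueryTables.Add(Connection:=stConnection, Destination:=rnDest)
    qt.Name = "AR_StockQuotes"
    qt.BackgroundQuery = True
    qt.RefreshStyle = xlOverwriteCells
    qt.Refresh
End Sub
```

Note that the "URL;" prefix passed in from ActiveRangeResponse is part of the connection string itself; it tells the QueryTable what kind of source it is talking to.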

The PostProcessActiveRange sub is interesting:

Private Sub PostProcessActiveRange()

    If rnAR_Dest.Columns.Count > 2 Then

        Application.DisplayAlerts = False
        rnAR_Table.Resize(rnAR_Dest.Rows.Count).TextToColumns Destination:=rnAR_Table, _
            DataType:=xlDelimited, Comma:=True
        Application.DisplayAlerts = True

        Worksheets(stAR_ConfigSheetName).[ar_LocalTimeLastUpdate] = Now

    End If

End Sub

Processing Yahoo Finance Output using Query Table & Text-Import Utility:

As mentioned before, the data from YF comes back as a CSV file. The QueryTable dumps this into one column. If you were only retrieving one attribute for each stock, this would be fine as is. However, two or more attributes will result in unwanted commas and multiple attribute values squished into the first column of the QueryTable output. Unfortunately this is poor design by Microsoft, especially when you consider that the QueryTable does not behave like this when it is retrieving SQL data or opening a text file from disk. You can actually specify this operation to be a text file and it will properly spread the output over all of the columns. To do so, you specify the disk location as being the URL of the YF CSV file, but as Murphy would have it, this is unbelievably slow and pops up a status dialog as it slowly retrieves the CSV. Using the URL instruction instead of the TEXT instruction at the beginning of the connection string is incredibly fast in comparison, but dumps all of the data into the first column.

So what to do? We’ll just employ Excel’s built-in TextToColumns capability and bam, our data is where we want it.

Our finalized stock quotes fetcher worksheet should look like this:

Excel Stock Quotes - Final workbook - Demo

Download Excel Stock Quotes Macro:

Click here to download the excel stock quotes macro workbook. It will be much easier to follow this tutorial if you refer to the workbook.

Final Thoughts on Excel Stock Quotes

The ActiveRange technique is quite versatile. It can be implemented with other data sources such as SQL, or even lookups to other Excel files, or websites.

In this example it provides a nice way to easily track whatever stocks you may have interest in and up to 84 different attributes of those stocks. You can enable and disable the activeness of the ActiveRange on the fly. You can set the AR to AutoRefresh the data at periods that you set or to not refresh at all.

This is a basic implementation. For example, changing the AutoRefresh setting will have no effect until a new QueryTable is built. That won’t happen until you also add or change a stock symbol or add or change an attribute. An easy enhancement would be to add a little code to the arConfig_DEMO code module to respond to changes to the ar_AutoRefresh named range cell.
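As a sketch of that enhancement, the arConfig_DEMO code module could contain something like the following. It assumes ar_AutoRefresh is a named cell on the configuration sheet holding a period in minutes, and that the ActiveRange lives on the DEMO sheet; adapt the names to your own setup:

```vb
Private Sub Worksheet_Change(ByVal Target As Range)
    Dim qt As QueryTable
    ' React only when the AutoRefresh setting itself changes
    If Intersect(Target, Me.[ar_AutoRefresh]) Is Nothing Then Exit Sub
    ' Push the new period into any existing QueryTable on the DEMO sheet
    ' (a RefreshPeriod of 0 turns automatic refresh off)
    For Each qt In Worksheets("DEMO").QueryTables
        qt.RefreshPeriod = Me.[ar_AutoRefresh].Value
    Next qt
End Sub
```

With this in place, changing the setting takes effect immediately instead of waiting for the next QueryTable to be built.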

Another enhancement would be to eliminate the slight flicker of the update by moving the QueryTable destination to the arConfig_DEMO and then doing the TextToColumns with the destination set to the DEMO sheet. In an effort to simplify this tutorial I have left these easy enhancements as an exercise for you to implement.

Have a question or doubt? Please Ask

Do you have any questions or doubts on the above technique? Have you used ActiveRange or similar implementations earlier? What is your experience? Please share your thoughts / questions using comments.

I read Chandoo.org regularly and will be monitoring the post for questions. But you can also reach me at my blog, Excelhero.com.

Further References & Help on Excel Stock Quotes [Added by Chandoo]

This is a guest post by Daniel Ferry of Excel Hero.

Excel Hero is dedicated to expanding your notion of what is possible in MS Excel and to inspiring you to become an Excel Hero at your workplace. It has many articles and sample workbooks on advanced Excel development and advanced Excel charting.


Excel and Power BI tips - Chandoo.org Newsletter

Get FREE Excel + Power BI Tips

Simple, fun and useful emails, once per week.

Learn & be awesome.

Welcome to Chandoo.org

Thank you so much for visiting. My aim is to make you awesome in Excel & Power BI. I do this by sharing videos, tips, examples and downloads on this website. There are more than 1,000 pages with all things Excel, Power BI, Dashboards & VBA here. Go ahead and spend few minutes to be AWESOME.

Read my storyFREE Excel tips book

Overall I learned a lot and I thought you did a great job of explaining how to do things. This will definitely elevate my reporting in the future.
Rebekah S
Reporting Analyst
Excel formula list - 100+ examples and howto guide for you

From simple to complex, there is a formula for every occasion. Check out the list now.

Calendars, invoices, trackers and much more. All free, fun and fantastic.

Advanced Pivot Table tricks

Power Query, Data model, DAX, Filters, Slicers, Conditional formats and beautiful charts. It's all here.

Still on fence about Power BI? In this getting started guide, learn what is Power BI, how to get it and how to create your first report from scratch.

55 Responses to “Did Jeff just chart?”

  1. Jon Peltier says:

    1. You screwed up the link to Mike's post. Try this:
    Highlighting Outliers in your Data with the Tukey Method

    2. Your initial line chart would be easier to read if you'd used markers. I use markers to indicate where the data actually IS, and help show that the line only ties the data together and doesn't indicate more data, until the points are nearly touching.

    3. Take the chart with lots of data (the one you delete the horizontal axis from), plot in descending order of value (revenue), and plot it on a log-log scale. Many phenomena, including the one you're describing, show a power-law type behavior, that is, a straight line on the log-log plot. This relationship is known as Zipf's Law. It basically means very few items have large values and very many items have small values. The decreasing returns for the many small values has become famous in Internet marketing as the "long tail".

    Your data doesn't show classic Zipf behavior, but in Looking Back at Peltier Tech in 2009 (wow, was that really four years ago?) I show how the distribution of traffic from individual web pages follows this law nicely.

    Like Benford's Law (look it up), Zipf's law could probably be used to audit financial data to make sure the stated distributions are realistic.

    • jason says:

      Holy great chart wizards beard!!!! its THE John Peltier!!!!

      ................My name .....is..........john, i mean Jason!.... I love you!!... i mean your site!!!

      ahaha

  2. Stiino0 says:

    OMG I'm cracking up on the pun in the title hahaha I totally misread that. Great work, learned alot. Chandoo 4 life!

    • jason says:

      i will admit, it took me a bit to 'get it'.... i kept reading the title and was just like....,"wut? .......that doesnt make sen....oooooooooohhh!!" hahahahhah

  3. David Onder says:

    You are right to have issues with Tukey's method with the data you are using. Tukey's method is best for fairly normal distributions. Your distribution is NOT normal but highly skewed. There are other methods that could be used to mathematically determine the outliers. But, as you observed, the mathematical identification is not always necessary. Sometimes, just looking at the graph is all we need to do.

  4. Doosha says:

    While I agree with your statement regarding the arbitrary nature of the parameter decision in Tukey's method, I disagree with saying the visual alternative is the best way to go. I'll leave the parametric vs non-parametric test discussion for true academics and say there are many reasons why having a analytical/programmatic approach is preferred despite subjectivity concerns. This can be processed quickly on many different features and draw many insights that require your method to be repeated. I find a lot of value in both approaches and suggest that a good data geek (like us here @ chandoo.org) knows how to do both.

    Great post mate! Thanks for sharing.

    • Jeff Weir says:

      I disagree with saying that the visual alternative is the best way to go, too. Which is why I didn't say it. Rather I said "My preference..."

      But great point, Doosha.

      • Jon Peltier says:

        My preference is the visual approach, and very often it is the best approach.
         
        Let's take Mike's list of numbers as an example. Plotted on Jeff's line chart, I've indicated with orange circles the points that a blind mathematical approach calls outliers.
        Jon Peltier_Visual Outliers

         
        Yet with our eyes, it's easy to see that if the first three points are outliers, there is no reason to consider the fourth not to be one. A similar if not so strong statement can be said about the last two vs last four points. I've outlined the outliers by this visual approach.
         
        In any case, it's easy to see the points which are closely related, which are the ones I did not outline. If we blindly apply a mathematical approach, despite its ease of application to lots of features, we can easily assign points to one group when they fit best in another.

  5. Jeff Weir says:

    Thanks Jon.
    1). Fixed
    2). Fixed
    3). Stop it, you're giving me gas. 😉

    Question: While this data may follow Zipf's law, do we gain anything by confirming whether or not it does?

    • Jon Peltier says:

      I'm not sure in this case whether we benefit from knowing our data follows Zipf's law. But I suspect in addition to verifying there is no fraud in the numbers, it may help to target where we might focus efforts to improve the bottom line. Maybe we're tapped out in the middle range, but at the top end we could add a deluxe new product that has more features and a higher price. Or we could offer a stripped down product at the low end to capture people who would make a smaller purchase.

      • Jeff Weir says:

        I have a colleague who did some fraud stuff with Zipf's law. Or rather, identified some fraud stuff. I'll have to pick his brains and write it up. Thanks for reminding me.

        By the way, added a new section in the original, and have just added something else again. So check it out and give me your feedback.

        Nothing like writing a blog post by committee...especially if you're the chair. 🙂

  6. Hui... says:

    Elimination of outliers should only be done once you understand the historical or cause of variability within the data / system producing the data.

    To manually remove data is akin to taking specimens not samples of the data.

    As we are told nothing about source of the data and the intrinsic variability in the data to randomly remove 5 of the 20 samples (25%of the samples) appears, at a glance, an overkill

    Examining the data and some basic stats
    Measure Mean SD
    All data 57.45 33.52
    Exclude highlighted outliers 59.67 20.02
    Exclude choosen outliers 57.67 8.72

    Typically and if the data is normally distributed we would expect that most of the data would fall with +/- 3SD of the mean (well 1 in 370 should fall outside of this)

    Which in all cases the data fits nicely within this criteria except the 132 data point which falls outside the Highlighted criteria
    Measure Mean SD -3SD +3SD
    All data 57.45 33.52 -43.1 158.0
    Exclude highlighted outliers 59.67 20.02 -0.4 119.7
    Exclude choosen outliers 57.67 8.72 31.5 83.8

    Be very careful removing data, much better to simply analyze your model with both sets of data and understand the risks of using one set of data vs the other

    • Jeff Weir says:

      What? No mention of my "About as welcome as a chart in an elevator" crack? I thought that was a classic Aussie saying that would put wind in your sail, Hui 🙂

      Note that this post wasn't about removing outliers...just about identifying them. In fact, the first part of the post was about identifying outliers via plotting ranked data, and then the post segued via a 'while we're here' aside into how using the ranked data graphical approach can be quite handy in visually segment data, without making clear that I'd moved from looking at ways to identify outliers. Sloppy writing on my part. It won't happen again. At least, not within this post, anyway!

      As David points out, the subscription dataset doesn't really lend itself to outliers identification via Tukey's method anyway, because of the type of data involved. And as Jon points out, this is classic 'Zipf's law' stuff, where very few items have large values and very many items have small values, and those increasingly large values at the far end are to be expected. They're still outliers, but in this case they're outliers that we want.

      Zipf's law, long tail, power law...why the hell do we need so many names to describe the same damn thing is beyond me.

  7. Ian says:

    Jeff

    Regarding your 2nd chart with markers - whether a marker looks as if it sits on the line or off it depends on the size of the marker.
    Size 4, 6 and 7 markers look as if they are off centre whereas size 3, 5 and 8 are centred in my re-creation of the chart.
    I have found that, generally, odd size markers tend to be centred on the line with even size markers off centre.
    This is just one of a number of reasons why you shouldn't go with the Excel defaults when charting, even with the better defaults in 2013 over 2003.

    Thanks for the blog post.

    Ian

  8. I think the good point is the grouping into categories ... But overal I do not like very much. In the labels is written a lot of information ... too much ink. I used a type of bar chart not an area chart (even with less data does its job well).

    This approach is a little different
    https://sites.google.com/site/e90e50/scambio-file/bar_123.pngRobert's approach

    which avoids using all that text ... the average of the values, the number of people ... are more explicit without being boring.

    Here the excel file i used:

    https://sites.google.com/site/e90e50/scambio-file/Segmenting-customers-by-revenue-contribution_V1_r.xlsx

    • Jeff Weir says:

      Roberto: Thanks for the insightful comment. There's some things about your redesign that I like, and some things I don't.
      On the like side:
      * I think it's a great idea to put the numbers of customers across the bottom. I never thought of that.
      * I think your approach of showing the average within each segment (i.e. putting in the boxes within each series) is clever. That said, ultimately I think it's more distracting than just putting the average in the data label. But I certainly appreciate the technique, as well as the thought that went into it.

      On the 'dislike' side (and these are personal preferences):
      * I don't like having to look up move my eyes from the chart to the legend to decipher it. I think labeling each point directly makes it much more easy for the reader, and I use Jon Peltier's Label Last Point routine whenever I can for this reason. I seem to recall something in a Tufte or Few book that suggests this approach, and I'll try to dig it up and post back here. Point taken though that maybe I've got too much information in those data labels for your liking, and as per the above, at least one of those lines of info can be moved to the Horizontal axis.

      * I'm not a fan of the black background. I find it oppressive, compared to white.

      Thanks again for your insights.

      • Jaff said:
        [...] That said, ultimately I think it’s more distracting than just putting the average in the data label [...]

        I would like to know how many visitors have read what you have written in the labels?
        I looked at your chart at least 20 times and I've never read ... too much effort. But I'm very lazy, i'm sorry 🙂

        if you want the legend can be removed, you have a lot of space and options for the labels and you can use a series xy as I have done below for average value

        I do not like the black too ... But I had those lines that I liked white

        I tried to make some changes, I think it is better to sort in descending order, I have added the labels with the average value, so the y-axis can now be removed. I used the legend to show the total values ??(areas) this is a matter that needs to be shown, and that causes me a bit 'embarrassed ... I keep thinking above.
        http://goo.gl/EnYuR9Roberto_2

        • Jeff Weir says:

          Roberto: The problem with your chart is that it's no longer self-sufficient. How is a reader meant to know what those white boxes denote, and what the various numbers mean? You would have to explain that somewhere off the chart. Why not just explain it directly on the chart?

          Regarding your point I looked at your chart at least 20 times and I’ve never read … too much effort....this approach is drawn from one chart of many in a report I did for a management team some time back, to show them just how different their customers are. Previous to my report, they had tended to treat their subscription customers as a homogenous group.

          So far from being too lazy to read the info they were highly incentivised to read it, and this information in the labels was valuable insight to them. They commissioned me to provide insight into their customer base to a busy management team, and charts like this passed on the kind of information they wanted to know in a very concise manner.

          I could have put that extra information in a table below the chart. But putting it on the chart - in my opinion - was a much better design choice: they don't have to move their eyes around, and this approach clearly illustrates some very important commercial aspects of their business. Putting less information on the chart would have required putting more information in the text, and that, in my opinion, would have slowed down the time it took to absorb this stuff.

        • Jon Peltier says:

          Roberto:
          I like to see the data in descending order.
          I'm not wild about the black background, but it works.
          The labeling is a bit too weak. I know what the data is, so I can presume that each white rectangle shows a subtotal near 20% of the total, made up of so many customers paying an average of some dollar figure. But I have to work for it.
          But as Roberto points out, one also has to work to get the information out of Jeff's labels. I didn't completely ignore them, but on my first reading I read only one label across the two charts.

          • Jeff said:
            Roberto: The problem with your chart is that it’s no longer self-sufficient. How is a reader meant to know what those white boxes denote, and what the various numbers mean?

            Jon said:
            I know what the data is, so I can presume that each white rectangle shows a subtotal near 20% of the total, made up of so many customers paying an average of some dollar figure. But I have to work for it.

            I think it is very clear what the white boxes denote, and they catch my attention. They are the containers for those colorful piles. It's like taking a pile of earth and putting it in a bucket ... first it was just a heap, but afterwards it is a measured quantity. Our attention goes there!

            One big problem (as Jon pointed out, and I agree) is that comparison between the different buckets/boxes is difficult ... or rather, impossible. How can we solve this? I see two ways:
            1) We know the groups are homogeneous, so use buckets/boxes that have the same volume (20%). In this case the chart cannot explain it by itself; we need to know it in advance. Labels cannot help, because they are read only after looking at the chart, after we have already tried to understand it ... frustration!
            2) Use one more graph as support (a bar chart, or a pie if there are just 2-3 groups).

            Something else that might help: decrease the number of groups to 2, or at most 3.

          • Jon Peltier says:

            Roberto -

            "I think is very clear what the white boxes denote and catch my attention."

            But remember, you envisioned and implemented these boxes. It is impossible for you to forget what they are intended to show, at least until you've put this chart away for a few months.

            Not having had the same inspiration as you, I have to scratch my head and try to figure out what you were thinking. I know how creative you are, so I know it could be nearly anything.

            That said, I don't think it needs very much additional labeling to clarify your chart. Something like this:
            http://peltiertech.com/images/2014-01/RobertoRedux.png
            [Jon Peltier_Roberto Redux]

          • Jeff Weir says:

            @Jon Peltier: At first I really liked your redesign. The grey background is easier on my eye than the jet black in Roberto's original. But then, I see there's no y axis. y not? Isn't that kinda mandatory? We've got no idea how large that largest sub is without it.

            And I miss the gridlines too.

            And then I thought: instead of showing the white boxes - which, while a good concept, add quite a bit of clutter - why not just show the position of the average using one point?

            Check out my update in the original post to see what I've come up with.

            While I like the grey, I do think it's harder on the eyes than black text on white background. And I don't think a grey chart would work well on say a dashboard. But that said, there's no doubt in my mind that this chart is sexier than my original. Might look nice in the Economist.

          • I cannot stop thinking about this ... and trying!
            Thanks Jeff, and thanks to Jon, because I like all of this, and the discussion is a good source of inspiration (always!)

            Here my new version:
            http://goo.gl/539acQ
            [Roberto]

          • Jon Peltier says:

            I actually like the gray better than the black. It's more comfortable, like using slightly muted fills on bar and area charts. But if we dispense with the boxes and use a single point (and I'd use a much smaller marker for it, 5 pts at most), we can go back to a white background, which is also my favorite.

          • Jon Peltier says:

            Jeff's markers and Roberto's latest with lighter fill replacing the white rectangles got me thinking. I came up with two new variations.

            Markers denoting averages of each quintile:
            http://peltiertech.com/images/2014-01/DistribWithMarkers.png

            Horizontal lines denoting averages of each quintile:
            http://peltiertech.com/images/2014-01/DistribWithLines.png

            Both need a label along the bottom, something like "Subscriptions ranked from highest to lowest" (Jeff, your latest says lowest to highest but it's ranked highest to lowest).

          • Jeff,
            I like most of your latest version ... however, the position of the points that denote the average value is definitely wrong for the first 2 quintiles.

          • Jeff Weir says:

            Yes, you're right Roberto. Partly this is due to an error, but partly due to the chart type as well... unless you're using an XY chart, you can't show the exact point on the edge of the existing graph series where the average occurs, because there is no discrete point (i.e. customer sub) associated with that value. Plotting a horizontal line gets by this, because you can visually see where the line and the original series intersect.

            Hard to explain. I'll fix my error and try this in a scatterplot. That said, I like Jon's line approach.

            I originally tried something similar, using a white line to break each series in half (albeit with the wrong value plotted):
            [Redux_White Line]
            But I found it visually distracting, so went with the point approach instead. But how Jon did it works better.

            God I love the hive mind.

  9. PeterB says:

    Hi Jeff,

    As a data analyst (not a chart guru), I think this post is brilliant. Your chart shows me (and my client) exactly the information I need to provide an overview of customer activity. It is also sufficiently flexible to allow me to adjust as required for various client projects.

    Thank you wholeheartedly,

    Peter

  10. Jon Acampora says:

    Hi Jeff,

    I like your customer segment chart. This is a great way to show a distribution while not summarizing any of the detail. I recently did a similar project where I used quartile plots and histograms. These both do a great job of summarizing a large amount of data, but they are also difficult for the reader to comprehend quickly. Especially the quartile plot. It takes time to explain if the reader is not familiar with quartiles and usually just confuses them.

    I think your segmentation chart is simple and easy to comprehend, and that is very important when it comes to visualization.

    Thanks for sharing!

  11. Suril says:

    awesome post jeff!

  12. Johnny says:

    Hi Chandoo, great chart!

    How did you do it, so that the area just slopes down like that?

  13. Johnny says:

    I've seen the chart at the top, downloaded it, and wanted to play with it.
    As far as I can see it is an area chart, but I don't quite get how the area just goes down as if it is cut off. I simply can't manage it. Can someone help me?

    Johnny

    • Jeff Weir says:

      What version of Excel do you have?
      What kind of chart type are you trying to change it to?
      Can you take a screenshot, and post it somewhere then put the link here, so we can see what result you are getting?

  14. Johnny says:

    Excel 2010

    I can take a screenshot and send it via mail.

    Johnny

  15. Johnny says:

    Sent!

    Johnny

  16. […] Did Jeff just chart? | Chandoo […]

  17. Johnny says:

    no, sorry

    Johnny

  18. […] here. You might remember me from shows such as Handle volatile functions like they are dynamite, Did Jeff just Chart, and Robust Dynamic (Cascading) Dropdowns Without […]

  19. Fredrik says:

    Hi - great way of presenting customer data! Is it possible to download the template for "Update 1"? I can't find a link...
    /fredrik

  20. Anthony Smith says:

    Hello,

    I really like the chart. I have added some data into the table - roughly 2,883 records, of which 2,167 fall into the microscopic amount - but it's forcing the right-hand side of the graph to have less pop.

    How did you flip the area for the larger customers to be on the left side?

    Any suggestions on how to make the larger segments more visible while keeping the smaller guys in as well?

    Thanks,
    Tony

    • Jeff Weir says:

      Hi Anthony. Glad you like it. From memory I went Format Axis>Categories In Reverse Order. Did this a while ago and have forgotten the specifics.

      I'll upload a sample file with the right-to-left ordering shortly, so you can have a poke around.

      If you can't fit all the data on one chart and get the message across, then try two charts - one above the other, with big and medium customers in one and small in the other.

      • Anthony Smith says:

        Thanks Jeff. I did Format Axis > Categories In Reverse Order, and it goes into the upper right-hand corner.

        Thanks for your reply - great tool...

        • Hui... says:

          @Anthony
          It sounds as if you have Reversed the Vertical Axis
          Try Reversing the Horizontal Axis or the one you didn't change last time

          • Jeff Weir says:

            Thanks Hui. @Anthony...it's actually quite tricky to reverse the axis in my example, because that axis is hidden. Or rather, effectively there IS no Axis, meaning you can't get to the 'Categories in Reverse Order' option. What you have to do is actually add an axis, then select it and right click on it, then choose the Format Axis option. Then check/uncheck the 'Categories in Reverse Order' option as appropriate, and then delete the axis. Then go have a lie down. 🙂

  21. Jessica C says:

    What would be the proper method for reducing the number of segments? I'd like to look at only 3 or 4. Thanks!

    • Jeff Weir says:

      Jessica: Just resize the table to exclude the rows at the bottom that you want to ignore, and then change the figures in the 'Break point' column into whatever groups you desire. E.g. if you wanted three even groups, you'd resize the table so that it cut off the last two rows, and change the 20%, 40% and 60% figures to 33%, 66%, and 100%.
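      For illustration only - this is not from Jeff's workbook, and the helper names are hypothetical - the even-group arithmetic described above can be sketched in Python:

```python
# Sketch of the even-group segmentation idea: customers ranked largest
# to smallest, each assigned to a segment by their cumulative share of
# the customer count against a list of cumulative break points.

def even_breakpoints(n_groups):
    """Cumulative break points for n equally sized groups.
    e.g. 3 groups -> [1/3, 2/3, 1.0] (the 33%/66%/100% above)."""
    return [(i + 1) / n_groups for i in range(n_groups)]

def assign_segments(subscriptions, breakpoints):
    """Pair each subscription (ranked descending) with its segment index."""
    ranked = sorted(subscriptions, reverse=True)
    n = len(ranked)
    result = []
    for idx, value in enumerate(ranked):
        cum_share = (idx + 1) / n  # cumulative share of customers so far
        segment = next(i for i, bp in enumerate(breakpoints) if cum_share <= bp)
        result.append((value, segment))
    return result

subs = [500, 400, 300, 200, 100, 50]
print(assign_segments(subs, even_breakpoints(3)))
# [(500, 0), (400, 0), (300, 1), (200, 1), (100, 2), (50, 2)]
```

      Trimming the `subs` list here plays the same role as resizing the table in the workbook: fewer rows and adjusted break points give fewer, larger segments.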

  22. sasha says:

    I'm confused about how you got $34,239 from the 5% breakpoint (time wasters). What formula was used to calculate this?

Leave a Reply