Figuring out Employee Churn with Power Query [HR Analytics]

Posted on March 13th, 2017 in Power Query - 22 comments

Let’s say you are the people manager at ACME Inc. You are looking staff list for the months – January and February 2017. You see that we had 4,000 employees in Jan and 4,200 employees in Feb. So what is the churn?

  • Is it just 200?
  • Or is it the sum of people who left and who joined?
  • What if you want to find out how many people moved to new designations / departments or groups?

You see, churn is tricky to figure out.

So why not invite the pros? ie Power Query. 

employee-churn-analysis-pq

Side note: You could also calculate churn in Excel (using formulas) or in SQL (by using long queries and unions with a boat load of joins)

Setting up your data – Churn analysis

The first step is to set up two sets of data (one for each point in time). Make sure that you include only relevant columns. Let’s say we go with below layout.

employee-churn-analysis-data

For the sake of simplicity, let’s call these tables thismonth and lastmonth. You can include the data date columns, but they are not necessary for the analysis.

Define churn

We can categorize churn in to one of these 6 levels.

  1. New employees
  2. Exits
  3. People who moved to a new group
  4. People who moved to a new branch
  5. People who moved to a new designation
  6. No changes

Figuring out the churn – Power Query

Here is the process to calculate the churn using Power Query.

  1. We create a third query by merging two datasets (thismonth and lastmonth) on employee number as Full Outer join (think of this as A union B in sets – ie any employee present in either months will be included)
  2. We define a custom column in this new query, called status. It will have
    1. New employees if emp number is null in last month’s column
    2. Exit if emp number is null in this month’s column
    3. New group if thismonth.group <> lastmonth.group
    4. New Branch if thismonth.branch <> lastmonth.branch
    5. New Designation if thismonth.designation <> lastmonth.designation
    6. No changes else
  3. We delete all rows with no changes (as we are only interested in churn)
  4. Load this data to Excel

Figuring out employee churn – Power Query lesson video

Since the process is somewhat technical and confusing, I made a video explaining everything in detail. Check it out below.

 

You can watch this video on our youtube channel too.

Download Example Workbook

Click here to download example workbook. You must have Power Query 2013 or Excel 2016 to customize anything.

More on Power Query

Power Query is awesome. If you haven’t explored its power, check out below tutorials and get started.

How do you calculate churn?

As mentioned earlier, we can use either Excel formulas or SQL to calculate churn. If I am only interested in high-level churn (ie entries and exits), I use a simple formula. But for anything more than that, I prefer Power Query or SQL (as PQ is not be available in all versions of Excel)

What about you? How do you calculate and analyze churn? Please share your approach and tips in the comments.

 

Written by Chandoo
Tags: , , , ,
Home: Chandoo.org Main Page
? Doubt: Ask an Excel Question

22 Responses to “Figuring out Employee Churn with Power Query [HR Analytics]”

  1. MF says:

    Awesome & Practical! 🙂

  2. Abhay says:

    Hi Chandoo,

    This really awesome example of showing how to use conditional column feature in power query with manually text and columns together.

    One more suggestion, using power query group feature you can further count as well how many employees exit, newly joined, changed group / designation, etc.

    I had launched my own Power Query course here -

    https://www.udemy.com/power-query-training-for-excel-2010-2013-2016-powerbi

    I would love to hear from you.

    Thanks always.

    • Carlos Barboza says:

      I enrolled in your course on Udemy, highly recomended!!!

    • Carlos Barboza says:

      Hi Chandoo, and dear comunity, how about creating four pivot tables, after having calculated with employee, group, branch and designation status "conditional column"? I proceeded with doing it, and had differente results for each pivot table, with having a consistent percentage on the Exit and Recruits.

  3. Jitendra Sharma says:

    Hi Everyone,

    Anybody can help me because I need full form of PIVOT.

    Please reply as soon as.

    Thanks

    Jitendra Sharma

  4. Jay says:

    I don't think creating ONE status column is a good idea...what if group AND designation changed? Your method only shows one status change. It's better to create columns for different status then unpivot those columns to get the percentage.

    • William says:

      I guess Chandoo has mentioned the status change is hierarchical in the sense that a group change is on a higher level as compared to a branch change...
      But yes, both changes can overlap

  5. David N says:

    I agree with Jay because there are some employees with multiple changes. Take employee number 1100018 as an example; their Branch and Designation have both changed, but they are only listed as having a New Branch.

    However, I also disagree (slightly) with Jay in that this can still be solved with a single status column. The number of possible outcomes is small enough to work as a binary approach without making the formula annoyingly long or having to bring a VBA custom function into the mix. If the combined and de-duped list of employee numbers were placed in column M of the example file starting in row 6, then the following could be placed in cell N6 and copied down.

    =IF(ISNA(MATCH(M6,thismonth[Emp num],0)),"Exit",IF(ISNA(MATCH(M6,lastmonth[Emp num],0)),"New Recruit",CHOOSE(SUMPRODUCT(--(INDEX((thismonth[Group],thismonth[Branch],thismonth[Designation]),MATCH(M6,thismonth[Emp num],0),,N(IF(1,{1,2,3})))INDEX((lastmonth[Group],lastmonth[Branch],lastmonth[Designation]),MATCH(M6,lastmonth[Emp num],0),,N(IF(1,{1,2,3})))),{1,2,4})+1,"No Change","New Group","New Branch","New Group & New Branch","New Designation","New Group & New Designation","New Branch & New Designation","New Group, New Branch, and New Designation")))

    The Exits and New Recruits are handled by looking for simple MATCHes that result in #N/A errors (i.e. the mismatches). With those out of the way, the remaining checks are handled by comparing INDEX/MATCH results across multiple columns simultaneously. Credit for the technique of returning multiple results from a single INDEX goes to the author of the EXCELXOR blog and to Jeff Weir for spreading the word of its existence (links below). The SUMPRODUCT aids in some binary math, and CHOOSE then uses that binary sum to select the correct Status combination.

    http://excelxor.com/2014/09/05/index-returning-an-array-of-values/
    http://dailydoseofexcel.com/archives/2014/11/20/an-index-of-insights-from-excelxor/

    Note also that the formula addresses each column of the source tables separately in the array argument of the INDEX function as opposed to using the [#Data] syntax -- e.g. thismonth[#Data]. The following formula demonstrates how that approach would work and is slightly shorter, but it would be susceptible to any columns being added or rearranged in the tables (as in it would break for the same reason VLOOKUPS are so easily broken).

    =IF(ISNA(MATCH(M6,thismonth[Emp num],0)),"Exit",IF(ISNA(MATCH(M6,lastmonth[Emp num],0)),"New Recruit",CHOOSE(SUMPRODUCT(--(INDEX(thismonth[#Data],MATCH(M6,thismonth[Emp num],0),N(IF(1,{2,3,4})))INDEX(lastmonth[#Data],MATCH(M6,lastmonth[Emp num],0),N(IF(1,{2,3,4})))),{1,2,4})+1,"No Change","New Group","New Branch","New Group & New Branch","New Designation","New Group & New Designation","New Branch & New Designation","New Group, New Branch, and New Designation")))

    The difference is subtle, but this approach uses the N-IF combination to control the column_num argument of the INDEX instead of controlling the area_num argument as it did in the first solution. Handling things via area_num ensures that the formula can keep track of the focal columns if something is inserted or rearranged.

    • David N says:

      I forgot about how this site interprets greater/less than symbols, so there was a "does not equal" missing from each of my formulas. Here are the corrected versions.

      =IF(ISNA(MATCH(M6,thismonth[Emp num],0)),"Exit",IF(ISNA(MATCH(M6,lastmonth[Emp num],0)),"New Recruit",CHOOSE(SUMPRODUCT(--(INDEX((thismonth[Group],thismonth[Branch],thismonth[Designation]),MATCH(M6,thismonth[Emp num],0),,N(IF(1,{1,2,3})))&lt&gtINDEX((lastmonth[Group],lastmonth[Branch],lastmonth[Designation]),MATCH(M6,lastmonth[Emp num],0),,N(IF(1,{1,2,3})))),{1,2,4})+1,"No Change","New Group","New Branch","New Group & New Branch","New Designation","New Group & New Designation","New Branch & New Designation","New Group, New Branch, and New Designation")))

      =IF(ISNA(MATCH(M6,thismonth[Emp num],0)),"Exit",IF(ISNA(MATCH(M6,lastmonth[Emp num],0)),"New Recruit",CHOOSE(SUMPRODUCT(--(INDEX(thismonth[#Data],MATCH(M6,thismonth[Emp num],0),N(IF(1,{2,3,4})))&lt&gtINDEX(lastmonth[#Data],MATCH(M6,lastmonth[Emp num],0),N(IF(1,{2,3,4})))),{1,2,4})+1,"No Change","New Group","New Branch","New Group & New Branch","New Designation","New Group & New Designation","New Branch & New Designation","New Group, New Branch, and New Designation")))

      • David N says:

        Well that still didn't get it. My apologies. But at least you can now see the "&lt&gt" that are in the positions where each "does not equal" should be placed.

  6. SDV says:

    Long time reader, first time commenter, because hey, Chandoo is discussing HR data! Please do more of this, many of us in HR use excel every day, without much technical knowledge. This means a fanatic devotion to vlookup, pivot tables, and count functions. 🙂

    I discovered Power Query very recently. I create two queries with last month & this month employees, merge the two, remove duplicates, and simply filter on 'start date' to filter the new hires out. Since we want to see turnover as a percentage of the base, I divide the number of exits by the average headcount of the period for which I'm doing the calculation.

  7. Julian says:

    David,

    Could you please explain further for me how the SUMPRODUCT function return the value from 1 to 8 as the Index_num to be selected by CHOOSE function? I copied "Emp num" data from thismonth table and pasted onto the column M then inserted your above formula in cell N6 downward, but all returned #value!

    Regards,
    Julian

    • David N says:

      Refer back to my post to see an update due to how this site interprets greater/less than symbols. In this case, those two characters should have formed a "does not equal" in the middle of each formula. With that correction made, the formulas should work.

      As for how the SUMPRODUCT feeds the index_num argument, each of the three inequality tests (i.e. the comparison of each column) returns a true or false which Excel will interpret as a 1 or 0 with the help of the double negative sign. The 1 and 0 values are then multiplied by their corresponding binary digits provided by the {1,2,4} array constant and then summed -- the SUMPRODUCT takes care of both operations.

      So if we have three comparisons that return true/false/true, then that's 1/0/1 multiplied by 1/2/4, which gives 1/0/4 for a sum of 5. Since CHOOSE does not like an index_num of zero, the +1 makes sure every possible outcome is a non-zero number. Then the possible combinations were just hard-coded into the subsequent CHOOSE arguments.

      • Julian says:

        Have you tried to puting your formula in cell N6 given the Emp num "A1100001" in cell M6 then changed the value from cell I6~K6 to see the results? I still got the return value "#value!"

        • David N says:

          None of the Emp num values start with an "A" which could be part of your trouble. So you would use "1100001" in cell M6 instead. My only other theory would be that your Windows global setting are such that Excel formulas would use semicolons instead of commas to separate the various arguments.

          Lastly, I'm no HTML expert, but I think my previous attempt at fixing the display of my formula needed a semicolon after each of the two tags. So I'm trying one last time here to get the formula to properly display as it should appear in Excel...hopefully it will work.

          =IF(ISNA(MATCH(M6,thismonth[Emp num],0)),"Exit",IF(ISNA(MATCH(M6,lastmonth[Emp num],0)),"New Recruit",CHOOSE(SUMPRODUCT(--(INDEX((thismonth[Group],thismonth[Branch],thismonth[Designation]),MATCH(M6,thismonth[Emp num],0),,N(IF(1,{1,2,3})))<>INDEX((lastmonth[Group],lastmonth[Branch],lastmonth[Designation]),MATCH(M6,lastmonth[Emp num],0),,N(IF(1,{1,2,3})))),{1,2,4})+1,"No Change","New Group","New Branch","New Group & New Branch","New Designation","New Group & New Designation","New Branch & New Designation","New Group, New Branch, and New Designation")))

          • Julian says:

            With your latest script for the formula it works properly for all the possible scenarios. David, Thank you very much for your coaching and sorry for bothering you so much. By the way, I was modifying the power query as below. It's also workable.

            let
            Source = Table.NestedJoin(thismonth,{"Emp num"},lastmonth,{"Emp num"},"NewColumn",JoinKind.FullOuter),
            #"Expanded NewColumn" = Table.ExpandTableColumn(Source, "NewColumn", {"Emp num", "Group", "Branch", "Designation"}, {"Emp num.1", "Group.1", "Branch.1", "Designation.1"}),
            #"Renamed Columns" = Table.RenameColumns(#"Expanded NewColumn",{{"Emp num.1", "lm.Emp num"}, {"Group.1", "Group.lm"}, {"Branch.1", "Branch.lm"}, {"Designation.1", "Designation.lm"}}),
            #"Added Conditional Column" = Table.AddColumn(#"Renamed Columns", "Status",
            each if [Emp num] = [lm.Emp num] and [Group] [Group.lm] and [Branch] [Branch.lm] and [Designation] [Designation.lm] then "All Changed"
            else if [Emp num] = [lm.Emp num] and [Group] [Group.lm] and [Branch] [Branch.lm] then "New G+B"
            else if [Emp num] = [lm.Emp num] and [Group] [Group.lm] and [Designation] [Designation.lm] then "New G+D"
            else if [Emp num] = [lm.Emp num] and [Branch] [Branch.lm] and [Designation] [Designation.lm] then "New B+D"
            else if [Emp num] = null then "Exit"
            else if [lm.Emp num] = null then "New Recruit"
            else if [Group] [Group.lm] then "New Group"
            else if [Branch] [Branch.lm] then "New Branch"
            else if [Designation] [Designation.lm] then "New Designation"
            else "No changes" )
            in
            #"Added Conditional Column"

  8. Luvkumar says:

    awesome Post 🙂

  9. Karlo says:

    Thanks for the great HR tool Chandoo. I've been waiting a long time for HR applications

  10. Den says:

    Hello! Thanks for this post. I hope this is only the beginning of the work of HR-themed. 🙂 This is practically not a plowed field. In my work I prefer to use DAX, because for such calculations it fits better. In particular, it allows to add dynamically calculated indicators for the described groups. For example, the turnover, depending on the length of seniority in the company given by several intervals - up to 3 months, 3 months to 1 year, 1 to 3 years.
    Here I describe some of my mesures:
    https://community.powerbi.com/t5/Desktop/Calculate-Employees-Turnover/m-p/42037/highlight/false#M15989

  11. GraH says:

    Great post mister Chandoo! Loving the critical reactions and alternative views on how to look at churn. Really insightfull. Thx guys (and girls?)

Leave a Reply