Removing duplicate data is like morning coffee for us, data analysts. Our day must start with it.
It is no wonder that I have written extensively about it (here: 1, 2, 3, 4, 5, 6, 7, 8).
But today I want to show you a technique I have been using to dynamically extract and sort all unique items from a last list of values using Pivot Tables & OFFSET formula.
This is how it goes…,
Step 1: Select your data & Create a pivot table
Just select any cell and insert a pivot table. Very simple right?
Step 2: Drag the field(s) to row label area of pivot
Like this.
Make sure you have turned off grand totals and sub-totals as we just need the names. And sort the pivot table.
Step 3: Create a named range that refers to the pivot table values
Using OFFSET formula, we can create a named range that refers to pivot table values and grows or shrinks as the pivot is refreshed. Assuming the pivot table row values start in cell F6, write a formula like,
=OFFSET($F$6, 0,0,COUNTA($F:$F)-1,1)
and map it to a name like lstProducts.
The formula gives us all the values in column F, starting F6. The COUNTA($F:$F)-1 ensures that we get only row labels and not the title (in this case Product Names).
Step 4: Use the named range in formulas etc. as you see fit
That is all. Nothing else.
Just make sure that you refresh the pivot table whenever source data changes.
Download example file with this technique
Click here to download an example file and play with it to understand how this works.
How do you deal with duplicate data?
In my work, I come across duplicate data all the time. I have been using pivot table based technique with great success. It is fast, reliable and easy to setup. The only glitch is that you need to refresh the pivot tables whenever source data changes. However, you can automate this by writing a simple macro.
What about you? How do you deal with duplicate data? Share your techniques, tips & ideas using comments.