• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Cleaning data: User completion status

jjacker

Member
Hi, Everyone.

I need help on how to take advantage of Power Query on cleaning data that has multiple attempts in the raw data.

My end goal is to have a cleaned data that only includes distinct users and their completion status. The completion status should show if they have already 'Completed' the course regardless of the number of attempts.

Here's a screenshot of the raw data:
73922

And, a screenshot of the expected output after the data is cleaned.
73923

I have also uploaded a sample sheet for reference.

Apparently, my solution with basic excel formulas is to;
1. Sort Completion Status (A-Z) to have Completed status on top.
2. Copy the User column to a separate sheet then delete duplicates.
3. Lookup the completion status from the raw table (that has the Completion Status already sorted).

I hope someone can help me take advantage of the power of Power Query on cleaning the data and reduce all the manual work, especially when working with millions of rows.

Thanks in advance everyone! :)
 

Attachments

  • CleaningData.xlsx
    9.9 KB · Views: 3
A possible pattern in PQ would be:
Group by name, Course and aggregate all rows. Call this All.
Then add custom List.Contains(All, "Completed").
If #attemps are needed, then in grouping add a max aggregation on that column.
 
Now having time to do it.
For performance reasons it might be a good idea to sort, add index then use grouping with 4rth argument, GroupKind.Local.
 

Attachments

  • Copy of CleaningData.xlsx
    19 KB · Views: 12
Back
Top