• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Removing duplicates using dax measures

I have a file named receipt data with duplicates receipt no. Sample data



RECEIPT_NOBPIS_NOBPIS_VALUE_DATE
C0597141800083SBIN61812736153407-May-18
C04760418013813686790023-Apr-18
C04760418013793686790023-Apr-18
C04760418013813686790023-Apr-18
C04760418013793686790023-Apr-18


I know that we can remove duplicates using power query – simple step.



But I cannot perform the simple step due to the large data model.



I compared the query loading to data model with and without removing duplicates.



It takes hell lot of time when the data is loaded to power query after removing duplicates in power query.



Data loads faster in power pivot if removing duplicates steps not performed in power query.



So after loading the receipt data containing duplicates in power pivot, I created a distinct table with unique receipt no.



RECEIPT_NO
C0597141800083
C0476041801381
C0476041801379


Now I don’t know how to get the additional columns of receipt data in the distinct table. I tried to do with related function but it does not work.



Output required.



RECEIPT_NOBPIS_NOBPIS_VALUE_DATE
C0597141800083SBIN61812736153407-May-18
C04760418013813686790023-Apr-18
C04760418013793686790023-Apr-18
 
Please find attached the sample file
 

Attachments

  • Distinct receipt.xlsx
    8 KB · Views: 3
  • Output.xlsx
    8.1 KB · Views: 3
  • Receipt dump.xlsx
    8.1 KB · Views: 5
Here is some VBA that may work for you.
Code:
Option Explicit

Sub DeleteDuplicateRows()
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' DeleteDuplicateRows
' This will delete duplicate records, based on the Active Column. That is,
' if the same value is found more than once in the Active Column, all but
' the first (lowest row number) will be deleted.
'
' To run the macro, select the entire column you wish to scan for
' duplicates, and run this procedure.
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

    Dim r As Long
    Dim n As Long
    Dim v As Variant
    Dim rng As Range

    On Error GoTo EndMacro
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual


    Set rng = Application.Intersect(ActiveSheet.UsedRange, _
                                    ActiveSheet.Columns(ActiveCell.Column))

    Application.StatusBar = "Processing Row: " & Format(rng.Row, "#,##0")

    n = 0
    For r = rng.Rows.Count To 2 Step -1
        If r Mod 500 = 0 Then
            Application.StatusBar = "Processing Row: " & Format(r, "#,##0")
        End If

        v = rng.Cells(r, 1).Value
        '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
        ' Note that COUNTIF works oddly with a Variant that is equal to vbNullString.
        ' Rather than pass in the variant, you need to pass in vbNullString explicitly.
        '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
        If v = vbNullString Then
            If Application.WorksheetFunction.CountIf(rng.Columns(1), vbNullString) > 1 Then
                rng.Rows(r).EntireRow.Delete
                n = n + 1
            End If
        Else
            If Application.WorksheetFunction.CountIf(rng.Columns(1), v) > 1 Then
                rng.Rows(r).EntireRow.Delete
                n = n + 1
            End If
        End If
    Next r

EndMacro:

    Application.StatusBar = False
    Application.ScreenUpdating = True
    Application.Calculation = xlCalculationAutomatic
    MsgBox "Duplicate Rows Deleted: " & CStr(n)


End Sub
 
Just use ODBC.Query to bring in data. Using "Select Distinct...".

Open PQ editor and create blank query. Paste in following, replacing dbq with your file location & defaultdir with path to the folder containing file.

Ex: This is used as source.
Code:
= Odbc.Query("dbq=C:\USERS\USERNAME\FOLDER\Receipt dump.xlsx;defaultdir=C:\USERS\USERNAME\FOLDER;driverid=1046;maxbuffersize=2048;MaxScanRows=1;pagetimeout=5;dsn=Excel Files", "Select Distinct * From [Sheet1$]")

Result:
upload_2018-6-7_17-6-39.png
 
Hi Chihiro i dont know where is ODBC.query in power BI. Please share the screenshot on how to go ODBC.query. Is it separate applicaiton?
 
Hi Sid,

I cannot apply VBA code in power BI. Also applying VBA code in excel will only remove duplicates in one excel file. In this case i have merged multiple excel file and found the duplicates. Also the merge excel file contains 20 lakh rows of data. So applying VBA code will be successful.
 
Hi Chihiro i dont know where is ODBC.query in power BI. Please share the screenshot on how to go ODBC.query. Is it separate applicaiton?
Chihiro did say to open a blank query and then paste the given code + provided screenshot already that in the source step you put that code. Try to apply the steps explained in #4.
 
Hi,

I found another way but not fully.

I have create a calculated column which will countrows that are duplicate.

Count = CALCULATE(countrows(Receiptdump),ALLEXCEPT(Receiptdump,Receiptdump[RECEIPT_NO]))

Receipt no BPIS NO BPIS Date Mapping status Count
C0093521800662 1013x 30-01-2018 MAPPED 3
C0093521800662 1013x 30-01-2018 MAPPED 3
C0093521800662 1013x 30-01-2018 MAPPED 3
C00935218006623 1014x 29-01-2018 MAPPED 2
C00935218006623 1014x 29-01-2018 MAPPED 2
C00935218006625 1015x 28-01-2018 MAPPED 1
C00935218006626 1016x 28-01-2018 MAPPED 1

After a lot of googling, I have found that we cannot delete any records in power pivot.

So I wanted to create a new table where Count =1. This will throw me distinct values of receipt no.

The objective is to create a relationship with source table by receipt no.

Receipt no BPIS NO BPIS Date Mapping status Count
C00935218006625 1015x 28-01-2018 MAPPED 1
C00935218006626 1016x 28-01-2018 MAPPED 1
 
Hi I want to mention that there are also samples where receipt no is same but the BPIS no in the table are different.

Receipt no BPIS NO BPIS Date Mapping status Count
C00935218006625 1015x 28-01-2018 MAPPED 1
C00935218006625 1016x 28-01-2018 MAPPED 1

So if i apply distinct (table) I will not be able to get the distinct table.

And if i apply distinct (table. ([receipt no]) I will get the distinct table but what about other columns how i will map the BPIS No and date to the distinct table.

Please suggest.
 
Back
Top