Hi folks,
For a long time I have been using a simple formula that helps me detect and filter out duplicate records as in a accounts payable (duplicate invoices or payments) or an accounts receivable file with duplicate invoices or amounts, etc. The formula I have been using is: IF(COUNTIF(B:B,B2)>1,"Duplicate","") or IF(COUNTIF($B$2:$B$200,B2)>1,"Duplicate",""). The problem I have is that this formula eats up memory or computing resources calculating large volumes of data as would be with records upwards of 15thousands raws. Forget larger files, I usually leave my laptop working the numbers overnight!!! Yea guys, you see where I am going with this?
QUESTION (1): So, I was wondering if anyone out there or even Chandoo himself could tackle this problem. How should I find and isolate duplicate transactions - faster and efficiently?
QUESTION (2): Iam curious to know if anyone knows how to determine statistical computations such as the famous Z-STAT AND CHI-SQUARE? I own an old mac laptop loaded with Excel for mac 2011 and it does not have the MS Excel addin. Does any one know how to address this?
RESOURCES: For your reference I have uploaded a file MediaFire.com (see link below) with the above formulas and fake data in (Chandoo_Test (Duplicate and Stats).xls) so that anyone out there can play around and test it. The test data is located on columns A through G, my duplicate results are in columns H through J. Finally, my statistical questions (if anyone knows how to find these stats would go on columnsK and L.
Chandoo_Test (Duplicate and Stats).xls
http://www.mediafire.com/?4bqckej5088snx1
I will greatly appreciate your help, Cheers
persol
For a long time I have been using a simple formula that helps me detect and filter out duplicate records as in a accounts payable (duplicate invoices or payments) or an accounts receivable file with duplicate invoices or amounts, etc. The formula I have been using is: IF(COUNTIF(B:B,B2)>1,"Duplicate","") or IF(COUNTIF($B$2:$B$200,B2)>1,"Duplicate",""). The problem I have is that this formula eats up memory or computing resources calculating large volumes of data as would be with records upwards of 15thousands raws. Forget larger files, I usually leave my laptop working the numbers overnight!!! Yea guys, you see where I am going with this?
QUESTION (1): So, I was wondering if anyone out there or even Chandoo himself could tackle this problem. How should I find and isolate duplicate transactions - faster and efficiently?
QUESTION (2): Iam curious to know if anyone knows how to determine statistical computations such as the famous Z-STAT AND CHI-SQUARE? I own an old mac laptop loaded with Excel for mac 2011 and it does not have the MS Excel addin. Does any one know how to address this?
RESOURCES: For your reference I have uploaded a file MediaFire.com (see link below) with the above formulas and fake data in (Chandoo_Test (Duplicate and Stats).xls) so that anyone out there can play around and test it. The test data is located on columns A through G, my duplicate results are in columns H through J. Finally, my statistical questions (if anyone knows how to find these stats would go on columnsK and L.
Chandoo_Test (Duplicate and Stats).xls
http://www.mediafire.com/?4bqckej5088snx1
I will greatly appreciate your help, Cheers
persol