• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

A new method to Distinct Values

bines53

Active Member
Hello friends,

I found another way, better than:
=SUM(--(FREQUENCY(A2:A31,A2:A31)>0))
=SUMPRODUCT(--(MATCH($A$2:A31,$A$2:A31,0)=ROW($A$2:A31)-1))
or
{=SUM(--(MATCH($A$2:A31,$A$2:A31,0)=ROW($A$2:A31)-1)) }
100,000 lines =0.04 seconds.

Now I need to find a solution to Unique Values.
If I find, of course I will update.

Regards

David
 

Attachments

  • test.1234.xlsx
    8 KB · Views: 23
Last edited by a moderator:
Two solutions to count the Number of Unique items
=SUM(--(FREQUENCY(A2:A31,A2:A31)=1))
=SUM(--IF(COUNTIF(A2:A31,A2:A31)=1,TRUE,)) Ctrl+Shift+Enter
 
Hi Hui,

I know these formulas, a large range this poor choice,
the best way to be with the function MATCH.


Regards
 
SUM+MATCH+CSE method took just under a minute (58.3s) for 100,000 values for me. But it took just a fraction of a second (0.06s) to do the same calculation with sorted data and changing the last argument of MATCH from 0 to 1:

{=SUM(--(MATCH($A$2:A100001,$A$2:A100001,1)=ROW($A$2:A100001)-ROW($A$1)))}

So if you have large volumes of data it's really worth sorting first - if you can!
 
@Lori

I think (based on my exp. with David) sorting will be a problem for him :) (David don't take it otherwise). In my viewpoint generating a PT and than simply using a COUNT will be fastest.

Regards,
 
Hi Lori,

I have to say, that all the tests I did, also the previous thread, the numbers are not sorted, from 1 to 9, I'll do a test with numbers from 1 to 100, later.

Regards,
 
@Somendra Misra: agreed - PT or SELECT DISTINCT query would be my preferred option for this. I believe 2013 version now has distinct count option for this in the PT data model.

I haven't been frequenting this forum much so don't know who's who, but usually take the view that responses should be useful to users generally and not necessarily tailored toward individual user preferences - but thanks for the heads up :)
 
@Lori

No issue there are few users which come with their user preference, so for them will have to keep in mind :). Personally I would also do a PT with Table as source and may be a small macro to refresh PT to save some steps to select the option using a mouse :)

Regards,
 
Hi Lori,

You're right,
As the values are different in range, especially the range is 100,000 lines, computing speed will be slower.

The function COUNTDIFF ,Do it the fastest, in any situation.

Regards,

David
 
I would like to note, that the formula
{=SUM(--(MATCH($A$2:A31,$A$2:A31,0)=ROW($A$2:A31)-1)) },
Is also working on a text, and a combination of numbers with text in the same range.

Regards,
 
@bines53. Yes, an xll library function like COUNTDIFF will likely be quickest generally - although not so easy to distribute - similar findings can be found on the fastexcel site.

But i find the FREQUENCY formula is 5-10 times quicker on my set up than the MATCH CSE alternative for a larger number of rows (9.1s for 100000 rows Office 2013 64-bit). As you say, MATCH has the benefit that it also allows text.
 
Hello friends,

With the data, I attached file, what is the solution to Unique Values.
With the function COUNTDIFF ,the answer should be 1.

Thank you!
 
Didn't Hui already give two formulas for counting unique values above?
(I think you might have missed use of "=1" instead of ">0" in those formulas.)
 
Back
Top