• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

longest consecutive sequence of a value

@bines53

Perhaps you would like to post a small file with evidence to support your claim that certain combinations of functions are more efficient than alternatives?

I also work with large datasets, so I would be interested to see the results of the testing you've done in this area, particularly with regards to non-array vs array set-ups.

Regards
 
Hi XOR LX,

A good idea, but unfortunately not possible, the data I get, real-time data on global stock exchanges, shares, options, some formulas, they are written in the language C ++, (xll, formulas that do not exist in Excel. Tests I did, I could see the differences, calculation and sending Command executed.

David
 
But of course it does not have to be that actual data.

Perhaps if you have some time you can prepare a simple Excel sheet with dummy data, in which you present the various formulas alongside the results of testing you have done on those formulas re their efficiency, calculation speed, etc.?

Until that time, I will unfortunately not to be able to take your word that what you believe is in fact the truth, though I hope that you will prove me wrong in this respect.

Regards
 
Hi XOR LX,

Do not have to use my data,Let's take a one million rows, number of columns,
We will use the RANDBETWEEN function,Random numbers,After that, make them permanent numbers, not random, and test each function mentioned here, and others, and compare them, I believe you have reliable tools, and finally we know, which one is better.

Regards
 
But, with all due respect, I am not the one that is making a claim as to one set of constructions being more efficient than the other. It is you who is making that claim.

And I trust you can appreciate the fact that your theory, as well as all the theories that have ever been proposed since the very first day that such a concept was available to mankind, and in whatever field that may have been, cannot be deemed worthy of consideration, even less so of acceptance, unless it is supported by a body of evidence which is sufficient to uphold, beyond all reasonable doubt, the claims inherent in that theory.

And I, sir, will certainly not be doing that work for you. So, until that time as you convince me otherwise, I will continue to believe that there are no particular grounds to support your claims.

Regards
 
Hi XOR LX,

I have no problem doing tests, I'll do it in the coming days, just tell which tool to use,And you want proof that ,=AGGREGATE(14,6,(FREQUENCY((A2:A32=H23)*ROW(A2:A32),(A2:A32<>H23)*ROW(A2:A32)))/(A2:A32<>H23),1) ,It is better than,
{=LARGE(FREQUENCY(IF(A2:A31=D3,ROW(A2:A31)),IF(A2:A31<>D3,ROW(A2:A31))),1)},and SUMPRODUCT vs sum +index ?

Regards
 
Perhaps AGGREGATE is not the most pertinent example. As Narayan points out, one of the advantages of AGGREGATE may be that it can incorporate several functions into one by use of its function_num parameter. And here, of course, the non-AGGREGATE version requires an extra function call (LARGE).

However, that said, I would still be interested to see the results comparing the two versions that you give.

Perhaps you also could perform some tests on SUM vs SUMPRODUCT, e.g.:

=SUM((A1:A100000="A")*(B1:B100000="B")*(C1:C100000="C")*(D1:D100000="D"))

with CSE, vs:

=SUMPRODUCT((A1:A100000="A")*(B1:B100000="B")*(C1:C100000="C")*(D1:D100000="D"))

though note that I think it would be unfair here - and also not entirely relevant to our discussion - to use the "native", i.e. comma-separated form of SUMPRODUCT, i.e.:

=SUMPRODUCT(0+(A1:A100000="A"),0+(B1:B100000="B"),0+(C1:C100000="C"),0+(D1:D100000="D"))

since I understand that this version does offer a small increase in performance (though here too it would be interesting to have that confirmed).

The reason I say unfair is that we wish to perform a test in which all else is equal apart from the fact that one version is being array-entered, the other isn't.

Besides, we are not always in a position to be able to use the native form of SUMPRODUCT, for example when the arrays being passed are not of identical dimensions, a reasonably common scenario.

To me, I cannot believe that there is much difference between the "product" form of SUMPRODUCT above and the equivalent CSE version. However, I will await your results with interest.

Regards
 
Hi XOR LX,

Until tomorrow night, I'll finish the tests I did, I expanded the tests beyond that mentioned in the discussion. I want to ask about the function FREQUENCY, have you noticed ever, there is a crash in Excel, when you combine this function with other functions in a very large range of lines, without cse?

Regards
 
Hi.

Thanks for letting me know.

I hadn't noticed that, no. Are you saying that this issue is particular to FREQUENCY, and does not occur with other functions?

Regards
 
Hi XOR LX,

I also checks, several versions of the formulas about, unique numbers.
FREQUENCY function,working optimally, with an array.
 
Hello friends,

The tests were made hardware ,
Windows 7 64,excel 2010 32 bit
Processor i5 3450.

Tested, 100000 lines ,

=SUM((A1:A100000="A")*(B1:B100000="B")*(C1:C100000="C")*(D1:D100000="D"))

with CSE, vs:

=SUMPRODUCT(0+(A1:A100000="A"),0+(B1:B100000="B"),0+(C1:C100000="C"),0+(D1:D100000="D"))

SUM with CSE 10% faster .

SUMPRODUCT vs sum+index ,Same speed.

FREQUENCY
When the function, with array {},is very fast

{=LARGE(FREQUENCY(IF(A2:A100001=6,ROW(A2:A100001)),IF(A2:A100001<>6,ROW(A2:A100001))),1)}

When the function,with just ENTER,Is the most horribly slow,like
=MAX(--FREQUENCY(A2:A100001,A2:A100001)),or
=SUM(--(FREQUENCY(A2:A100001,A2:A100001)>0))
Also used the method,COUNTIFS(A2:A100001,A2:A100001),Worse.
The combination FREQUENCY and IF ,Is the optimal combination,
Like the XOR LX noted.
Excel solutions for the, Count Of Distinct Values ,and Unique Values,they suck !
Good to have another solution!

AGGREGATE

AGGREGATE(14,6,(FREQUENCY((A2:A32=H23)*ROW(A2:A32),(A2:A32<>H23)*ROW(A2:A32)))/(A2:A32<>H23),1),
Not functioning well, the combination of AGGREGATE and FREQUENCY,in Large range.
I really like the function, although it is not the most feasible with the integration of all other functions.
Easy to use and understand.

Regards
 
Hi David ,

Thanks for the research ; it would help if instead of saying 10% faster , you could give the absolute times for all tests ; after all , 10% faster does not reveal anything.

If a test using a faster function took 3 seconds or 30 seconds , what is 10 % faster really worth ? Is it significant ?

If you ask me , irrespective of the units , 10 % is not really worth striving for , at the expense of clarity and / or maintainability.

Narayan
 
Hi Narayan,

SUM with CSE =0.06 seconds.
SUMPRODUCT =0.067 seconds.

{=LARGE(FREQUENCY(IF(A2:A100001=6,ROW(A2:A100001)),IF(A2:A100001<>6,ROW(A2:A100001))),1)}=0.05 seconds.

AGGREGATE =Number of solutions with the function AGGREGATE (Not with FREQUENCY ) =0.04-0.08 seconds.

As mentioned ,100,000 lines,All tests.

Regards

David
 
Very interesting.

I suspected that there would be little difference between the two, but not that the CSE version would be actually more efficient than the equivalent non-CSE set-up.

Thanks very much for your analysis - it's much appreciated. And I shall be sure to post a link to this page on occasions where this question crops up again on some other forum.

Regards
 
Back
Top