• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Formula Challenge 008 - return distinct/duplicate /unique strings as an array

jeffreyweir

Active Member
Formula Challenge 008 - return distinct/duplicate /unique strings as an array


Challenge Description

Good evening, Gentlemen. Your mission, should you choose to accept it, is to:

- Write three formula that respectively return the distinct, duplicate, and unique values of a string array.

- The output of each formula must be an array that can be used directly by other formula (such as CountA etc). That is, we're after a single formula that you can directly incorporate into other formulas - not a formula that has to be entered over a worksheet range in order to work.

- Each formula must return an array of the exact size containing only the distinct, duplicate, and unique values. So as per the sample data and output below:

* The array produced by the Distinct formula will be exactly 11 elements long

* The array produced by the Unique formula will be exactly 8 elements long

* The array produced by the Duplicate formula will be exactly 3 elements long

- Each formula should be completely dynamic, and able to be used on a 1D array of any size.


This message will self-destruct in 10 seconds.

Good Luck


Sample Data

(List of people who have so far posted comments in these Formula Challenges)

Names


[pre]
Code:
jeffreyweir
bobhc
Haseeb A
NARAYANK991
Colin Legg
Colin Legg
Shaun
Hui
Haseeb A
Luke M
Sam Mathai Chacko
Haseeb A
shrivallabha
SirJB7
jeffreyweir
Distinct[/b]

[pre][code]jeffreyweir
bobhc
Haseeb A
NARAYANK991
Colin Legg
Shaun
Hui
Luke M
Sam Mathai Chacko
shrivallabha
SirJB7
Unique[/b]

bobhc
NARAYANK991
Shaun
Hui
Luke M
Sam Mathai Chacko
shrivallabha
SirJB7
[/pre]
Duplicates


jeffreyweir
Haseeb A
Colin Legg[/code][/pre]
 
Here is my attempt Jeff;


Distinct:

Code:
=T(OFFSET(A1,SMALL(IF(ISNUMBER(1/(MATCH(DataText,A:A,0)=ROW(DataText))),ROW(DataText)),
 
ROW(A$1:INDEX(A:A,SUM(ISNUMBER(1/(MATCH(DataText,A:A,0)=ROW(DataText)))+0))))-1,))
 
 
Unique:
 
[code]
 
=T(OFFSET(A1,SMALL(IF(COUNTIF(DataText,DataText)=1,ROW(DataText)),
 
ROW(A$1:INDEX(A:A,SUM(N(COUNTIF(DataText,DataText)=1)))))-1,))


Duplicate:



=T(OFFSET(A1,SMALL(IF(FREQUENCY(IF(COUNTIF(DataText,DataText)>1,

MATCH(DataText,A:A,0)),ROW(DataText)),ROW(DataText)),

ROW(A$1:INDEX(A:A,SUM((FREQUENCY(IF(COUNTIF(DataText,DataText)>1,

MATCH(DataText,A:A,0)),ROW(DataText))>0)+0))))-1,))[/code]


Waiting for more elegant ways...


Haseeb
 
Hi,

My versions, which are similar to Haseeb's:


Unique: =T(OFFSET(A$1,SMALL(IF(COUNTIF(sdata,sdata)=1, ROW(sdata)),ROW(OFFSET(A$1,,,SUM(N(COUNTIF(sdata,sdata)=1)))))-1, 0))


Distinct: =T(OFFSET(A$1,SMALL(IF(FREQUENCY(MATCH(sdata,sdata,0),MATCH(sdata,sdata,0))>=1,ROW(sdata)), ROW(OFFSET(A$1,,,SUM(1/COUNTIF(sdata,sdata)))))-1, 0))


Duplicate: =T(OFFSET(A$1,SMALL(IF(FREQUENCY(MATCH(sdata,sdata,0),MATCH(sdata,sdata,0))>1,ROW(sdata)), ROW(OFFSET(A$1,,,SUM(N(FREQUENCY(MATCH(sdata,sdata,0),MATCH(sdata,sdata,0))>1)))))-1, 0))


Cheers,

Sajan.
 
Here's my attempt. Looks like they are working on the above data at least :D


UNIQUE, DUPLICATE, DISTINCT in that order


=INDEX(List,SMALL(IF(COUNTIF(List,List)=1,ROW(List)),ROW(A1)))


=INDEX(List,SMALL(IF(FREQUENCY(MATCH(List,List,0),MATCH(List,List,0))>1,ROW(List)),ROW(A1)))


=INDEX(List,SMALL(IF(FREQUENCY(MATCH(List,List,0),MATCH(List,List,0))>0,ROW(List)),ROW(A1)))
 
Good one, Sajan. Challenge 6 is beckoning you now... http://chandoo.org/forums/topic/formula-challenge-006-whats-the-number-you-can-confirm-is-a-prime-or-not


Sam...great attempt, but unfortunately you missed one of the key criteria above:

- The output of each formula must be an array that can be used directly by other formula (such as CountA etc). That is, we're after a single formula that you can directly incorporate into other formulas - not a formula that has to be entered over a worksheet range in order to work.


INDEX doesn't let you do this, I'm afraid.
 
Acknowledged Jeff. Bad miss there.


Here's the revised ones.


=T(OFFSET($A$1,SMALL(IF(COUNTIF(List,List)=1,ROW(List)),ROW(List))-1,))


=T(OFFSET($A$1,SMALL(IF(FREQUENCY(MATCH(List,List,0),MATCH(List,List,0))>1,ROW(List)),ROW(List))-1,))


=T(OFFSET($A$1,SMALL(IF(FREQUENCY(MATCH(List,List,0),MATCH(List,List,0))>0,ROW(List)),ROW(List))-1,))
 
Apparently, that one didn't fully do it, did it? And I was reckless enough to pressume I nailed it.


So without much flash, here's what I think is the one that'll work (yeah, not taking much credit, cause it almost resembles what Sajan did (if not replicates).


=T(OFFSET(A$1,SMALL(IF(COUNTIF(List,List)=1,ROW(List)),ROW(INDIRECT("1:"&SUM(--(COUNTIF(List,List)=1)))))-1,))


=T(OFFSET(A$1,SMALL(IF(FREQUENCY(MATCH(List,List,0),MATCH(List,List,0))>1,ROW(List)),ROW(INDIRECT("1:"&SUM(--(FREQUENCY(MATCH(List,List,0),MATCH(List,List,0))>1)))))-1,))


=T(OFFSET(A$1,SMALL(IF(FREQUENCY(MATCH(List,List,0),MATCH(List,List,0))>0,ROW(List)),ROW(INDIRECT("1:"&SUM(--(FREQUENCY(MATCH(List,List,0),MATCH(List,List,0))>0)))))-1,))
 
Okay, after replacing all range names with a one digit name q, here's the results:


Haseeb wins the UNIQUE category with this 93 character long puppy:

=T(OFFSET(A1,SMALL(IF(COUNTIF(q,q)=1,ROW(q)),ROW(A$1:INDEX(A:A,SUM(N(COUNTIF(q,q)=1)))))-1,))


(Sam and Sajan had 94 and 96 characters respectively)


Sajan wins the DISTINCT category in 117 characters:

=T(OFFSET(A$1,SMALL(IF(FREQUENCY(MATCH(q,q,0),MATCH(q,q,0))>0,ROW(q)), ROW(OFFSET(A$1,,,SUM(1/COUNTIF(q,q)))))-1, 0))


(Haseeb and Sam had 132 and 142 characters respectively)


Sam
wins the DUPLICATE
category in 142 characters:

=T(OFFSET(A$1,SMALL(IF(FREQUENCY(MATCH(q,q,0),MATCH(q,q,0))>1,ROW(q)),ROW(INDIRECT("1:"&SUM(N(FREQUENCY(MATCH(q,q,0),MATCH(q,q,0))>1)))))-1,))


(Haseeb and Sajan had 170 and 144 characters respectively)


Gentlemen, you are all winners in my book!


My formulas were very long indeed:

Distinct (181):

=T(OFFSET(q,-1+SMALL(IFERROR(MATCH(IF(MATCH(q,q&"",0)=ROW(q)-MIN(ROW(q))+1,q),q,0),""),ROW(INDEX(D:D,1):INDEX(D:D,COUNT(MATCH(IF(MATCH(q,q&"",0)=ROW(q)-MIN(ROW(q))+1,q),q,0))))),0))


Unique (183):

=T(OFFSET(q,-1+SMALL(IFERROR(MATCH(q,REPT(q,MMULT(N(q=TRANSPOSE(q)),ROW(q)^0)=1),0),""),ROW(INDEX(A:A,1):INDEX(A:A,COUNT(MATCH(q,REPT(q,MMULT(N(q=TRANSPOSE(q)),ROW(q)^0)=1),0))))),0))


Duplicate (249):

=T(OFFSET(q,-1+SMALL(IFERROR(MATCH(REPT(q,(MMULT(N(q&""=TRANSPOSE(q)),ROW(q)^0)>1)*(ROW(q)-MIN(ROW(q))+1)=MATCH(q,q,0)),q,0),""),ROW(INDEX(A:A,1):INDEX(A:A,SUM(N(MATCH(q,q,0)=((MMULT(N(q&""=TRANSPOSE(q)),ROW(q)^0)>1)*(ROW(q)-MIN(ROW(q))+1))))))),0))
 
Whoops, found some blanks in Sajan's formula, which pushed his length out. Plus managed to remove replace quite a few instances of comma zero with just comma for MATCH and OFFSET functions across all of these. Plus I've taken out the mixed references in several of the formulas from all of you, as they are not needed.


Also using t for Text as named range


UNIQUE

Sajan(91):

=T(OFFSET(A1,SMALL(IF(COUNTIF(t,t)=1,ROW(t)),ROW(OFFSET(A1,,,SUM(N(COUNTIF(t,t)=1)))))-1,))


Haseeb (92):

=T(OFFSET(A1,SMALL(IF(COUNTIF(t,t)=1,ROW(t)),ROW(A1:INDEX(A:A,SUM(N(COUNTIF(t,t)=1)))))-1,))


Sam (93):

=T(OFFSET(A1,SMALL(IF(COUNTIF(t,t)=1,ROW(t)),ROW(INDIRECT("1:"&SUM(N(COUNTIF(t,t)=1)))))-1,))


DISTINCT

Sajan (111):

=T(OFFSET(A1,SMALL(IF(FREQUENCY(MATCH(t,t,),MATCH(t,t,))>0,ROW(t)),ROW(OFFSET(A1,,,SUM(1/COUNTIF(t,t)))))-1,0))


Haseeb (133):

=T(OFFSET(A1,SMALL(IF(ISNUMBER(1/(MATCH(t,A:A,0)=ROW(t))),ROW(t)),ROW(A1:INDEX(A:A,SUM(ISNUMBER(1/(MATCH(t,A:A,0)=ROW(t)))+0))))-1,))


Sam (137):

=T(OFFSET(A1,SMALL(IF(FREQUENCY(MATCH(t,t,),MATCH(t,t,))>0,ROW(t)),ROW(INDIRECT("1:"&SUM(N(FREQUENCY(MATCH(t,t,),MATCH(t,t,))>0)))))-1,))


DUPLICATE


Sajan (135):

=T(OFFSET(A1,SMALL(IF(FREQUENCY(MATCH(t,t,),MATCH(t,t,))>1,ROW(t)),ROW(OFFSET(A1,,,SUM(N(FREQUENCY(MATCH(t,t,),MATCH(t,t,))>1)))))-1,))


Sam (137):

=T(OFFSET(A1,SMALL(IF(FREQUENCY(MATCH(t,t,),MATCH(t,t,))>1,ROW(t)),ROW(INDIRECT("1:"&SUM(N(FREQUENCY(MATCH(t,t,),MATCH(t,t,))>1)))))-1,))


Haseeb (169):

=T(OFFSET(A1,SMALL(IF(FREQUENCY(IF(COUNTIF(t,t)>1,MATCH(t,A:A,0)),ROW(t)),ROW(t)),ROW(A1:INDEX(A:A,SUM((FREQUENCY(IF(COUNTIF(t,t)>1,MATCH(t,A:A,0)),ROW(t))>0)+0))))-1,))
 
Oh! Jeff. Didn't know about the length :)


Could you please check the formulas? Distinct & Duplicate will give error, if there is at least one empty cell in the range.
 
Haseeb: re Could you please check the formulas? Distinct & Duplicate will give error, if there is at least one empty cell in the range.


Correct, but that's not part of the challenge. So assume no blanks, no letters. Which is exactly how the data is that inspired this challenge.
 
Hi Jeff,


Here is my take on returning a “usable” list of Distinct items in a list – 93 Characters !!

[pre]
Code:
=T(OFFSET(A1,SMALL(IF(MATCH(t,t,0)=ROW(t),ROW(t)),ROW(OFFSET(A1,,,SUM(1/COUNTIF(t,t)))))-1,))
[/pre]
Cheers

Sam
 
Awesome, Sameer. I was wondering when you were going to pop out of the woodwork!


You can go one even shorter by ditching the 0 in the 3rd argument of MATCH (i.e. just have a comma, but no zero):

=T(OFFSET(A1,SMALL(IF(MATCH(t,t,)=ROW(t),ROW(t)),ROW(OFFSET(A1,,,SUM(1/COUNTIF(t,t)))))-1,))
 
Hi All,
I recently had to make use of the formula in Challenge008 – Return Usable list of Distinct items

=T(OFFSET(A1,SMALL(IF(MATCH(t,t,)=ROW(t),ROW(t)),ROW(OFFSET(A1,,,SUM(1/COUNTIF(t,t)))))-1,))

It works fine in most cases but fails due to the sensitivity of the OFFSET's Integer parameters on certain data sets that result when irrational fractions are summed

For Example 1/3 + 1/3 + 1/3 = 1 but when you try and do something like this using floating point arithmetic it could lead to non integer results, which the Integer parameters of Offset tend to ROUNDDOWN

It could lead to wrong results and several frustrating hours of debugging

Regards
Sam
 

Attachments

  • Sensitivity of Offset Parameters.xlsx
    10.3 KB · Views: 18
Back
Top