• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Formula Challenge 011 - distinct/duplicate/unique array from text AND numbers

jeffreyweir

Active Member
Okay, so:

* Formula Challenge 007 dealt with extracting an array of distinct/duplicate/unique values from a range containing only numerical values

* Formula Challenge 008 dealt with extracting an array of distinct/duplicate/unique values from a range containing only text values


Can you come up with a formula that extracts an array of distinct/duplicate/unique items from a range containing numerical and
text values?


- You should assign a 1-digit named range to the original data called m (m for mixed), and refer to that within your formula. This makes it easier for me to compare formula lengths and performance on my master spreadsheet.Other than that, you should not use any named ranges or intermediate cells.

- The output of each formula must be an array that can be used directly by other formula (such as COUNTA etc). That is, we're after a single formula that you can directly incorporate into other formulas - not a formula that has to be entered over a worksheet range in order to work.

- Each formula should be completely dynamic, and able to be used on a 1D array of any size.

- Each formula must return an array of the exact size containing only the distinct, duplicate, and unique values. So as per the sample data and output below:

* The array produced by the Distinct formula will be exactly 14 elements long

* The array produced by the Unique formula will be exactly 9 elements long

* The array produced by the Duplicate formula will be exactly 5 elements long

[pre]
Code:
m	Distnct	Unique	Dups
1	1	1	2
2	2	35	45
45	45	22	49
45	49	39	jake
49	35	23	simon
35	jake	Peter	#N/A
jake	22	jeff	#N/A
jake	39	99	#N/A
22	23	0	#N/A
49	simon	#N/A	#N/A
39	Peter	#N/A	#N/A
23	jeff	#N/A	#N/A
simon	99	#N/A	#N/A
Peter	0	#N/A	#N/A
simon	#N/A	#N/A	#N/A
jeff	#N/A	#N/A	#N/A
simon	#N/A	#N/A	#N/A
2	#N/A	#N/A	#N/A
99	#N/A	#N/A	#N/A
0	#N/A	#N/A	#N/A
[/pre]

Note that the previous shortest formulas to return Distinct/Unique/Duplicate arrays for text were as follows:

=T(OFFSET(A1,SMALL(IF(FREQUENCY(MATCH(t,t,),MATCH(t,t,))>0,ROW(t)),ROW(OFFSET(A1,,,SUM(1/COUNTIF(t,t)))))-1,0))


=T(OFFSET(A1,SMALL(IF(COUNTIF(t,t)=1,ROW(t)),ROW(OFFSET(A1,,,SUM(N(COUNTIF(t,t)=1)))))-1,))


=T(OFFSET(A1,SMALL(IF(FREQUENCY(MATCH(t,t,),MATCH(t,t,))>1,ROW(t)),ROW(OFFSET(A1,,,SUM(N(FREQUENCY(MATCH(t,t,),MATCH(t,t,))>1)))))-1,))


Similarly, the previous shortest formulas to return Distinct/Unique/Duplicate arrays for numbers were as follows:

=SMALL(IF(FREQUENCY(n,n)>=1,n),ROW(OFFSET(A1,,,SUM(N(FREQUENCY(n,n)>=1)))))


=SMALL(IF(FREQUENCY(n,n)=1,n),ROW(OFFSET(K1,,,SUM(N(FREQUENCY(n,n)=1)))))


=SMALL(IF(FREQUENCY(n,n)>1,n),ROW(OFFSET(U1,,,SUM(N(FREQUENCY(n,n)>1)))))
 
The formulas for text shall work for both cases, methinks.:

Distinct:

=INDEX(m,SMALL(IF(FREQUENCY(MATCH(m,m,),MATCH(m,m,))>0,ROW(m)),ROWS($A$1:A1))-1)


Unique:

=INDEX(m,SMALL(IF(FREQUENCY(MATCH(m,m,),MATCH(m,m,))=1,ROW(m)),ROWS($A$1:A1))-1)


Dups:

=INDEX(m,SMALL(IF(FREQUENCY(MATCH(m,m,),MATCH(m,m,))>1
,ROW(m)),ROWS($A$1:A1))-1)


If I could add anything in the favor of the formula posted by me is then you need to make one specific modification to get the results desired [bold portion].


Edit:

The last -1 is hard coded to adjust rows. It can be replaced by:

(MIN(ROW(m)-1))

So the last formula would look like:

=INDEX(m,SMALL(IF(FREQUENCY(MATCH(m,m,),MATCH(m,m,))>1,ROW(m)),ROWS($A$1:B1))-(MIN(ROW(m)-1)))

but then it will be generic and can be used directly.
 
Hi Shrivallabha. Nice going, but unfortunately we're after a single formula that you can directly incorporate into other formulas - not a formula that has to be entered over a worksheet range in order to work.


What's more, the INDEX function does not return an array that can be used directly by other formula (such as COUNTA etc) - which is one of the requirements of the challenge as per the above.
 
By way of further example, we could restructure your formula so that the one formula is array-entered over a range:

=INDEX(A:A,SMALL(IF(FREQUENCY(MATCH(m,m,),MATCH(m,m,))>1,ROW(m)),ROW(OFFSET(B1,,,SUM(N(FREQUENCY(MATCH(m,m,),MATCH(m,m,))>1))))))

...and while that one formula correctly returns duplicate values, if you select any of the cells in the array that the formula is entered over and push F9, you only get one value.


This is due to a limitation in the design of INDEX.
 
Note that we could rewrite any of Sajan's formulas for Distinct/Unique/Duplicate values from the last challenge, so that they use INDEX instead of OFFSET, like this:



Code:
=INDEX(A:A,SMALL(IF(COUNTIF(m,m)=1,ROW(m)),ROW(OFFSET(A1,,,SUM(N(COUNTIF(m,m)=1))))))


Code:
=INDEX(A:A,SMALL(IF(FREQUENCY(MATCH(m,m,),MATCH(m,m,))>1,ROW(m)),ROW(OFFSET(A1,,,SUM(N(FREQUENCY(MATCH(m,m,),MATCH(m,m,))>1))))))


But again, while all those appear to correctly return an array when array entered over a range, due to the mysterious way that INDEX works, they all evaluate to just one item.


So put COUNTA on them, and you just get 1 back.
 
Last edited:
On this particular challenge I have gone back to square one more than once! But learned a lot along the way.
 
Here is my thought Jeff,


Distinct:


=OFFSET(A1,SMALL(IF(FREQUENCY(MATCH(m,m,0),MATCH(m,m,0)),ROW(m)),ROW(OFFSET(A1,,,SUM(1/COUNTIF(m,m)))))-1,)


Unique:


=OFFSET(A1,SMALL(IF(COUNTIF(m,m)=1,ROW(m)),ROW(OFFSET(A1,,,SUM(N(COUNTIF(m,m)=1)))))-1,)


Duplicate:


=OFFSET(A1,SMALL(IF(FREQUENCY(MATCH(m,m,0),MATCH(m,m,0))>1,ROW(m)),ROW(OFFSET(A1,,,SUM(N(FREQUENCY(MATCH(m,m,),MATCH(m,m,))>1)))))-1,)


Sajan, I borrow some of your tricks :)


All of these will give an array in the formula bar, if we press F9. To work with COUNTA need to derefernce array, with T(), N() or ROWS(T or N(formula))
 
To meet the objective of the challenge - which is one formula that handles an array that contains both numbers and text, you need to dereference in a way that handles both text and numbers. So those formulas don't get us there, i'm afraid.
 
Used TRANSPOSE & INDIRECT


Distinct:


=COUNTA(TRANSPOSE(INDIRECT("A"&SMALL(IF(FREQUENCY(MATCH(m,m,0),MATCH(m,m,0)),ROW(m)),ROW(OFFSET(A1,,,SUM(1/COUNTIF(m,m))))))))


Unique:


=COUNTA(TRANSPOSE(INDIRECT("A"&SMALL(IF(COUNTIF(m,m)=1,ROW(m)),ROW(OFFSET(A1,,,SUM(N(COUNTIF(m,m)=1))))))))


Duplicate:


=COUNTA(TRANSPOSE(INDIRECT("A"&SMALL(IF(FREQUENCY(MATCH(m,m,0),MATCH(m,m,0))>1,ROW(m)),ROW(OFFSET(A1,,,SUM(N(FREQUENCY(MATCH(m,m,),MATCH(m,m,))>1))))))))
 
Assuming data starts in A1, how about:


Dups -

=TRANSPOSE(OFFSET(A1,SMALL(IF(FREQUENCY(IF(COUNTIF(m,m)>1,MATCH(m,m,0)),ROW(m)),ROW(m)),ROW(OFFSET(A1,,,SUM(N(FREQUENCY(MATCH(m,m,),MATCH(m,m,))>1)))))-1,))


Distinct - =TRANSPOSE(OFFSET(A1,SMALL(IF(FREQUENCY(IF(COUNTIF(m,m)=1,MATCH(m,m,0)),ROW(m)),ROW(m)),ROW(OFFSET(A1,,,SUM(N(FREQUENCY(MATCH(m,m,),MATCH(m,m,))=1)))))-1,))


Unique - =TRANSPOSE(OFFSET(A1,SMALL(IF(FREQUENCY(IF(COUNTIF(m,m)>0,MATCH(m,m,0)),ROW(m)),ROW(m)),ROW(OFFSET(A1,,,SUM(N(FREQUENCY(MATCH(m,m,),MATCH(m,m,))>0)))))-1,))
 
Hi gents. My apologies…I didn’t scope this challenge quite clear enough. I was after a formula that works both when entered over a range AND works when referred to directly by another formula such as COUNTA.


An INDEX-based formula meets the first requirement, but not the second. And your formulas meet the second requirement, but not the first.


Also note that Sajan’s previous formulas – which are much shorter – meet that second requirement but not the first too…we just need to drop the surrounding T(). But his formula doesn’t work if we also want to enter the formula over a sheet, because the T ignores any numbers.


So that’s what I was angling towards for this challenge. Apologies for not making that clearer.


Also note that unless I’m missing something, the use of TRANSPOSE doesn’t add anything.


For your formulas Sandeep, I’m not getting the answers I expected.


Here’s what your DUPS formula returns for me:

=OFFSET(A1,SMALL(IF(FREQUENCY(IF(COUNTIF(m,m)>1,MATCH(m,m,)),ROW(m)),ROW(m)),ROW(OFFSET(A1,,,SUM(N(FREQUENCY(MATCH(m,m,),MATCH(m,m,))>1)))))-1,)

={1;2;45;35;23}


(Note I removed a superfluous 0 in a MATCH argument in your formulas).


And here’s what your DISTINCT formula returns for me:

=OFFSET(A1,SMALL(IF(FREQUENCY(IF(COUNTIF(m,m)=1,MATCH(m,m,0)),ROW(m)),ROW(m)),ROW(OFFSET(A1,,,SUM(N(FREQUENCY(MATCH(m,m,),MATCH(m,m,))=1)))))-1,)

={1;49;"jake";49;39;"simon";"simon";2;99}


Here’s what your UNIQUE formula returns for me:

=OFFSET(A1,SMALL(IF(FREQUENCY(IF(COUNTIF(m,m)>0,MATCH(m,m,)),ROW(m)),ROW(m)),ROW(OFFSET(A1,,,SUM(N(FREQUENCY(MATCH(m,m,),MATCH(m,m,))>0)))))-1,)

={1;2;45;49;35;"jake";49;39;23;"simon";"simon";2;99;#NUM!}


Maybe it’s to do with where abouts the list m is in the sheet? On mine, it starts in A2, with a header in A1.
 
Hi Jeff,

As Haseeb points out, one way (the only way?) I can think of is to use a construct such as

IF(T(Formula)="", N(Formula), T(Formula))


Since that results in a long formula with repetitive segments, I am hoping that someone has been able to devise a better approach.


Cheers,

Sajan.
 
I do :) Just thought I'd give you gents a chance to meet me there, so to speak.


Here's my formula lengths:

[pre]
Code:
Distnct	Unique	Dups
118	98	142
[/pre]
Using a trick that I have just discovered, and have not seen documented anywhere. New territory, perhaps.
 
Here's my attempt at Distinct, Unique, Dups respectively


=LOOKUP(SMALL(IF(FREQUENCY(MATCH(m,m,),MATCH(m,m,)),ROW(m)),ROW(OFFSET(A1,,,SUM(1/COUNTIF(m,m))))),ROW(m),m)


=LOOKUP(SMALL(IF(COUNTIF(m,m)=1,ROW(m)),ROW(OFFSET(A1,,,SUM(N(COUNTIF(m,m)=1))))),ROW(m),m)


=LOOKUP(SMALL(IF(FREQUENCY(MATCH(m,m,),MATCH(m,m,))>1,ROW(m)),ROW(OFFSET(A1,,,SUM(N(FREQUENCY(MATCH(m,m,),MATCH(m,m,))>1))))),ROW(m),m)
 
:) looks like bonus is early this time... hehe...


But on a serious note, got to thank you for coming up with these (and the vibrant attitude).... looking forward to more
 
My formula utilised a new method I've discovered to 'dereference' an OFFSET. This I detail right at the bottom of this comment.


But first, some background.


In Challenge 008, the formulas submitted all wrapped a T() around OFFSET. Which is great if they were dealing with just text, but would fail if there were any numbers in the array.


According to a comment by lhm at http://msmvps.com/blogs/nateoliver/archive/2010/02/24/what-does-excel-s-offset-function-return.aspx explains why this is required:


OFFSET operates just like other formulas except it returns a reference not a value. Normally Excel automatically changes the reference to a value in the result, but it won't do these conversions in arrays.

What you need is another function that will help carry out this conversion. TRANSPOSE can do this, however it errors on the worksheet because it isn't expecting a reference argument. But instead you can use N, SUBTOTAL,SUMIF and other functions that are specifically designed for references.


There's another good comment on this by lhm here: http://newtonexcelbach.wordpress.com/2013/01/01/using-excels-shortest-functions/

I find one way to get a handle on understanding this kind of thing is to use the VBE watch window to evaluate the formula. Entering [transpose(offset(A1,{1;5;6},0))] shows that OFFSET is actually returning an array of range references (the transpose function ensures the formula is array-evaluated.) Note that no code needs to be running for this.

To return a valid array of results it seems any function taking OFFSET as an argument must explicitly make provision for range references and not coerce these to arrays (in XLL terms arguments are registered as type R/U not P/Q and allows for types xlTypeRef or xlTypeSRef.)

There appear to be relatively few built-in functions that fit the above criteria. Other than N and T, which both return the upper left cell value when given a range reference as an argument, COUNTIF, SUMIF, SUBTOTAL, RANK, AREAS and Database Functions also allow arrays of references in some arguments – i don’t know if there are any others? Similar behaviour occurs with INDIRECT although the VBE evaluate function won’t work the same in this case.


Interesting. A Google search also shows that one other such function is CELL("Contents",...).

But at http://www.excelforum.com/excel-formulas-and-functions/432774-how-to-reverse-a-range-array.html Harlan Grove says this isn't reliable:

OFFSET only returns Range references. OFFSET called with array 1st, 2nd or 3rd arguments returns something what seems to be an array of range references. Excel can't deal with such beasts when used as arithmetic operands or arguments to most functions. Fortunately, N() is one of the exceptions, and it effectively converts arrays of range references to arrays of numbers (note: it converts entries that aren't numeric into

numeric zero). The T() function does the same for strings. CELL("Contents",.) isn't reliable.



Okay, now for my magic dereferencing trick for OFFSET:

Drum roll please.........it’s.......OFFSET.


That’s right, it turns out that if you wrap an OFFSET in an OFFSET, then you can return both text AND numbers to the worksheet. Don’t ask me why.


So my formulas for Distinct/Unique/Duplicates are these:


Code:
=OFFSET(OFFSET(A1,SMALL(IF(FREQUENCY(MATCH(m,m,),MATCH(m,m,))>0,ROW(m)),ROW(OFFSET(A1,,,SUM(1/COUNTIF(m,m)))))-1,0),,)


Code:
=OFFSET(OFFSET(A1,SMALL(IF(COUNTIF(m,m)=1,ROW(m)),ROW(OFFSET(A1,,,SUM(N(COUNTIF(m,m)=1)))))-1,),,)


Code:
=OFFSET(OFFSET(A1,SMALL(IF(FREQUENCY(MATCH(m,m,),MATCH(m,m,))>1,ROW(m)),ROW(OFFSET(A1,,,SUM(N(FREQUENCY(MATCH(m,m,),MATCH(m,m,))>1)))))-1,),,)


Here’s some more info on OFFSET’s weird behaviour:

http://www.office-archive.com/2-excel/b0fb46724e4ae2e2.htm

http://groups.google.com/groups?oi=djq&selm=an_562009917

http://dailydoseofexcel.com/archives/2005/05/11/arrays-with-offset/
 
Last edited:
Hi Jeff,

I am not observing the behavior you are describing for OFFSET.


To check it out, try concatenating a value such as:

=OFFSET(OFFSET(...)) & "ABC"


I still get #VALUE errors, even though the portion before the & evaluates correctly in the F9 window.


The LOOKUP(...) technique on the other hand behaves as expected in a similar scenario, concatenating "ABC" correctly.


By the way, there are a few other functions that will work with the OFFSET(...) results:

CONVERT(OFFSET(...), "m", "m") for example behaves like N()


-Sajan.
 
Hi,

Also,

INDEX also is able to reference the results from OFFSET(). However, you can only see it if you array enter the formula over a range.

For example:

=INDEX(OFFSET(...),0,1) array entered over 10 rows will show the 10 values correctly.


Daniel Ferry once mentioned this in one of his postings on the INDEX function. However, he did not elaborate on how this feature could be leveraged.


-Sajan.
 
Sajan...works fine on my system. try the formula I posted above. Note the two last commas before the closing bracket.
 
Sajan: Try this:

Premise: We want to return a non-contiguous range from the array {1;2;3;"a";"b";"c"} that's in A1 to A6.


Enter =OFFSET(A1,{1;3;5},) into B1 to B3. Excel displays #VALUE!


Array enter =N(OFFSET(A1,{1;3;5},)) into C1 to C3. Excel only gives us the numbers e.g. {2;0;0}


Array enter =T(OFFSET(A1,{1;3;5},)) into D1 to D3. Excel only gives us the text e.g. {"";"a";"b";""}


Array enter =OFFSET(OFFSET(A1,{1;3;5},),,). Excel gives us both numbers and text.


With regards to your comment INDEX also is able to reference the results from OFFSET(). However, you can only see it if you array enter the formula over a range.. I didn't know that. Cool.


So i guess the only advantage of my OFFSET(OFFSET combination is that that the one approach can be used regardless of whether you want to return an array to the grid, or whether you want to process it with another formula. But then, if you want only to process it with another formula then the outer OFFSET is redundant.


Also note that =CELL("contents",OFFSET(A1,{1;3;5},)) does the same thing i.e. it dereferences an array returned by OFFSET so that both numbers AND text are returned.


All this is aside from the fact that for this particular challenge, Sam's approach is better.
 
Hi Jeff,

The only difference I notice between the following two formulas is that #1 when array-entered over three cells returns #VALUE while #2 returns {2,"a","c"}.


#1: =OFFSET(A1,{1;3;5},)

#2: =OFFSET(OFFSET(A1,{1;3;5},),,)


While both #1 and #2 can be wrapped in a COUNTA() (returning 3 as the result), they return #VALUE when attempting to process it with another formula.


For example:

#3: =OFFSET(A1,{1;3;5},) & "ABC"

#4: =OFFSET(OFFSET(A1,{1;3;5},),,) & "ABC"


Both return #VALUE;#VALUE;#VALUE


Also, =CELL("contents",OFFSET(A1,{1;3;5},)) behaves similar to #2 above when array-entered over three cells, but when used as part of another formula, it returns only the first value.

#5: =CELL("contents",OFFSET(A1,{1;3;5},)) & "ABC"

returns 2ABC

#6: =COUNTA(CELL("contents",OFFSET(A1,{1;3;5},)))

returns 1, confirming the above result.


The following formulas seem to be equivalent to #2 above:

#2: =OFFSET(OFFSET(A1,{1;3;5},),,)

#7: =INDEX(OFFSET(A1,{1;3;5},), 0,1)

#8: =(OFFSET(A1,{1;3;5},) A:A) (There is a space before A:A, for the intersection operator)

i.e. you will need to array-enter over three cells to see the results {2,"a","c"}


However,

=COUNTA(INDEX(OFFSET(A1,{1;3;5},), 0,1) & "ABC")

returns 1


Similar result of 1 with the following also:

=COUNTA((OFFSET(A1,{1;3;5},) A:A))
 
The only difference I notice between the following two formulas is that #1 when array-entered over three cells returns #VALUE while #2 returns {2,"a","c"}.

Yep, that's what I was trying to get across.


Haven't gone throuh all your examples yet, but one is different: for me, =OFFSET(OFFSET(A1,{1;3;5},),,) & "ABC" does NOT return #VALUE in Excel 2010 or Excel 2013. Maybe it's a regional settings thing?


Gotta go. Wife unimpressed with all this excel stuff.
 
Back
Top