# Formula Challenge 016 - Unique Items in a Delimited String

#### jeffreyweir

##### Active Member
Formula Challenge 016 - Unique Items in a Delimited String

This challenge is lifted straight from a great formula that David Hager posted at http://dailydoseofexcel.com/archives/2013/08/07/calculating-the-number-of-unique-items-in-a-delimited-string

Although I've slightly amended the sample data.

Put this string into A1:

aa,c,d,aa,bbb,c,d,e,f,g,h,I,jjjjjjjj,a

Write one formula that returns a count of how many unique items there are.

David's formula is 299 characters. My first attempt using a different approach is 462 characters.

What can you do it in?

#### shrivallabha

##### Excel Ninja
Hi Jeff,

I think we'll get to see different versions of converting cells content into Array.

Here's my attempt which is looks like cross breed [char length: 314]:

=SUM(N(FREQUENCY(MATCH(TRIM(MID(SUBSTITUTE(A1,",",REPT(" ",99)),(ROW(\$A\$1:INDEX(A:A,LEN(A1)-LEN(SUBSTITUTE(A1,",",""))+1))-1)*99+1,99)),TRIM(MID(SUBSTITUTE(A1,",",REPT(" ",99)),(ROW(\$A\$1:INDEX(A:A,LEN(A1)-LEN(SUBSTITUTE(A1,",",""))+1))-1)*99+1,99)),0),ROW(\$A\$1:INDEX(A:A,LEN(A1)-LEN(SUBSTITUTE(A1,",",""))+1)))&#62;0))

The good part about splitting data into array and checking it with ROW is that it doesn't belong to cell addresses so we can check with ROWS.

If the second part is also replaced with MATCH formula then it can be used to find out information that Uniques, Dups etc. per other challenges.

There are some threads which may pass as distant relatives of this particular challenge.

http://chandoo.org/forums/topic/find-nearest-value-in-a-list-with-comas

and take a look at this one [you have posted in it ]

http://chandoo.org/forums/topic/counting-unique-items-in-same-cell

And then my formula looks the same as Sajan's

#### sailepaty

##### New Member
Posted in Daily Dose as well.

=SUM(N(MATCH(

TRIM(MID(SUBSTITUTE(A1&REPT("",8^5),",",REPT(" ",999)),ROW(\$1:\$999)*999-998,999)),

TRIM(MID(SUBSTITUTE(A1&REPT("",8^5),",",REPT(" ",999)),ROW(\$1:\$999)*999-998,999)),

0)=ROW(\$1:\$999)))-1

Regards

#### jeffreyweir

##### Active Member

Rick posted this 283 characters one, which is cool:

=SUM(1*ISERROR(FIND(","&TRIM(MID(SUBSTITUTE(A1,",",REPT(" ",999)),ROW(OFFSET(A1,,, LEN(A1)-lEN(SUBSTITUTE(A1,",",""))+1))*999-998,999))&",",","&SUBSTITUTE(TRIM(LEFT

(SUBSTITUTE(TRIM(A1),",",REPT(" ",999)),(ROW(OFFSET(A1,,,LEN(A1)-LEN(SUBSTITUTE (A1 ,",",""))+1))-1)*999))," ",",")&",")))

For readers following along at home, then given data in A1 like this:

aa,c,d,aa,bbb,c,d,e,f,g,h,I,jjjjjjjj,aa

…you first split the string into two arrays:

Array 1:

{",aa,";",c,";",d,";",aa,";",bbb,";",c,";",d,";",e,";",f,";",g,";",h,";",I,";", jjjjjjjj,";",aa,"}

Array 2:

Code:
``````{&#34;,,&#34;;&#34;,aa,&#34;;&#34;,aa,c,&#34;;&#34;,aa,c,d,&#34;;&#34;,aa,c,d,aa,&#34;;&#34;,aa,c,d,aa,bbb,&#34;;&#34;,aa,c,d,aa,bbb,c,&#34; ;&#34;,aa,c,d,aa,bbb,c,d,&#34;;&#34;,aa,c,d,aa,bbb,c,d,e,&#34;;&#34;,aa,c,d,aa,bbb,c,d,e,f,&#34;; &#34;,aa,c,d,aa,bbb,c,d,e,f,g,&#34;;&#34;,aa,c,d,aa,bbb,c,d,e,f,g,h,&#34;; &#34;,aa,c,d,aa,bbb,c,d,e,f,g,h,I,&#34;;&#34;,aa,c,d,aa,bbb,c,d,e,f,g,h,I,jjjjjjjj,&#34;}

…where array2 is just a list of incrementally concatenated elements of the string with a placeholder at the front (which offsets arary2 by one position compared to array1).

That offsetting of array2 this means that for any given element n in array1, array2(n) contains everything in the original string up to that point. Or put another way, array2(n) = CONCATENATE(“,,” , array1(1) , array 1(2) , … , array1(n -1) )

So when we search for array1(n) within array2(n), then if there’s a match, the thing we are looking for has obviously already occurred earlier in the string.

Pure genius.

Here’s how that looks graphically (assuming WordPress doesn’t mangle things):

[pre]Result array1 array2
#VALUE! ,aa, ,,
#VALUE! ,c, ,aa,
#VALUE! ,d, ,aa,c,
1 ,aa, ,aa,c,d,
#VALUE! ,bbb, ,aa,c,d,aa,
4 ,c, ,aa,c,d,aa,bbb,
6 ,d, ,aa,c,d,aa,bbb,c,
#VALUE! ,e, ,aa,c,d,aa,bbb,c,d,
#VALUE! ,f, ,aa,c,d,aa,bbb,c,d,e,
#VALUE! ,g, ,aa,c,d,aa,bbb,c,d,e,f,
#VALUE! ,h, ,aa,c,d,aa,bbb,c,d,e,f,g,
#VALUE! ,I, ,aa,c,d,aa,bbb,c,d,e,f,g,h,
#VALUE! ,jjjjjjjj, ,aa,c,d,aa,bbb,c,d,e,f,g,h,I,
1 ,aa, ,aa,c,d,aa,bbb,c,d,e,f,g,h,I,jjjjjjjj,``````
[/pre]

#### jeffreyweir

##### Active Member
Elias, you can ditch this bit:

&REPT("",8^5)

...which takes your formula down to 163 characters:

=SUM(N(MATCH(

TRIM(MID(SUBSTITUTE(A1,",",REPT(" ",999)), ROW(\$1:\$999)*999-998,999)),

TRIM(MID(SUBSTITUTE(A1,",",REPT(" ",999)),ROW(\$1:\$999)*999-998,999)),

0)=ROW(\$1:\$999)))-1

#### Haseeb A

##### Active Member
Here is mine, with just ENTER

=SUM(SIGN(FREQUENCY(FIND(SUBSTITUTE(","&TRIM(MID(SUBSTITUTE(A1,",",REPT(" ",99)),ROW(\$1:\$999)*99-98,99))&",",",,",","&LEFT(A1,FIND(",",A1)-1)&","),A1&","),ROW(\$1:\$999))))

#### jeffreyweir

##### Active Member
Haseeb, that is an outstanding use of SIGN. Much better than checking if the array is greater than zero, and then having to coerce the resulting booleans to digits so you can sum them.

Awesome!

#### Stephen Crump

##### New Member
{=SUM(--IFERROR(FIND(S&MID(A1,FIND(S,S&A1,ROW(1:999)),FIND(S,A1&S,ROW(1:999))-FIND(S,S&A1,ROW(1:999)))&S,S&A1&S)=FIND(S,S&A1,ROW(1:999)),0))}

where S is the separator, e.g. ","

#### jeffreyweir

##### Active Member
Stephen, for some reason that's not working for me. It's just returning 1, wheras I'm expecting 10.

#### Stephen Crump

##### New Member
Jeff

It will return 1 if you don't array-enter, otherwise 11?

The last FIND() term actually simplifies to ROW(1:999) in this context, and my latest is:

=COUNT(1/SIGN(FIND(S&MID(A1,FIND(S,S&A1,ROW(1:999)),FIND(S,A1&S,ROW(1:999))-FIND(S,S&A1,ROW(1:999)))&S,S&A1&S)=ROW(1:999)))

#### jeffreyweir

##### Active Member

Great stuff, Stephen. I'm just starting to take it apart and see what makes it tick now.

#### jeffreyweir

##### Active Member
Stephen: Still coming to grips with this masterpiece. What's cool about it is that it can be completely dynamic in terms of string length:

=COUNT(1/SIGN(FIND(","&MID(A1,FIND(",",","&A1,ROW(OFFSET(A1,,,LEN(A1)))),FIND(",",A1&",",ROW(OFFSET(A1,,,LEN(A1))))-FIND(",",","&A1,ROW(OFFSET(A1,,,LEN(A1)))))&",",","&A1&",")=ROW(OFFSET(A1,,,LEN(A1)))))

Also note that you can make it slightly shorter by removing the bit in bold, and just adding 1 onto the final result:

=COUNT(1/SIGN(FIND(S&MID(A1,FIND(S,S&A1,ROW(1:999)),FIND(S,A1&S,ROW(1:999))-FIND(S,S&A1,ROW(1:999)))&S,S&A1&S)=ROW(1:999)))

=1+ COUNT(1/SIGN(FIND(S&MID(A1,FIND(S,S&A1,ROW(1:999)),FIND(S,A1,ROW(1:999))-FIND(S,S&A1,ROW(1:999)))&S,S&A1&S)=ROW(1:999)))

Or the dynamic version:

=1+COUNT(1/SIGN(FIND(","&MID(A1,FIND(",",","&A1,ROW(OFFSET(A1,,,LEN(A1)))),FIND(",",A1,ROW(OFFSET(A1,,,LEN(A1))))-FIND(",",","&A1,ROW(OFFSET(A1,,,LEN(A1)))))&",",","&A1&",")=ROW(OFFSET(A1,,,LEN(A1)))))

Edit: also note you don't need the SIGN. So this completely dynamic version will do, and is just 197 characters long:

=1+COUNT(1/(FIND(","&MID(A1,FIND(",",","&A1,ROW(OFFSET(A1,,,LEN(A1)))),FIND(",",A1,ROW(OFFSET(A1,,,LEN(A1))))-FIND(",",","&A1,ROW(OFFSET(A1,,,LEN(A1)))))&",",","&A1&",")=ROW(OFFSET(A1,,,LEN(A1)))))

#### jeffreyweir

##### Active Member
Lori posted a great approach over at DDOE.

If we combine this with Stephen's approach and Loris, then we get this completely dynamic masterpiece:

=COUNT(1/(FIND(","&MID(A1,FIND(",",","&A1,ROW(OFFSET(A1,,,LEN(A1)))),MMULT(FIND(",",{"",","}&A1&",",ROW(OFFSET(A1,,,LEN(A1)))),{1;-1}))&",",","&A1&",")=ROW(OFFSET(A1,,,LEN(A1)))))

Bam!