• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Formula Challenge 016 - Unique Items in a Delimited String

jeffreyweir

Active Member
Formula Challenge 016 - Unique Items in a Delimited String


This challenge is lifted straight from a great formula that David Hager posted at http://dailydoseofexcel.com/archives/2013/08/07/calculating-the-number-of-unique-items-in-a-delimited-string


Although I've slightly amended the sample data.


Put this string into A1:

aa,c,d,aa,bbb,c,d,e,f,g,h,I,jjjjjjjj,a


Write one formula that returns a count of how many unique items there are.


David's formula is 299 characters. My first attempt using a different approach is 462 characters.


What can you do it in?
 
Hi Jeff,


I think we'll get to see different versions of converting cells content into Array.


Here's my attempt which is looks like cross breed [char length: 314]:

=SUM(N(FREQUENCY(MATCH(TRIM(MID(SUBSTITUTE(A1,",",REPT(" ",99)),(ROW($A$1:INDEX(A:A,LEN(A1)-LEN(SUBSTITUTE(A1,",",""))+1))-1)*99+1,99)),TRIM(MID(SUBSTITUTE(A1,",",REPT(" ",99)),(ROW($A$1:INDEX(A:A,LEN(A1)-LEN(SUBSTITUTE(A1,",",""))+1))-1)*99+1,99)),0),ROW($A$1:INDEX(A:A,LEN(A1)-LEN(SUBSTITUTE(A1,",",""))+1)))>0))


The good part about splitting data into array and checking it with ROW is that it doesn't belong to cell addresses so we can check with ROWS.


If the second part is also replaced with MATCH formula then it can be used to find out information that Uniques, Dups etc. per other challenges.


There are some threads which may pass as distant relatives of this particular challenge.

http://chandoo.org/forums/topic/find-nearest-value-in-a-list-with-comas


and take a look at this one [you have posted in it ;)]

http://chandoo.org/forums/topic/counting-unique-items-in-same-cell

And then my formula looks the same as Sajan's
 
Posted in Daily Dose as well.


=SUM(N(MATCH(

TRIM(MID(SUBSTITUTE(A1&REPT("",8^5),",",REPT(" ",999)),ROW($1:$999)*999-998,999)),

TRIM(MID(SUBSTITUTE(A1&REPT("",8^5),",",REPT(" ",999)),ROW($1:$999)*999-998,999)),

0)=ROW($1:$999)))-1


Regards
 
Great stuff, lads.


Rick posted this 283 characters one, which is cool:

=SUM(1*ISERROR(FIND(","&TRIM(MID(SUBSTITUTE(A1,",",REPT(" ",999)),ROW(OFFSET(A1,,, LEN(A1)-lEN(SUBSTITUTE(A1,",",""))+1))*999-998,999))&",",","&SUBSTITUTE(TRIM(LEFT

(SUBSTITUTE(TRIM(A1),",",REPT(" ",999)),(ROW(OFFSET(A1,,,LEN(A1)-LEN(SUBSTITUTE (A1 ,",",""))+1))-1)*999))," ",",")&",")))


For readers following along at home, then given data in A1 like this:

aa,c,d,aa,bbb,c,d,e,f,g,h,I,jjjjjjjj,aa


…you first split the string into two arrays:

Array 1:

{",aa,";",c,";",d,";",aa,";",bbb,";",c,";",d,";",e,";",f,";",g,";",h,";",I,";", jjjjjjjj,";",aa,"}


Array 2:

Code:
{",,";",aa,";",aa,c,";",aa,c,d,";",aa,c,d,aa,";",aa,c,d,aa,bbb,";",aa,c,d,aa,bbb,c," ;",aa,c,d,aa,bbb,c,d,";",aa,c,d,aa,bbb,c,d,e,";",aa,c,d,aa,bbb,c,d,e,f,"; ",aa,c,d,aa,bbb,c,d,e,f,g,";",aa,c,d,aa,bbb,c,d,e,f,g,h,"; ",aa,c,d,aa,bbb,c,d,e,f,g,h,I,";",aa,c,d,aa,bbb,c,d,e,f,g,h,I,jjjjjjjj,"}


…where array2 is just a list of incrementally concatenated elements of the string with a placeholder at the front (which offsets arary2 by one position compared to array1).


That offsetting of array2 this means that for any given element n in array1, array2(n) contains everything in the original string up to that point. Or put another way, array2(n) = CONCATENATE(“,,” , array1(1) , array 1(2) , … , array1(n -1) )


So when we search for array1(n) within array2(n), then if there’s a match, the thing we are looking for has obviously already occurred earlier in the string.


Pure genius.

Here’s how that looks graphically (assuming WordPress doesn’t mangle things):

[pre]Result array1 array2
#VALUE! ,aa, ,,
#VALUE! ,c, ,aa,
#VALUE! ,d, ,aa,c,
1 ,aa, ,aa,c,d,
#VALUE! ,bbb, ,aa,c,d,aa,
4 ,c, ,aa,c,d,aa,bbb,
6 ,d, ,aa,c,d,aa,bbb,c,
#VALUE! ,e, ,aa,c,d,aa,bbb,c,d,
#VALUE! ,f, ,aa,c,d,aa,bbb,c,d,e,
#VALUE! ,g, ,aa,c,d,aa,bbb,c,d,e,f,
#VALUE! ,h, ,aa,c,d,aa,bbb,c,d,e,f,g,
#VALUE! ,I, ,aa,c,d,aa,bbb,c,d,e,f,g,h,
#VALUE! ,jjjjjjjj, ,aa,c,d,aa,bbb,c,d,e,f,g,h,I,
1 ,aa, ,aa,c,d,aa,bbb,c,d,e,f,g,h,I,jjjjjjjj,
[/pre]
 
Elias, you can ditch this bit:

&REPT("",8^5)

...which takes your formula down to 163 characters:

=SUM(N(MATCH(

TRIM(MID(SUBSTITUTE(A1,",",REPT(" ",999)), ROW($1:$999)*999-998,999)),

TRIM(MID(SUBSTITUTE(A1,",",REPT(" ",999)),ROW($1:$999)*999-998,999)),

0)=ROW($1:$999)))-1
 
Here is mine, with just ENTER


=SUM(SIGN(FREQUENCY(FIND(SUBSTITUTE(","&TRIM(MID(SUBSTITUTE(A1,",",REPT(" ",99)),ROW($1:$999)*99-98,99))&",",",,",","&LEFT(A1,FIND(",",A1)-1)&","),A1&","),ROW($1:$999))))
 
Haseeb, that is an outstanding use of SIGN. Much better than checking if the array is greater than zero, and then having to coerce the resulting booleans to digits so you can sum them.


Awesome!
 
{=SUM(--IFERROR(FIND(S&MID(A1,FIND(S,S&A1,ROW(1:999)),FIND(S,A1&S,ROW(1:999))-FIND(S,S&A1,ROW(1:999)))&S,S&A1&S)=FIND(S,S&A1,ROW(1:999)),0))}


where S is the separator, e.g. ","
 
Jeff


It will return 1 if you don't array-enter, otherwise 11?


The last FIND() term actually simplifies to ROW(1:999) in this context, and my latest is:


=COUNT(1/SIGN(FIND(S&MID(A1,FIND(S,S&A1,ROW(1:999)),FIND(S,A1&S,ROW(1:999))-FIND(S,S&A1,ROW(1:999)))&S,S&A1&S)=ROW(1:999)))
 
My bad. Had missed replacing one of the S's with ",".


Great stuff, Stephen. I'm just starting to take it apart and see what makes it tick now.
 
Stephen: Still coming to grips with this masterpiece. What's cool about it is that it can be completely dynamic in terms of string length:

=COUNT(1/SIGN(FIND(","&MID(A1,FIND(",",","&A1,ROW(OFFSET(A1,,,LEN(A1)))),FIND(",",A1&",",ROW(OFFSET(A1,,,LEN(A1))))-FIND(",",","&A1,ROW(OFFSET(A1,,,LEN(A1)))))&",",","&A1&",")=ROW(OFFSET(A1,,,LEN(A1)))))


Also note that you can make it slightly shorter by removing the bit in bold, and just adding 1 onto the final result:

=COUNT(1/SIGN(FIND(S&MID(A1,FIND(S,S&A1,ROW(1:999)),FIND(S,A1&S,ROW(1:999))-FIND(S,S&A1,ROW(1:999)))&S,S&A1&S)=ROW(1:999)))


=1+ COUNT(1/SIGN(FIND(S&MID(A1,FIND(S,S&A1,ROW(1:999)),FIND(S,A1,ROW(1:999))-FIND(S,S&A1,ROW(1:999)))&S,S&A1&S)=ROW(1:999)))


Or the dynamic version:


=1+COUNT(1/SIGN(FIND(","&MID(A1,FIND(",",","&A1,ROW(OFFSET(A1,,,LEN(A1)))),FIND(",",A1,ROW(OFFSET(A1,,,LEN(A1))))-FIND(",",","&A1,ROW(OFFSET(A1,,,LEN(A1)))))&",",","&A1&",")=ROW(OFFSET(A1,,,LEN(A1)))))


Edit: also note you don't need the SIGN. So this completely dynamic version will do, and is just 197 characters long:

=1+COUNT(1/(FIND(","&MID(A1,FIND(",",","&A1,ROW(OFFSET(A1,,,LEN(A1)))),FIND(",",A1,ROW(OFFSET(A1,,,LEN(A1))))-FIND(",",","&A1,ROW(OFFSET(A1,,,LEN(A1)))))&",",","&A1&",")=ROW(OFFSET(A1,,,LEN(A1)))))
 
Lori posted a great approach over at DDOE.


If we combine this with Stephen's approach and Loris, then we get this completely dynamic masterpiece:


=COUNT(1/(FIND(","&MID(A1,FIND(",",","&A1,ROW(OFFSET(A1,,,LEN(A1)))),MMULT(FIND(",",{"",","}&A1&",",ROW(OFFSET(A1,,,LEN(A1)))),{1;-1}))&",",","&A1&",")=ROW(OFFSET(A1,,,LEN(A1)))))


Bam!
 
Back
Top