• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Formula Challenge 003 - Count unique characters common to two strings

Sajan

Excel Ninja
Challenge Name

-- Calculate the number of unique characters that are common to two strings


Challenge Description

-- This is a formula challenge. As such, only formula submissions are accepted.

-- For two given strings (that may contain any number of characters -- alphabets, numbers, symbols, etc. -- the goal is to determine the number of unique characters that are common to both strings.

-- Each string could be of variable length.

-- Goal is to calculate the count without using any helper cells, names, etc.


Assumptions

-- You may assume a different location for the data range than cell A1 and B1

-- You may not use any helper cells

-- No difference is attributed to upper case and lower case alphabets. i.e. "A" and "a" are considered equivalent.

-- The characters in each string may appear in any order


Sample data and output:

Code:
String1            String2              UniqueCount
Ridiculous    Simple                  3
Incredible    Silly              2
Magic            Fun                  0
Balloon            Children          2
November    December          4
January            August                  2
February    January                  4
United States    India                  4
Australia    United Kingdom        3
Indomitable    submissive          4
Hippopotamus    Elephant          4
Formula            Challenge          2
As an example, the string "Ridiculous" and "Simple" have the letters "s", "i" and "l" in common. As such, the expected output will be 3.


Please verify your formulas against the sample output provided


Enjoy!

-Sajan.
 
Last edited by a moderator:
That is just NASTY!


Code:
=SUM(--(FREQUENCY(IFERROR(SEARCH(MID(A2,ROW(OFFSET(A$1,,,LEN(A2))),1),B2),LEN(B2)+1),ROW(OFFSET(A$1,,,LEN(B2))))>0))-1


Took me two hours!
 
Hi Jeff,

Nice approach!


However, you would want to tweak it a bit... run the formula against "yyy" and "xy" to see why.


-Sajan.
 
Hi Sajan ,


Sorry , but this seemed the easiest way !


http://www.mrexcel.com/archive/Formulas/17498.html


In my defence , let me mention John Walkenbach's rejoinder in this link :


http://datapigtechnologies.com/blog/index.php/the-impenetrable-hiring-test


Narayan
 
Hi Narayan,

Very funny! You are the Google master! :)


In my defence, I did try a few Google searches before settling on the idea for the challenge... but I guess my search skills need work!


In any case, Jeff's formula (with a small tweak) is better than what was posted on the MrExcel site.


-Sajan.
 
Hi Sajan ,


May be , but I find the formula very intuitive , and so easy to understand ; just check which characters from one string occur in the constant array of the alphabets , do the same for the other string ; the FREQUENCY function automatically takes care of duplicates , and array multiplication automatically finds what is in common.


For me , that formula is beautiful , since it does so much , so simply.


Narayan
 
Hi Narayan,

I agree that the approach in Aladin's original formula is elegant. In fact, Jeff's formula follows the same theme. Jeff's formula then extends it by checking for uniqueness. However, unlike the approach described in Mark's formula (also from the same MrExcel link posted above), Jeff identified that there is no need to do a full check against the second string, but a check against the full length of the second string would be sufficient.


I have applied the small tweak to Jeff's CSE formula:

=SUM(--(FREQUENCY(IFERROR(SEARCH(MID(A2,ROW(OFFSET(A$1,,,LEN(A2))),1),B2),FALSE),ROW(OFFSET(A$1,,,LEN(B2))))>0))


To me, that is beautiful!!


-Sajan.
 
Last edited by a moderator:
Hi Sajan. Thanks! Partly inspired by your great post at http://chandoo.org/forums/topic/counting-unique-items-in-same-cell


Will have a think about the tweak needed when I get a chance. Also have another good formula challenge up my sleeve...just got to do some stuff first.


I love this series.


[Edit] - I see you made the tweak already. Sweet. Also proud to announce that I came up with this puppy independently of any Google search, although as per above inspiration from Sajan was essential.
 
Hi Narayan,

I agree that the approach in Aladin's original formula is elegant. In fact, Jeff's formula follows the same theme. Jeff's formula then extends it by checking for uniqueness. However, unlike the approach described in Mark's formula (also from the same MrExcel link posted above), Jeff identified that there is no need to do a full check against the second string, but a check against the full length of the second string would be sufficient.


I have applied the small tweak to Jeff's CSE formula:

=SUM(--(FREQUENCY(IFERROR(SEARCH(MID(A2,ROW(OFFSET(A$1,,,LEN(A2))),1),B2),FALSE),ROW(OFFSET(A$1,,,LEN(B2))))>0))


To me, that is beautiful!!


-Sajan.
 
Hi Sajan,

I am going through the Formula Challenge series and are stuck on this formula.
Can you please explain to me the meaning of "&#62" in your formula
=SUM(--(FREQUENCY(IFERROR(SEARCH(MID(A2,ROW(OFFSET(A$1,,,LEN(A2))),1),B2),FALSE),ROW(OFFSET(A$1,,,LEN(B2))))>0))

Regards

Chris
 
Hi Sajan,

I am going through the Formula Challenge series and are stuck on this formula.
Can you please explain to me the meaning of "&#62" in your formula
=SUM(--(FREQUENCY(IFERROR(SEARCH(MID(A2,ROW(OFFSET(A$1,,,LEN(A2))),1),B2),FALSE),ROW(OFFSET(A$1,,,LEN(B2))))>0))

Regards

Chris
Hi Chris,

Looks like you joined recently. Welcome to Chandoo.org forums.

This forum migrated few months back. During migration, as it happens, some issues cropped up and have not been addressed as of now. You can see a discussion regarding such issues in the lounge.
http://chandoo.org/forum/threads/upgraded-forums-old-vba-codes-formula-handling.12080/

In your case also backticks have caused small problem. Red marked part [do not mistake the ; for argument separator as you can see rest of the arguments have been separated by , (comma) and not ;(semicolon)]

=SUM(--(FREQUENCY(IFERROR(SEARCH(MID(A2,ROW(OFFSET(A$1,,,LEN(A2))),1),B2),FALSE),ROW(OFFSET(A$1,,,LEN(B2))))>0))

becomes

=SUM(--(FREQUENCY(IFERROR(SEARCH(MID(A2,ROW(OFFSET(A$1,,,LEN(A2))),1),B2),FALSE),ROW(OFFSET(A$1,,,LEN(B2))))>0))

Quick key to finding out is using numerical part i.e. in a blank cell put following formula
=CHAR(62)

Hope this helps your way around!
 
Hi Shrivallabha,

Thanks for your quick response, and explanation!
Yes I joined recently and enjoying the Formula Challenge forum.
It is rather difficult to make sense out of the formula's most of the time as you guys are light years ahead of me, but I am trying.

Regards

Chris
 
Solution in PowerQuery - see video here
Super-fast to develop, very flexible (can return common characters as well as count) and less cryptic than cramming it all into a one-line formula.
 
=SUM(IF(FREQUENCY(IFERROR(SEARCH(MID(B1&" ",ROW(INDIRECT("1:"&LEN(B1))),1),A1),0),IFERROR(SEARCH(MID(B1&" ",ROW(INDIRECT("1:"&LEN(B1))),1),A1),0)),1))-1
try this array formula
 
Hello Challengers,

Here is another CSE compatible with all versions.

=COUNT(MATCH(ROW(INDIRECT("1:"&LEN(A1))),SEARCH(MID(B1,ROW(INDIRECT("1:"&LEN(B1))),1),A1),0))

Note LEN: First one is LEN(A1), 2nd one is LEN(B1)
 
Back
Top