• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Excel question with Alphabetic characters

VDS

Member
Dear All,

I have take the list of various authors from the website and put into excel as per attachment and taken total length all data (with and without spacing). But want to find out the following : -

1. How many times a particular alphabetic character say "M" is repeated. (it can be any character as per need). The total number of occurence to be taken with formula.

2. Which character is repeated in maximum nos ? Particular characther to be find.

3. If so, in how much quantity (the result would be the quantity of Answer No.2)

In any case, I think spaces should not come.

Any thoughts /Suggestions ?


VDS
 

Attachments

  • New Microsoft Office Excel Worksheet.xlsx
    14 KB · Views: 5
I thought about doing that Narayan,
but was trying to put it all in to a single formula, which all got too hard
 
For a single-cell solution as to which character is repeated the most number of times, perhaps:

=LOOKUP(1,0/FREQUENCY(0,1/MMULT(COLUMN(INDEX(1:1,1):INDEX(1:1,ROWS(B2:B149)))^0,LEN(B2:B149)-LEN(SUBSTITUTE(B2:B149,CHAR(COLUMN(INDEX(1:1,65):INDEX(1:1,90))),"")))),CHAR(COLUMN(INDEX(1:1,65):INDEX(1:1,90))))

Or perhaps, slightly shorter:

=CHAR(64+MATCH(1,0/FREQUENCY(0,1/MMULT(COLUMN(INDEX(1:1,1):INDEX(1:1,ROWS(B2:B149)))^0,LEN(B2:B149)-LEN(SUBSTITUTE(B2:B149,CHAR(COLUMN(INDEX(1:1,65):INDEX(1:1,90))),""))))))

Regards
 
For a single-cell solution as to which character is repeated the most number of times, perhaps:

=LOOKUP(1,0/FREQUENCY(0,1/MMULT(COLUMN(INDEX(1:1,1):INDEX(1:1,ROWS(B2:B149)))^0,LEN(B2:B149)-LEN(SUBSTITUTE(B2:B149,CHAR(COLUMN(INDEX(1:1,65):INDEX(1:1,90))),"")))),CHAR(COLUMN(INDEX(1:1,65):INDEX(1:1,90))))

Or perhaps, slightly shorter:

=CHAR(64+MATCH(1,0/FREQUENCY(0,1/MMULT(COLUMN(INDEX(1:1,1):INDEX(1:1,ROWS(B2:B149)))^0,LEN(B2:B149)-LEN(SUBSTITUTE(B2:B149,CHAR(COLUMN(INDEX(1:1,65):INDEX(1:1,90))),""))))))

Regards

Soooper Genius!
Biting my nails, didn't even 5% i understood :(
how you did that.
 
@Khalid NGO

Thanks! But not that difficult really!

Also, we should probably remove some of the function calls in there by defining e.g. ABC as:

{"A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"}

and then the solution becomes:

=CHAR(64+MATCH(1,0/FREQUENCY(0,1/MMULT(COLUMN(INDEX(1:1,1):INDEX(1:1,ROWS(B2:B149)))^0,LEN(B2:B149)-LEN(SUBSTITUTE(B2:B149,ABC,""))))))

Regards
 
@Khalid NGO

Thanks! But not that difficult really!

Also, we should probably remove some of the function calls in there by defining e.g. ABC as:

{"A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"}

and then the solution becomes:

=CHAR(64+MATCH(1,0/FREQUENCY(0,1/MMULT(COLUMN(INDEX(1:1,1):INDEX(1:1,ROWS(B2:B149)))^0,LEN(B2:B149)-LEN(SUBSTITUTE(B2:B149,ABC,""))))))

Regards
Wow this is great...
I tried evaluate some areas and got something, but still confusing the MMULT part.
 
Actually I made a small oversight. Should be:

=CHAR(64+MATCH(1,0/FREQUENCY(0,1/(1+MMULT(COLUMN(INDEX(1:1,1):INDEX(1:1,ROWS(B2:B149)))^0,LEN(B2:B149)-LEN(SUBSTITUTE(B2:B149,ABC,"")))))))

A small addition needs to be made to the denominator of the fraction which generates the array being passed to FREQUENCY. Otherwise #DIV/0! errors will potentially affect the result.

@Khalid NGO

I'll post an explanation shortly, if you're interested.

Regards
 
Last edited:
Actually I made a small oversight. Should be:

=CHAR(64+MATCH(1,0/FREQUENCY(0,1/(1+MMULT(COLUMN(INDEX(1:1,1):INDEX(1:1,ROWS(B2:B149)))^0,LEN(B2:B149)-LEN(SUBSTITUTE(B2:B149,ABC,"")))))))

A small addition needs to be made to the denominator of the fraction which generates the array being passed to FREQUENCY. Otherwise #DIV/0! errors will potentially affect the result.

@Khalid NGO

I'll post an explanation shortly, if you're interested.

Regards
Hi,

I guess the earlier formula without 1+ worked perfectly when I was testing and after including 1+ it was showing incorrect results!

Am I missing anything?

I tried with B1 cell value AA and B2 cell value B, expected result is A, but surprisingly it is giving C. I also changed the rows(B2:B149) to B2:B3 and accordingly for len()s as well.

Regards,
Prasad DN
 
@prasaddn

Are you sure you've changed everything appropriately?

With B2 and B3 (why do you mention B1 and B2, by the way?) containing "AA" and "B" respectively, this formula:

=CHAR(64+MATCH(1,0/FREQUENCY(0,1/(1+MMULT(COLUMN(INDEX(1:1,1):INDEX(1:1,ROWS(B2:B3)))^0,LEN(B2:B3)-LEN(SUBSTITUTE(B2:B3,ABC,"")))))))

returns "A" for me.

It was by pure fortune that you were getting correct results with the previous version of the formula!

Regards
 
@prasaddn

Are you sure you've changed everything appropriately?

With B2 and B3 (why do you mention B1 and B2, by the way?) containing "AA" and "B" respectively, this formula:

=CHAR(64+MATCH(1,0/FREQUENCY(0,1/(1+MMULT(COLUMN(INDEX(1:1,1):INDEX(1:1,ROWS(B2:B3)))^0,LEN(B2:B3)-LEN(SUBSTITUTE(B2:B3,ABC,"")))))))

returns "A" for me.

It was by pure fortune that you were getting correct results with the previous version of the formula!

Regards
Hi,

My mistake.... :( I did not notice the braces ( ) . It is indeed giving correct result "A" now. I am really sorry for my above post.

BTW, yes it was showing correct results without 1+ as well.

Regards,
Prasad DN
 
Yes Sir surely,

Ok, so using the original data as posted by the OP and the formula from post #10, and amending the upper range reference from 149 to 5 for the sake of simplicity, i.e. using:

=CHAR(64+MATCH(1,0/FREQUENCY(0,1/(1+MMULT(COLUMN(INDEX(1:1,1):INDEX(1:1,ROWS(B2:B5)))^0,LEN(B2:B5)-LEN(SUBSTITUTE(B2:B5,ABC,"")))))))

we see that this part:

LEN(B2:B5)-LEN(SUBSTITUTE(B2:B5,ABC,""))

will resolve to a 4-row-by-26-column array, the 26 values in each row of which representing the number of occurrences of each letter of the alphabet - from A to Z - within each of the 4 strings (in B2:B5) being queried, i.e.:

{2,0,0,1,0,0,0,2,0,0,0,0,1,0,1,0,0,1,1,1,0,0,0,0,1,0;4,1,0,1,1,0,1,1,1,0,0,0,0,2,1,0,0,3,0,2,0,0,0,0,0,0;2,0,0,1,3,0,0,1,1,0,0,2,0,3,1,0,0,1,2,1,0,0,0,1,1,1;3,0,0,0,3,0,0,1,2,0,1,2,1,0,0,1,0,1,2,0,0,0,1,0,0,0}

I have highlighted a couple of values to serve as examples. The 4 here - the first entry in the second row of this matrix - represents the fact that "A" occurs that number of times within the 2nd string being queried (from B3), i.e. "RABINDRA NATH TAGORE".

And the 3 I've highlighted - the fifth entry in the third row of this matrix - represents the fact that "E" occurs that number of times within the 3rd string being queried (from B4), i.e. "ALEXANDER SOLZHENITSYN".

(Note that it was important to define ABC here as being orthogonal to the range being searched (B2:B5), i.e. as:

{"A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"}

and not as:

{"A";"B";"C";"D";"E";"F";"G";"H";"I";"J";"K";"L";"M";"N";"O";"P";"Q";"R";"S";"T";"U";"V";"W";"X";"Y";"Z"}

since we would not then be able to generate our required array.)

Clearly we are interested in totalling the number of occurrences of each letter within each of the 4 strings being queried. And this amounts to totalling the values in each of the 26 columns in the above matrix: the sum of the entries in the first column (2, 4, 2 and 3), for example, giving us the total count for the letter "A" for those 4 strings.

We do this using MMULT. Since the above is a 4-row-by-26-column matrix, then, in order to get the totals for each of the 26 columns we must form the product of this matrix and one which consists of the same number of columns as there are rows in that matrix, i.e. 4. And, so that the resulting arithmetic is correct, each of the entries within this second matrix must be unity, i.e. we need to form:

{1,1,1,1}

One way to achieve this would be to use:

TRANSPOSE(ROW(B2:B5)^0)

since this resolves to:

TRANSPOSE({2;3;4;5}^0)

i.e.:

TRANSPOSE({1;1;1;1})

i.e.:

{1,1,1,1}

as required.

However, I personally prefer to avoid this construction. Although it is brief, the use of TRANSPOSE will mean that the formula requires CSE. More importantly, the use of ROW is not very rigorous and may be adversely affected by any row insertions within the range.

Hence, I preferred here the longer:

COLUMN(INDEX(1:1,1):INDEX(1:1,ROWS(B2:B5)))^0

which, by using COLUMN, rather than ROW, means that no explicit transposition is required. The use of INDEX also means that we avoid volatile INDIRECT/OFFSET constructions. (The fact that using COLUMN means that this solution is restricted to ranges of no more than 16,384 entries should not, in practice, be a concern.)

The above then resolves to:

COLUMN($A$1:$D$1)^0

i.e.:

{1,1,1,1}

as required.

We now have our two arrays to pass to MMULT, such that:

MMULT(COLUMN(INDEX(1:1,1):INDEX(1:1,ROWS(B2:B5)))^0,LEN(B2:B5)-LEN(SUBSTITUTE(B2:B5,ABC,"")))

which is:

MMULT({1,1,1,1},{2,0,0,1,0,0,0,2,0,0,0,0,1,0,1,0,0,1,1,1,0,0,0,0,1,0;4,1,0,1,1,0,1,1,1,0,0,0,0,2,1,0,0,3,0,2,0,0,0,0,0,0;2,0,0,1,3,0,0,1,1,0,0,2,0,3,1,0,0,1,2,1,0,0,0,1,1,1;3,0,0,0,3,0,0,1,2,0,1,2,1,0,0,1,0,1,2,0,0,0,1,0,0,0})

resolves to:

{11,1,0,3,7,0,1,5,4,0,1,4,2,5,3,1,0,6,5,4,0,0,1,1,2,1}

in which array the 11 - representing the total number of occurrences of the letter "A" within our 4 strings - is clearly the maximum.

The remainder of the formula involves a construction used to locate this maximum. I will not go into details here, though if you're interested there is an explanation of this technique here:

http://excelxor.com/2015/02/22/return-entry-corresponding-to-maximum-value-based-on-conditions/

Regards
 
@XOR LX
This is amazing, I think creating such formula is an art, and you are the Artist.
Thanks for taking the time to explain in detail. I appreciate your response.
Take care.


Now I am taking some time to digest.
 
Ok, so using ...
if you're interested there is an explanation of this technique here:

http://excelxor.com/2015/02/22/return-entry-corresponding-to-maximum-value-based-on-conditions/

Regards

Hi Again,

I did go thru the link above, and first thing first, awesome site for learning advanced formulas!! I will be a regular visitor of your site from now.

Coming back to this thread, I failed to understand the Frequency(), i am not getting what is this function will return and what those two parameters are.

Well, I will go thru your site from start Aug 2014 posts and slowly catch up. :)

Regards,
Prasad DN
PS: once again, you got a brilliant site !!
 
@prasaddn

Did you follow the explanation as to the FREQUENCY part in that post? Which bit did you not understand?

Thanks again for your kind words!

Regards
 
Dear All,

My goodness. What an overwhelming response and brilliant solutions. In fact I put this thread based on a small doubt and now solution is getting into leaps and bounds. It is unbelievable.

Like it or not, I also generated the result in my own way as per attachment which is working perfect. Here, the columns containing Index+Match function and Max to be removed without any macro.

Once again thanks to all for your wonderful answers.


VDS
 

Attachments

  • Data11.xlsx
    21 KB · Views: 0
@prasaddn

Did you follow the explanation as to the FREQUENCY part in that post? Which bit did you not understand?

Thanks again for your kind words!

Regards
I was not able to understand the meaning and function of FREQ and did not make any sense of what is data and bins. I did some search and now it is very clear about what this function does, and returns.

I referred the site:
http://exceluser.com/formulas/frequency-distribution-five-ways.htm

Regards
Prasad DN.
PS: Thank you for asking XOR LR :)
 
@prasaddn

Yes - FREQUENCY is not at all an easy function to understand, and certainly at first. Sounds like you've found a good tutorial there, though, so that's a good start!

Regards
 
Back
Top