• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Extracting a sorted,unique list,ranked by frequency of occurence(broken formula)

bruno

New Member
Hello,


My name is Bruno and it's my first time here.


I need a formula that will extract a sorted, unique list of strings, ranked by number of ocorrences (most ocurring items first). Example:


Original list (single column, unknown number os rows, with some blank cells):

CC

AA

(blank cell)

BB

(blank cell)

AA

CC

AA

(blank cell)

(blank cell)

(blank cell)


Resulting list would be (unique results, most frequent items first, no blank rows):

AA

CC

BB


The resulting list should update automatically when there are modifications in the original list, so using data pilot or filters is no good.


The following array formula is what I've got so far. It's entered on B2 and copied down:


=INDEX(lista;MODE(IF(ISNA(MATCH(list;B$1:B1;0));MATCH(list;list;0)*{1.1})))


It's a simplified (stripped down) version of Haseeb's formula, from the following post:

http://chandoo.org/forums/topic/ranking-string-data-for-one-column

Also mentioned in the following article:

http://chandoo.org/wp/2012/10/04/formula-forensics-no-030/


Here is the problem: The formula works perfectly IF there are no blank cells in the original list (interval). A single blank cell turns all of the result cells into "#N/A". I've been trying to get it to work with blank cells (ignoring them) for several days, with no success. I use a lot of spreadsheets at work, but never used an array formula before.


Thanks in advance,

Bruno
 
Hi bruno,


This solution uses a helper column.


Your data is in A1:A11 like this:

[pre]
Code:
CC
AA

BB

AA
CC
AA

In B1 enter this and drag down to B11:

[pre][code]=IF(ISBLANK(A1)=TRUE,FALSE,IF(COUNTIF($A$1:A1,A1)=COUNTIF($A$1:$A$11,A1),COUNTIF($A$1:A1,A1),FALSE))
[/pre]
In D1 enter this press ctrl+shift+enter and drag down,[/code][/pre]
=IFERROR(INDEX($A$1:$A$11,LARGE(IF(ISNUMBER($B$1:$B$11)=TRUE,ROW($B$1:$B$11)),ROW(A1)),0),"")

`

In E1 enter this press ctrl+shift+enter and drag down,


=IFERROR(INDEX($B$1:$B$11,LARGE(IF(ISNUMBER($B$1:$B$11)=TRUE,ROW($B$1:$B$11)),ROW(A1)),0),"")


Hope that helps.


Faseeh
 
Hi Bruno ,


if your list is from A2 through A12 , then you can enter the following formula in C2 , and copy down ; remember to enter it as an array formula , using CTRL SHIFT ENTER :


=INDEX(List, MATCH(MIN(MODE(IF((List<>"")*NOT(COUNTIF($C$1:$C1, List)), (COUNTIF(List, "<"&List)+1)*{1,1}))), IF((List<>""),COUNTIF(List, "<"&List)+{1}), 0))


Narayan
 
Faseeh: Hi. Your formula resulted in a list of unique items, however thay are not ranked by number of ocurrences (most ocurring items first). Also, I'm trying to avoid using helper columns (unless there is no other choice). Is there a way of avoinding the helper column and ranking the result? Thanks for your help.


NARAYANK991: Hi. Thanks for your reply. However, the formula didn't work here (sorry). The result is "#N/A", when I drag down the formula (or copy down), "#N/A" appears in all rows. I'm using Excel 2010 and entered the formula with Ctrl+shift+enter.


Best regards,

Bruno
 
Hi Bruno,


I think there is some mus-understanding on your side, the formula i reported gives this when used:

[pre]
Code:
AA	3
CC	2
BB	1
[/pre]

However it doesn't meet your criteria of "no helper column".


Regards,
 
NARAYANK991: First I have inserted your formula exatcly as you posted (copy and paste). Excel says "The formula you typed contains an error". Then I replaced your argument separators (commas) for semicolons and the error dissapeared, however the result was #N/A. I found that Excel uses different separator depending on Country. In the US it's comma, in Europe and other countries (also Brazil), it's semicolon. Some sites state that changing the "list separator" in Windows "International Settings" will also change the separator used by Excel, but that didn't work here. Excel still doesn't accept comma as separator.

Could you please send a file with the working formula to my e-mail (brunojfs1 at gmail dot com)? Maybe my excel will open it and make the proper adjustments for my locale. I appreciate your help very much.


Faseeh: Yes, I probably did something wrong here... Maybe the problem is related to the argument separator (as explained to NARAYANK991 above)? Thank you very much for your help.
 
Update:


NARAYANK991: After changing the locale of the whole system to American English, I was able to change Excel's separator to comma. However, Excel still says "The formula you typed contains an error" and highlights the *{1,1} part of the formula. I removed the *{1,1} part and the error no longer exists. After removing that part, the formula works partially: It shows the most ocurring items in the list, however it does not show the items that only occur once in the original list. I wonder why Excel sees *{1,1} as an error.


Faseeh: I re-tested your formula with Excel configured to use commas as argument separators and got an unusual result: when tested with the example list from my first post, it worked exactly as you said. However, if we change the original list, the result is no longer ranked correctly as in the following example:


Original list:

CC

AA


BB


AA

CC

AA


DD

EE


Result:

EE 1

DD 1

AA 3

CC 2

BB 1


Thank you both!
 
Hi Bruno ,


The {1,1} signifies an array of 1 row and 2 columns ; an array of 2 rows and 1 column would be entered as {1;1}. This is as far as array usage is concerned.


When you did a global find and replace of all commas by semi-colons , has the above comma within the array brackets also been replaced by a semi-colon ?


You need to find out in your locale settings whether this separator in array elements is the same or different.


The above array is required , because of the way the MODE function works ; it returns the right results of frequencies only when there are duplicates in the input data ! If an element occurs only once , the MODE function does not return it ; the above construct of {1,1} basically doubles the number of elements in the input list so that the MODE function works correctly ! Even though the number of elements is doubled , the relative frequencies will not change , which is why the final order will be correct.


Narayan
 
Hi NARAYANK991,


Thanks for explaining the {1,1} part of the formula. I have learned a lot, since this is my first time working with array formulas. T


The sample file, it works exactly as needed! Today I was able to complete the spreadsheet I have been struggling with for the last several days. I appreciate your help very much.


Faseeh: Thanks for your help too.


If there's anything I can do for you both, please let me know.


Best regards,

Bruno
 
Back
Top