• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Ranking string data for one column

JonnyDijksman

New Member
Hi guys, I hope you can help.


I have a worksheet that contains an error log. Each error has with it a number of different pieces of data relevant to the query, one of which being 'cost center'. I would like to create a graph for the 5 most frequently seen cost centers (the cost centers are not unique. Many employees use the same ones). E.g

https://docs.google.com/spreadsheet/ccc?key=0AgOpAwUBu4T_dEFvWmVXYkdLa3NHMnU3ZjZyVVVqWFE#gid=0


In this example (I know the results are not enough to really for a top 5) is roughly how my worksheet looks. I want to extrapolate a top 5 list for cost centers.


Thanks in advance chaps


[edit] I should have pointed out that I didn't want help creating a graph, just how to create a formula for ranking the data in the column :)
 
Hello,

I am assuming that your "top 5" cost centers will be those with the most errors logged.


Perhaps you could add a column to do a count of each cost center, and then use the cost center and the "count" column in a pivot table.


For example, if your range of cost centers is A1:A25, then the following formula in cells B1:B25 will show the number of times a given cost center repeats.

=COUNTIF($A$1:$A$25,A1)


Now, you could use the columns A, B, and any additional columns in a pivot table, to get a grouped result.


Hope that helps.


Cheers,

Sajan.


P.S. Since I am not able to view your worksheet, my suggestions are based on the text in your post.
 
Thanks for the reply Sajan.


I did consider doing it this way, only there are a large number of cost centers, and they are not in any order. Basically A:A - Query number (date) B:B - employee number C:C - Cost Center. The thing to note is that the worksheet is ordered by date and the queries are random in terms of what cost center the employees belong to. So in order to extrapolate the most troublesome cost centers, I need a formula to take the top 5 most frequently seen ones from the list.


One other thing is that the cost centers are the result of a vlookup. I am not sure what, if anything, this affects, but thought it prudent to mention :)


Regards
 
Hi,

I won't be able to look into this for the next few hours. So if you don't get a response soon, I will look into this later this evening.


By the way, the cost centers resulting from vlookup should not be a problem.


thanks,

Sajan.
 
Hi JonnyDijksman,


Here is one approach to finding the top 5 cost centers that have logged the most errors.

I am sure there is a way to optimize the formula, but we can look at optimization next. I wanted to get you something you can start with in the meantime.


Assumptions:

1. Your data range for cost centers is named "costcenter". For my example, I have assumed it to be in the range B2:B19. (i.e. costcenter refers to B2:B19)

2. If a cost center has only one error logged, you don't want it included in the "top 5"


Step 0:

Name the data range for cost centers as "costcenter"


Step 1:

Create a column to count the number of occurrences for each cost center. For example, if cost center is in cells B2:B19, put the following formula in cells C2:C19. (You could of course create the column elsewhere on the worksheet, say in cells P2:p19, since we will be naming this data range also.)

=COUNTIF(CostCenter,B2)

(Remember to replace B2 with the first cell where cost center is located, and copy the formula down to the remaining cells.)


Step 2:

Name this new data range of counts as "rngCount"


Step 3:

Select 5 rows somewhere on your worksheet (i.e. all 5 rows need to be selected together), and while the 5 rows are selected, enter the following formula in the "fx" box (which is where you can edit/create formulas), and end with Ctrl+Shift+Enter (to indicate that it is an array formula):

=MATCH(LARGE(DELTA(DELTA(FREQUENCY(rngcount,ROW($A$1:INDEX($A:$A,COUNT(rngcount))))*(IF(ROW($A$1:INDEX($A:$A,COUNT(rngcount)+1))=1,0,1))))*ROW($A$1:INDEX($A:$A,COUNT(rngcount)+1)), ROW($A$1:INDEX($A:$A,MIN(5,SUMPRODUCT(IFERROR(1/COUNTIFS(rngcount,rngcount,rngcount,">1"), 0)))))), rngcount, 0)


I have assumed that you selected cells G2:G6 to put this formula.


Step 4:

Then in cell H2, put the following formula:

=IFERROR(INDEX(CostCenter,G2,1), "…")


Step 5:

Copy the formula in cell H2 to cell range H2:H6


You should be done!


Cells H2:H6 should now have the list of cost centers with the most errors logged, in descending order. (i.e. H2 would be the "top" cost center ).

The above formula will show "..." if you don't have 5 cost centers to display.


Please test with your data set. I have only tried it with a limited data set of about 18 items.


If this works for you, and you want to know more about the formulas, please let us know, and we can write up an explanation.


Cheers,

Sajan.
 
Hi,

Applying the clarity of thought a new day brings... here is a much simplified approach to finding the top 5 cost centers:


Step 0:

Name the range of values where cost centers are located as "costcenter"


Step 1:

Select 5 rows of cells together, and in the formula box, type in the following formula, and end with Ctrl+Shift+Enter:

=LARGE((FREQUENCY(MATCH(CostCenter,CostCenter,0),MATCH(CostCenter,CostCenter,0))>1) *ROW($A$1:INDEX($A:$A,ROWS(CostCenter)+1)), {1;2;3;4;5})


Step 2:

In the cells next to the cells from Step 1, put in the following formula:

(In this example, I assumed that range from Step 1 is M2:M6)

=IF(M2<>0, INDEX(CostCenter,M2,1), "…")

Copy down to cells N2:N6

Cells N2:N6 now display the top 5 cost centers.


Now you can use cells N2:N6 for further evaluation, such as =COUNTIF(CostCenter,N2) to determine the count of a given cost center in the top 5 list.


Cheers,

Sajan.
 
Hello Jonny,


Here is another one with MODE


F1 enter the Top numbers to display. eg: 5. You can adjust this to dynamically.


E4, with CTRL+SHIFT+ENTER


=LOOKUP(REPT("z",99),CHOOSE({1,2},"",IF(ROWS(E$4:E4)<=F$1,INDEX(costcenter,MODE(IF((costcenter<>"")*ISNA(MATCH(costcenter,E$3:E3,0)),MATCH(costcenter,costcenter,0)))),"")))


Then copy down.


F4, with just ENTER & copy down, to get the count.


=IF(E4="","",COUNTIF(costcenter,E4))


Assuming you are using Excel 2003 or prior. If you are on XL2007 or later you can shorten the formula with IFERROR.


How do you want to deal with TIE?


Hope this helps

Haseeb
 
Welcome back Haseeb! We nearly started a "forum missed Haseeb" post series!!


Regards,

Sajan.
 
Hi Haseeb ,


I think there should be a change to take care of the fact that MODE will return an error if there are no duplicate values.


Can you check this ?


=LOOKUP(REPT("z",99),IF(ROWS(E$4:E4)>F$1,{"";""},INDEX(costcenter,IFERROR(MODE(IF(ISNA(MATCH(costcenter,E$3:E3,0)),MATCH(costcenter,costcenter,0))),MAX(IF(ISNA(MATCH(costcenter,E$3:E3,0)),MATCH(costcenter,costcenter,0)))))))


entered , of course , as an array formula , using CTRL SHIFT ENTER.


Narayan
 
Hi Haseeb ,


Sorry , but the formula I posted earlier is also wrong.


This seems to work :


=LOOKUP(REPT("z",99),IF(ROWS(E$4:E4)>MIN(F$1,SUM(1/COUNTIF(costcenter,costcenter))),{""},INDEX(costcenter,IFERROR(MODE(IF(ISNA(MATCH(costcenter,E$3:E3,0)),MATCH(costcenter,costcenter,0))),MAX(IF(ISNA(MATCH(costcenter,E$3:E3,0)),MATCH(costcenter,costcenter,0)))))))


Narayan
 
Hi Haseeb, Narayan,

I like your technique of ISNA(MATCH(...)) to eliminate the cost centers already found!


Here is another technique to find the top n cost centers. This uses the FREQUENCY function as in my previous formula, but this version sorts the results:


=IF(J2>SUM(1/COUNTIF(CostCenter, CostCenter)), "...", INDEX(CostCenter, RIGHT(LARGE(--(

FREQUENCY(MATCH(CostCenter,CostCenter,0),MATCH(CostCenter,CostCenter,0))) + ROW($A$1:INDEX($A:$A,ROWS(CostCenter)+1))/(10^LEN(ROWS(CostCenter))),

J2),LEN(ROWS(CostCenter))), 1))


entered with Ctrl+Shift+Enter

Cell J2 indicates "nth" value in the Top n list. (i.e. J2=4 would return the 4th largest value.)


I realize that this is longer than the formulas you suggested. However, since it offers another technique, I wanted to share it.


Regards,

Sajan.
 
Narayan,


Adding *{1,1} after MATCH should avoid this.


=LOOKUP(REPT("z",99),CHOOSE({1,2},"",IF(ROWS(E$4:E4)<=F$1,INDEX(costcenter,MODE(IF((costcenter<>"")*ISNA(MATCH(costcenter,E$3:E3,0)),MATCH(costcenter,costcenter,0)*{1,1}))),"")))
 
Hi Haseeb,

What is the purpose of the LOOKUP() and CHOOSE() functions in the formula? Not sure if it is related to my test data, but I get the same results without the LOOKUP and CHOOSE functions. (i.e. Since the MODE function returns a single value, the IF() function also returns a single value.)


In any case, this is a brilliant formula! Thanks for introducing me to some additional calculation techniques.


thanks,

Sajan.
 
Hi Sajan ,


I think the formula is beautiful ! Every piece is necessary for the final result.


Can you test for all of the following cases ?


1. F1 ( or in your formula J2 ) = 0 ( or even a negative number ! )


2. All the entries in costcenter being identical


3. Blanks at the beginning , in between , and at the end of costcenter


4. F1 greater than the number of unique entries in costcenter


Narayan
 
Hi Narayan,

I somehow missed your response until today.


I adapted Haseeb's formula slightly, based on the assumption that I can copy the formula down to additional rows if I need top 3, top 5, top 10, etc. When the data runs out, the formula would return "..." anyway.


=IFERROR(INDEX(list,MODE(IF((list<>"")*ISNA(MATCH(list,$C$1:$C1,0)),MATCH(list,list,0)*{1,1}))), "…")


I tested scenarios #2 and #3 (since #1 and #4 are not applicable in the modified version), and did not see a need for the LOOKUP or CHOOSE functions. Not sure what I am overlooking since you indicate that the LOOKUP and CHOOSE are needed in the formula.


Thanks,

-Sajan.
 
Hello Sajan,


I am sorry, it's been long time. LOOKUP CHOOSE here used as an error handler replacing IFERROR. So can use it <2007 versions.
 
Hi Haseeb,

Thanks for the clarification!


I have seen some of your recent posts on other websites, and figured you were not being challenged enough here! But I am glad to have you back on this site!


-Sajan.
 
Back
Top