• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Quartiles

BigBuilder

New Member
Hi,
I'm struggling to understand the results I'm getting from the Quartile, Quartile.Exc and Quartile.Inc functions.
Basically, whatever function I use on my dataset, quartiles 3 and 4 always return the same value: 1.
The array is percentages and the minimum value is actually zero but, for practical purposes, (i.e., the zeros are as a result of either trapping errors in my VLOOKUp's or a null value in the source dataset), it's around 81% and the maximum value is, naturally, 100%, so there's a small gap in which to squash the quartiles.
I don't have this phenomena when I use the quartiles on other data.
Would be grateful for any ideas why this is happening.

In the meantime, I'll play around with removing the zeroes, null cell values etc and see if that make a difference.

Thanks.
 
Hi,
Thanks for this response and the link which I'd already viewed before posting on this forum in an attempt to find a solution I could 'sell' to my users :rolleyes:.

Will certainly get back.

Thanks again.
 
O.k., taking out all the numbers/spaces/null values etc, unfortunately, made no difference.
Attached is the file.
Any suggestions, or, pointing out my obvious, (and unnoticed), mistakes, welcome.
Thanks.
 

Attachments

  • Quartile Example.xlsx
    15.9 KB · Views: 4
Hi ,

I don't have much understanding of the maths behind all this , but from what I could understand :

1. To calculate the quartiles , the data has to be sorted in ascending order.

2. There are only 3 quartiles , since each quartile is a separation between two sets of numbers ; it does not make any sense to talk of a 4th quartile.

3. The second quartile is the median , since there have to be an equal number of values in the lower half of the data set and the upper half of the data set.

If we look at your data , it has 240 values , which means the median is the number which has a value such that 120 values from the data set lie below it , and 120 values from the data set lie above it.

Since the data set has an even number of values , the median has to be a calculated value , which is the average of the last value of the lower 120 values and the first value of the upper 120 values. Thus the median will be the average of the values in E133 and E134.

4. A similar argument can be used to derive the first quartile , since this is the median of the lower set of 120 values , while the third quartile will be the median of the upper set of 120 values.

5. If we do the calculations , the value for the first quartile does not match the value given by the native Excel functions.

However , I have put in the code given in the link I posted , within the Excel file , and I have put in the results of the UDF for various selections of the Hyndman-Fan methods , and the calculated value matches the UDF result for method 5.

6. The third quartile is 1 because of the large number of occurrences of this value ; there are 81 numbers whose value is 100% or 1.

Narayan
 

Attachments

  • Quartile Example.xlsm
    23 KB · Views: 4
Thanks very much Narayan.
If you're not sure of the maths then I have no chance !

I've obviously completely misunderstood how quartiles work as I didn't think they were dependent on the number of actual entries, (neither was I aware that the data had to be sorted).

So, I now have a different problem which, I suspect, you've already guessed at.

I want to take the range of percentages in the dataset, from whatever the minimum value is to 100%, divide it equally into 4 chunks, then assign a ranking value, (for want of a better word - not to be confused with the Excel RANK function), to the organisations, based on where their percentages sit within those percentage range 'chunks', (which, I thought, was what the Quartile function did).

Ideas welcome.

Thanks again.
 
Hi ,

Since it is late tonight , I will reply tomorrow , though how much I can help is in doubt.

Is the uploaded file data the only data we have to work with ?

Narayan
 
Thanks again, Narayan.

If you need anything else, I'll of course, upload it but the logic generated as a result of this question will be applied to other parts of my data so will be extremely useful.
I wonder if I can use the MID function to slice the values down ?
MID of the whole dataset will give me the mid point of the whole dataset.
Then, use MID again but between the MIN value and the MID figure calculated previously to give me another split.
Similarly apply the above logic to the data between the MAX value and the MID point of the whole dataset.

Anyway, I'll work on that basis and see what I get.

Regards.
 
I want to take the range of percentages in the dataset, from whatever the minimum value is to 100%, divide it equally into 4 chunks, then assign a ranking value, (for want of a better word - not to be confused with the Excel RANK function), to the organisations, based on where their percentages sit within those percentage range 'chunks', (which, I thought, was what the Quartile function did).

Ideas welcome.

Thanks again.


You can calculate the range of data (max minus min) using MAX(E14:E253) - MIN(E14:E253).

Once you have the range, calculate 4 chunks.

Using these chunks, you can then find which chunk (or bucket, as these are usually called) an organization belongs to. If you wish, you can calculate relative ranking with in the bucket using rankif formula (thru SUMPRODUCT ofcourse).

See attached.
 

Attachments

  • Quartile Example.xlsx
    27 KB · Views: 10
Thanks very much for this solution.

I was thinking along similar lines but only as far as trying to get a list of unique percentages, then using 'something' to divide the unique percentages into 'chunks', ('buckets', thanks for that clarification) but hadn't got much farther.

Btw: it's nothing to do with this thread but I really don't understand the SUMPRODUCT function at all. I've read a lot of threads on it but I still don't 'get it'.
Call me stupid ....
It seems to me like a glorified array formula.

While, of course, I'd welcome the addition of further posts on this with more ideas/background/understanding, I think, between yourself and Narayan, I think I have an excellent solution, so thanks to you both.
 
Back
Top