Quartiles

BigBuilder · Nov 19, 2015

Hi,
I'm struggling to understand the results I'm getting from the Quartile, Quartile.Exc and Quartile.Inc functions.
Basically, whatever function I use on my dataset, quartiles 3 and 4 always return the same value: 1.
The array is percentages and the minimum value is actually zero but, for practical purposes, (i.e., the zeros are as a result of either trapping errors in my VLOOKUp's or a null value in the source dataset), it's around 81% and the maximum value is, naturally, 100%, so there's a small gap in which to squash the quartiles.
I don't have this phenomena when I use the quartiles on other data.
Would be grateful for any ideas why this is happening.

In the meantime, I'll play around with removing the zeroes, null cell values etc and see if that make a difference.

Thanks.

NARAYANK991 · Nov 19, 2015

Hi ,

If after cleaning up your data you still have the problem , can you upload your workbook ?

Narayan

P.S. See if you can use the code given in this link to verify the results of the Excel functions :

http://datapigtechnologies.com/blog...rtiles-from-r-and-other-statistical-packages/

BigBuilder · Nov 19, 2015

Hi,
Thanks for this response and the link which I'd already viewed before posting on this forum in an attempt to find a solution I could 'sell' to my users

.

Will certainly get back.

Thanks again.

BigBuilder · Nov 19, 2015

O.k., taking out all the numbers/spaces/null values etc, unfortunately, made no difference.
Attached is the file.
Any suggestions, or, pointing out my obvious, (and unnoticed), mistakes, welcome.
Thanks.

NARAYANK991 · Nov 19, 2015

Hi ,

I don't have much understanding of the maths behind all this , but from what I could understand :

1. To calculate the quartiles , the data has to be sorted in ascending order.

2. There are only 3 quartiles , since each quartile is a separation between two sets of numbers ; it does not make any sense to talk of a 4th quartile.

3. The second quartile is the median , since there have to be an equal number of values in the lower half of the data set and the upper half of the data set.

If we look at your data , it has 240 values , which means the median is the number which has a value such that 120 values from the data set lie below it , and 120 values from the data set lie above it.

Since the data set has an even number of values , the median has to be a calculated value , which is the average of the last value of the lower 120 values and the first value of the upper 120 values. Thus the median will be the average of the values in E133 and E134.

4. A similar argument can be used to derive the first quartile , since this is the median of the lower set of 120 values , while the third quartile will be the median of the upper set of 120 values.

5. If we do the calculations , the value for the first quartile does not match the value given by the native Excel functions.

However , I have put in the code given in the link I posted , within the Excel file , and I have put in the results of the UDF for various selections of the Hyndman-Fan methods , and the calculated value matches the UDF result for method 5.

6. The third quartile is 1 because of the large number of occurrences of this value ; there are 81 numbers whose value is 100% or 1.

Narayan

BigBuilder · Nov 19, 2015

Thanks very much Narayan.
If you're not sure of the maths then I have no chance !

I've obviously completely misunderstood how quartiles work as I didn't think they were dependent on the number of actual entries, (neither was I aware that the data had to be sorted).

So, I now have a different problem which, I suspect, you've already guessed at.

I want to take the range of percentages in the dataset, from whatever the minimum value is to 100%, divide it equally into 4 chunks, then assign a ranking value, (for want of a better word - not to be confused with the Excel RANK function), to the organisations, based on where their percentages sit within those percentage range 'chunks', (which, I thought, was what the Quartile function did).

Ideas welcome.

Thanks again.

NARAYANK991 · Nov 19, 2015

Hi ,

Since it is late tonight , I will reply tomorrow , though how much I can help is in doubt.

Is the uploaded file data the only data we have to work with ?

Narayan

BigBuilder · Nov 19, 2015

Thanks again, Narayan.

If you need anything else, I'll of course, upload it but the logic generated as a result of this question will be applied to other parts of my data so will be extremely useful.
I wonder if I can use the MID function to slice the values down ?
MID of the whole dataset will give me the mid point of the whole dataset.
Then, use MID again but between the MIN value and the MID figure calculated previously to give me another split.
Similarly apply the above logic to the data between the MAX value and the MID point of the whole dataset.

Anyway, I'll work on that basis and see what I get.

Regards.

BigBuilder · Nov 19, 2015

MID ? MEAN !

r1c1 · Nov 20, 2015

BigBuilder said:
I want to take the range of percentages in the dataset, from whatever the minimum value is to 100%, divide it equally into 4 chunks, then assign a ranking value, (for want of a better word - not to be confused with the Excel RANK function), to the organisations, based on where their percentages sit within those percentage range 'chunks', (which, I thought, was what the Quartile function did).

Ideas welcome.

Thanks again.

You can calculate the range of data (max minus min) using MAX(E14:E253) - MIN(E14:E253).

Once you have the range, calculate 4 chunks.

Using these chunks, you can then find which chunk (or bucket, as these are usually called) an organization belongs to. If you wish, you can calculate relative ranking with in the bucket using rankif formula (thru SUMPRODUCT ofcourse).

See attached.

BigBuilder · Nov 20, 2015

Thanks very much for this solution.

I was thinking along similar lines but only as far as trying to get a list of unique percentages, then using 'something' to divide the unique percentages into 'chunks', ('buckets', thanks for that clarification) but hadn't got much farther.

Btw: it's nothing to do with this thread but I really don't understand the SUMPRODUCT function at all. I've read a lot of threads on it but I still don't 'get it'.
Call me stupid ....
It seems to me like a glorified array formula.

While, of course, I'd welcome the addition of further posts on this with more ideas/background/understanding, I think, between yourself and Narayan, I think I have an excellent solution, so thanks to you both.

Quartiles

BigBuilder

New Member

NARAYANK991

Excel Ninja

BigBuilder

New Member

BigBuilder

New Member

Attachments

NARAYANK991

Excel Ninja

Attachments

BigBuilder

New Member

NARAYANK991

Excel Ninja

BigBuilder

New Member

BigBuilder

New Member

r1c1

Administrator

Attachments

BigBuilder

New Member