Hi Ufoo ,
I wish I had Lori's brilliance to come up with such a formula in the first place.
I have attached a file with some data so that you can use it to follow along with this explanation.
The principle is brilliant in its simplicity.
The LARGE function treats both uni-dimensional ranges / arrays and multiple row multiple column ranges alike ; knowing this is key to understanding this formula.
In essence , if we can convert the two dimensional range , with multiple rows and multiple columns , into a uni-dimensional range consisting of only one column , then the task can be redefined as follows :
Given a single column range , can we identify which indexes we should look at , so that we get the smallest number in each row ?
To start with , suppose we have a single column range , say A1:A20 ; to find out the smallest value in each row is obvious ; we just take each value itself !
Suppose now we extend the range to two columns , A1:B20. To find out the smallest value in each row , if we convert the two column range to a single column range with double the original number of rows , is not so easy. Lori's formula tells us that we need to take the following values from the 40 element range :
{9;25;25;36;5;27;23;28;34;18;23;38;28;40;36;38;32;34;18;32}
This principle remains the same even when we extend the problem to more than 2 columns.
So how does this work ?
To explain this visually , I have inserted the RANK function in columns J through V ; the RANK function also is as generous as the LARGE function , treating uni-dimensional and two dimensional ranges alike.
In columns L and M , you can see that the values given in the array above have been conditionally formatted ; this makes it clear that what the LARGE function is returning are the maximum ranks in each row.
Don't be misled !
The maximum value given by the RANK function , with the third parameter omitted , corresponds to the smallest value in the range !
So , in effect , we are able to extract the smallest values in the original range , by looking at the largest rank values in the array formed by the RANK function.
But the problem is , it is not enough to look at the largest rank values by themselves ! We need to look at the largest rank values in each row. Hence the use of the ROW function.
If we merely look at the ROW function , and combine it with the RANK function , the two values happen to be in the same region , at least in the data set which I have inserted in the uploaded workbook ; because of this , combining the two will invariably change the overall value to an extent where we will not be able to retrieve the RANK value at a later stage.
Hence the multiplication by 10^6 ; this value of 10^6 has been used because using a multiplier of 10^6 appears to be safe given that the numbers in the original data set are small in value. If the numbers in the data set also happen to be of this magnitude , this number will have to be increased.
Let us take a more detailed look at how this works.
The RANK function , when it is applied to the entire range A1:D20 , returns the following matrix of values ; for the sake of brevity , I am posting the first 3 rows from the matrix :
{23,59,19,49;1,36,24,1;10,62,70,1}
Now , let us combine this with the ROW function as it has been used , as follows :
{1000023,1000059,1000019,1000049;2000001,2000036,2000024,2000001;3000010,3000062,3000070,3000001}
Using the ROW function has automatically segregated the values of each row , in their own slabs i.e. the first four values are in the slab :
1000000 to 1000099 , assuming that the original data set has values between 0 and 99
The next 4 values in the array are in the slab :
2000000 to 2000099
and the next 4 values in the array are in the slab :
3000000 to 3000099
and so on for the remaining rows in the data range.
If we now use the LARGE function on these combined values , it is clear that the 4 values in the last row ( row 20 ) , will be at the top of the resulting array ; these will be followed by the 4 values in row 19 , then the 4 values in row 18 , and so on , till the last 4 elements will be the 4 values in row 1.
But remember , all these values will be ranked in descending order.
Since we have 4 elements in each row , all we have to do is look at every fourth element , starting with the first !
Thus , if we look at the following elements :
1 , 5 , 9 , 13 , 17 ,...., 73 , 77
we will be able to extract the largest rank values in each row ; obviously , these will correspond to the smallest data values in each row !
Thus , all that is remaining is to derive an array of values as follows :
{1;5;9;13;17;21;25;29;33;37;41;45;49;53;57;61;65;69;73;77}
This is exactly what is being done by the following portion of the formula :
(ROW(A1:D20)-MIN(ROW(A1:D20)))*COLUMNS(A1:D20)+1
The portion highlighted in blue returns an array of values from 0 through 19.
Multiplying this by the number of columns , 4 in this case , gives an array of values :
{0;4;8;12;16;20;24;28;32;36;40;44;48;52;56;60;64;68;72;76}
Adding 1 to this results in the array that we wanted.
Thus , to summarize everything we have detailed so far :
1. Use the RANK function to return a matrix of values , ranking the original data in an order where a rank of 1 is assigned to the largest value in the original data set , and 80 is assigned to the smallest value in the original data set.
2. Combine this with the ROW function , multiplied by a large enough number that the RANK values do not change even after the combination. Using the ROW function automatically ensures that the resulting values are grouped row-wise , making it easier to extract the row-wise maximum values.
3. Extract the row-wise maximum values by looking at every nth value in the array , starting from the first , where n is the number of columns in the original data set.
What we now have is an array as follows :
{20000079;19000045;18000052;17000065;16000072;15000072;14000052;13000065;12000072;11000065;10000049;9000072;8000065;7000079;6000072;5000072;4000070;3000070;2000036;1000059}
The values of interest are only the values :
{79;45;52;65;72;72;52;65;72;65;49;72;65;79;72;72;70;70;36;59}
What do the above values represent ?
The value 79 represents the largest value in the array of ranks , whose complete list is available in columns S , T , U and V.
We do not have the value 80 because there are two ranks of 79 , since there are two instances of the value 1 , the smallest value in the original data set. Because the RANK function starts from a rank of 1 ( for the largest data value ) , as it goes down the data set , assigning higher rank values , when it comes to duplicates , it assigns the same rank to all duplicate instances of the same data value , and then skips ranks when assigning the next rank to the next smaller value in the data set.
Extracting the above rank values is simple and uses the MOD function , with the same value which was used earlier as the multiplier ; this magic is akin to the old Arabian puzzle , which involved dividing 19 camels in the ratios 1/2 , 1/4 and 1/5 ! All that needed to be done was to add a camel to make the total 20 , and then , after the division was done , take it back !
All that is left is to use this final array of ranks as the second parameter to the LARGE function , with the original data set as the first parameter ; thus , using :
=LARGE(A1:D20 , {79;45;52;65;72;72;52;65;72;65;49;72;65;79;72;72;70;70;36;59})
will return the values :
{1;11;8;4;2;2;8;4;2;4;10;2;4;1;2;2;3;3;13;7}
which are the minimum values in each row of data , starting with 1 as the smallest value in row 20 , 11 as the smallest value in row 19 , and so on , till we come to 7 as the smallest value in the first row.
Any doubts ?
Narayan